Multilocus sequence typing of total-genome-sequenced bacteria.
Larsen, Mette V; Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Pontén, Thomas; Ussery, David W; Aarestrup, Frank M; Lund, Ole
2012-04-01
Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST.
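The final step described above, assigning a sequence type from the combination of alleles, can be sketched as a table lookup. This is only an illustrative sketch: the locus names, allele numbers, and ST numbers below are made up, and the BLAST-based allele ranking used by the published method is not reproduced here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical seven-locus scheme; names and numbers are placeholders,
# not taken from any real MLST database.
LOCI: Tuple[str, ...] = ("locA", "locB", "locC", "locD", "locE", "locF", "locG")

# Hypothetical profile table: allele combination -> sequence type (ST).
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 2, 3, 4, 5, 6, 7): 10,
    (1, 2, 3, 4, 5, 6, 8): 11,
}


def sequence_type(alleles: Dict[str, int]) -> Optional[int]:
    """Return the ST for a full set of allele calls, or None if the
    combination is unknown or a locus has no call."""
    try:
        key = tuple(alleles[locus] for locus in LOCI)
    except KeyError:
        return None  # at least one locus is missing
    return PROFILES.get(key)


calls = dict(zip(LOCI, (1, 2, 3, 4, 5, 6, 7)))
print(sequence_type(calls))
```

A single differing allele yields a different ST or, if the combination is not yet in the table, no ST at all.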
Development of Mycoplasma synoviae (MS) core genome multilocus sequence typing (cgMLST) scheme.
Ghanem, Mostafa; El-Gazzar, Mohamed
2018-05-01
Mycoplasma synoviae (MS) is a poultry pathogen with reported increased prevalence and virulence in recent years. MS strain identification is essential for prevention, control efforts and epidemiological outbreak investigations. Multiple multilocus based sequence typing schemes have been developed for MS, yet the resolution of these schemes could be limited for outbreak investigation. The cost of whole genome sequencing became close to that of sequencing the seven MLST targets; however, there is no standardized method for typing MS strains based on whole genome sequences. In this paper, we propose a core genome multilocus sequence typing (cgMLST) scheme as a standardized and reproducible method for typing MS based on whole genome sequences. A diverse set of 25 MS whole genome sequences was used to identify 302 core genome genes as cgMLST targets (35.5% of MS genome) and 44 whole genome sequences of MS isolates from six countries in four continents were used for typing applying this scheme. cgMLST based phylogenetic trees displayed a high degree of agreement with core genome SNP based analysis and available epidemiological information. cgMLST allowed evaluation of two conventional MLST schemes of MS. The high discriminatory power of cgMLST allowed differentiation between samples of the same conventional MLST type. cgMLST represents a standardized, accurate, highly discriminatory, and reproducible method for differentiation between MS isolates. Like conventional MLST, it provides stable and expandable nomenclature, allowing for comparing and sharing the typing results between different laboratories worldwide. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Mentasti, Massimo; Tewolde, Rediat; Aslett, Martin; Harris, Simon R.; Afshar, Baharak; Underwood, Anthony; Harrison, Timothy G.
2016-01-01
Sequence-based typing (SBT), analogous to multilocus sequence typing (MLST), is the current “gold standard” typing method for investigation of legionellosis outbreaks caused by Legionella pneumophila. However, as common sequence types (STs) cause many infections, some investigations remain unresolved. In this study, various whole-genome sequencing (WGS)-based methods were evaluated according to published guidelines, including (i) a single nucleotide polymorphism (SNP)-based method, (ii) extended MLST using different numbers of genes, (iii) determination of gene presence or absence, and (iv) a kmer-based method. L. pneumophila serogroup 1 isolates (n = 106) from the standard “typing panel,” previously used by the European Society for Clinical Microbiology Study Group on Legionella Infections (ESGLI), were tested together with another 229 isolates. Over 98% of isolates were considered typeable using the SNP- and kmer-based methods. Percentages of isolates with complete extended MLST profiles ranged from 99.1% (50 genes) to 86.8% (1,455 genes), while only 41.5% produced a full profile with the gene presence/absence scheme. Replicates demonstrated that all methods offer 100% reproducibility. Indices of discrimination range from 0.972 (ribosomal MLST) to 0.999 (SNP based), and all values were higher than that achieved with SBT (0.940). Epidemiological concordance is generally inversely related to discriminatory power. We propose that an extended MLST scheme with ∼50 genes provides optimal epidemiological concordance while substantially improving the discrimination offered by SBT and can be used as part of a hierarchical typing scheme that should maintain backwards compatibility and increase discrimination where necessary. This analysis will be useful for the ESGLI to design a scheme that has the potential to become the new gold standard typing method for L. pneumophila. PMID:27280420
David, Sophia; Mentasti, Massimo; Tewolde, Rediat; Aslett, Martin; Harris, Simon R; Afshar, Baharak; Underwood, Anthony; Fry, Norman K; Parkhill, Julian; Harrison, Timothy G
2016-08-01
Sequence-based typing (SBT), analogous to multilocus sequence typing (MLST), is the current "gold standard" typing method for investigation of legionellosis outbreaks caused by Legionella pneumophila. However, as common sequence types (STs) cause many infections, some investigations remain unresolved. In this study, various whole-genome sequencing (WGS)-based methods were evaluated according to published guidelines, including (i) a single nucleotide polymorphism (SNP)-based method, (ii) extended MLST using different numbers of genes, (iii) determination of gene presence or absence, and (iv) a kmer-based method. L. pneumophila serogroup 1 isolates (n = 106) from the standard "typing panel," previously used by the European Society for Clinical Microbiology Study Group on Legionella Infections (ESGLI), were tested together with another 229 isolates. Over 98% of isolates were considered typeable using the SNP- and kmer-based methods. Percentages of isolates with complete extended MLST profiles ranged from 99.1% (50 genes) to 86.8% (1,455 genes), while only 41.5% produced a full profile with the gene presence/absence scheme. Replicates demonstrated that all methods offer 100% reproducibility. Indices of discrimination range from 0.972 (ribosomal MLST) to 0.999 (SNP based), and all values were higher than that achieved with SBT (0.940). Epidemiological concordance is generally inversely related to discriminatory power. We propose that an extended MLST scheme with ∼50 genes provides optimal epidemiological concordance while substantially improving the discrimination offered by SBT and can be used as part of a hierarchical typing scheme that should maintain backwards compatibility and increase discrimination where necessary. This analysis will be useful for the ESGLI to design a scheme that has the potential to become the new gold standard typing method for L. pneumophila. Copyright © 2016 David et al.
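The abstract above reports indices of discrimination without stating the formula. A minimal sketch follows, assuming the widely used Hunter-Gaston form of Simpson's index, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of type j and N the total number of isolates. The type labels in the usage line are invented for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable


def discrimination_index(types: Iterable[Hashable]) -> float:
    """Hunter-Gaston index of discrimination for a list of type labels,
    one label per isolate (assumed formula, see lead-in)."""
    counts = Counter(types)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two isolates")
    # Number of ordered isolate pairs that share the same type.
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Made-up labels: four isolates, two of which share a type.
print(discrimination_index(["ST1", "ST1", "ST2", "ST3"]))
```

A value of 1.0 means every isolate has its own type; 0.0 means all isolates share one type.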
Unemo, Magnus; Dillon, Jo-Anne R.
2011-01-01
Summary: Gonorrhea, which may become untreatable due to multiple resistance to available antibiotics, remains a public health problem worldwide. Precise methods for typing Neisseria gonorrhoeae, together with epidemiological information, are crucial for an enhanced understanding regarding issues involving epidemiology, test of cure and contact tracing, identifying core groups and risk behaviors, and recommending effective antimicrobial treatment, control, and preventive measures. This review evaluates methods for typing N. gonorrhoeae isolates and recommends various methods for different situations. Phenotypic typing methods, as well as some now-outdated DNA-based methods, have limited usefulness in differentiating between strains of N. gonorrhoeae. Genotypic methods based on DNA sequencing are preferred, and the selection of the appropriate genotypic method should be guided by its performance characteristics and whether short-term epidemiology (microepidemiology) or long-term and/or global epidemiology (macroepidemiology) matters are being investigated. Currently, for microepidemiological questions, the best methods for fast, objective, portable, highly discriminatory, reproducible, typeable, and high-throughput characterization are N. gonorrhoeae multiantigen sequence typing (NG-MAST) or full- or extended-length porB gene sequencing. However, pulsed-field gel electrophoresis (PFGE) and Opa typing can be valuable in specific situations, i.e., extreme microepidemiology, despite their limitations. For macroepidemiological studies and phylogenetic studies, DNA sequencing of chromosomal housekeeping genes, such as multilocus sequence typing (MLST), provides a more nuanced understanding. PMID:21734242
Molecular Strain Typing of Mycobacterium tuberculosis: a Review of Frequently Used Methods
2016-01-01
Tuberculosis, caused by the bacterium Mycobacterium tuberculosis, remains one of the most serious global health problems. Molecular typing of M. tuberculosis has been used for various epidemiologic purposes as well as for clinical management. Currently, many techniques are available to type M. tuberculosis. Choosing the most appropriate technique in accordance with the existing laboratory conditions and the specific features of the geographic region is important. Insertion sequence IS6110-based restriction fragment length polymorphism (RFLP) analysis is considered the gold standard for the molecular epidemiologic investigations of tuberculosis. However, other polymerase chain reaction-based methods such as spacer oligonucleotide typing (spoligotyping), which detects 43 spacer sequence-interspersing direct repeats (DRs) in the genomic DR region; mycobacterial interspersed repetitive units–variable number tandem repeats, (MIRU-VNTR), which determines the number and size of tandem repetitive DNA sequences; repetitive-sequence-based PCR (rep-PCR), which provides high-throughput genotypic fingerprinting of multiple Mycobacterium species; and the recently developed genome-based whole genome sequencing methods demonstrate similar discriminatory power and greater convenience. This review focuses on techniques frequently used for the molecular typing of M. tuberculosis and discusses their general aspects and applications. PMID:27709842
Molecular Strain Typing of Mycobacterium tuberculosis: a Review of Frequently Used Methods.
Ei, Phyu Win; Aung, Wah Wah; Lee, Jong Seok; Choi, Go Eun; Chang, Chulhun L
2016-11-01
Tuberculosis, caused by the bacterium Mycobacterium tuberculosis, remains one of the most serious global health problems. Molecular typing of M. tuberculosis has been used for various epidemiologic purposes as well as for clinical management. Currently, many techniques are available to type M. tuberculosis. Choosing the most appropriate technique in accordance with the existing laboratory conditions and the specific features of the geographic region is important. Insertion sequence IS6110-based restriction fragment length polymorphism (RFLP) analysis is considered the gold standard for the molecular epidemiologic investigations of tuberculosis. However, other polymerase chain reaction-based methods such as spacer oligonucleotide typing (spoligotyping), which detects 43 spacer sequence-interspersing direct repeats (DRs) in the genomic DR region; mycobacterial interspersed repetitive units-variable number tandem repeats, (MIRU-VNTR), which determines the number and size of tandem repetitive DNA sequences; repetitive-sequence-based PCR (rep-PCR), which provides high-throughput genotypic fingerprinting of multiple Mycobacterium species; and the recently developed genome-based whole genome sequencing methods demonstrate similar discriminatory power and greater convenience. This review focuses on techniques frequently used for the molecular typing of M. tuberculosis and discusses their general aspects and applications.
An automated genotyping tool for enteroviruses and noroviruses.
Kroneman, A; Vennema, H; Deforche, K; v d Avoort, H; Peñaranda, S; Oberste, M S; Vinjé, J; Koopmans, M
2011-06-01
Molecular techniques are established as routine in virological laboratories and virus typing through (partial) sequence analysis is increasingly common. Quality assurance for the use of typing data requires harmonization of genotype nomenclature, and agreement on target genes, depending on the level of resolution required, and robustness of methods. The objective was to develop and validate web-based open-access typing tools for enteroviruses and noroviruses. An automated web-based typing algorithm was developed, starting with BLAST analysis of the query sequence against a reference set of sequences from viruses in the family Picornaviridae or Caliciviridae. The second step is phylogenetic analysis of the query sequence and a subset of the reference sequences, to assign the enterovirus type or norovirus genotype and/or variant, with profile alignment, construction of phylogenetic trees and bootstrap validation. Typing is performed on VP1 sequences of Human enterovirus A to D, and ORF1 and ORF2 sequences of genogroup I and II noroviruses. For validation, we used the tools to automatically type sequences in the RIVM and CDC enterovirus databases and the FBVE norovirus database. Using the typing tools, 785 (99%) of 795 Enterovirus VP1 sequences, and 8154 (98.5%) of 8342 norovirus sequences were typed in accordance with previously used methods. Subtyping into variants was achieved for 4439 (78.4%) of 5838 NoV GII.4 sequences. The online typing tools reliably assign genotypes for enteroviruses and noroviruses. The use of phylogenetic methods makes these tools robust to ongoing evolution. This should facilitate standardized genotyping and nomenclature in clinical and public health laboratories, thus supporting inter-laboratory comparisons. Copyright © 2011 Elsevier B.V. All rights reserved.
Fasihi, Yasser; Fooladi, Saba; Mohammadi, Mohammad Ali; Emaneini, Mohammad; Kalantar-Neyestanaki, Davood
2017-09-06
Molecular typing is an important tool for control and prevention of infection. A suitable molecular typing method for epidemiological investigation must be easy to perform, highly reproducible, inexpensive, rapid and easy to interpret. In this study, two molecular typing methods, the conventional PCR-sequencing method and high resolution melting (HRM) analysis, were used for staphylococcal protein A (spa) typing of 30 methicillin-resistant Staphylococcus aureus (MRSA) isolates recovered from clinical samples. Based on the PCR-sequencing results, 16 different spa types were identified among the 30 MRSA isolates. Of these 16 spa types, 14 were separated by the HRM method. Two spa types, t4718 and t2894, were not separated from each other. According to our results, spa typing based on HRM analysis is very rapid, easy to perform and cost-effective, but the method must be standardized for different regions, spa types, and real-time machinery.
PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods
2012-01-01
Background: With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide reproducible and comparable results needed for a global scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated since no user-friendly tool exists to analyze and explore it. Results: PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions: PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net. PMID:22568821
Multilocus sequence typing reveals a novel subspeciation of Lactobacillus delbrueckii.
Tanigawa, Kana; Watanabe, Koichi
2011-03-01
Currently, the species Lactobacillus delbrueckii is divided into four subspecies, L. delbrueckii subsp. delbrueckii, L. delbrueckii subsp. bulgaricus, L. delbrueckii subsp. indicus and L. delbrueckii subsp. lactis. These classifications were based mainly on phenotypic identification methods and few studies have used genotypic identification methods. As a result, these subspecies have not yet been reliably delineated. In this study, the four subspecies of L. delbrueckii were discriminated by phenotype and by genotypic identification [amplified-fragment length polymorphism (AFLP) and multilocus sequence typing (MLST)] methods. The MLST method developed here was based on the analysis of seven housekeeping genes (fusA, gyrB, hsp60, ileS, pyrG, recA and recG). The MLST method had good discriminatory ability: the 41 strains of L. delbrueckii examined were divided into 34 sequence types, with 29 sequence types represented by only a single strain. The sequence types were divided into eight groups. These groups could be discriminated as representing different subspecies. The results of the AFLP and MLST analyses were consistent. The type strain of L. delbrueckii subsp. delbrueckii, YIT 0080(T), was clearly discriminated from the other strains currently classified as members of this subspecies, which were located close to strains of L. delbrueckii subsp. lactis. The MLST scheme developed in this study should be a useful tool for the identification of strains of L. delbrueckii to the subspecies level.
Liao, Feng; Mo, Zhishuo; Chen, Meiling; Pang, Bo; Fu, Xiaoqing; Xu, Wen; Jing, Huaiqi; Kan, Biao; Gu, Wenpeng
2018-01-01
Vibrio cholerae O1 strains from the repository of Yunnan province, southwest China, are abundant and distinctive. We selected 70 typical toxigenic V. cholerae strains (69 O1 and one O139 serogroup) isolated in Yunnan province, typed them by pulsed-field gel electrophoresis (PFGE), multilocus sequence typing (MLST), and MLST of virulence genes (V-MLST), and evaluated the resolution of each typing method. Sequence analysis of the ctxB subunit for all strains showed that cholera between 1986 and 1995 was associated with mixed infections with El Tor and El Tor variant strains, while infections after 1996 were all caused by El Tor variant strains. The 70 V. cholerae strains yielded 50 PFGE patterns, a high resolution. The strains could be divided into three groups, dominated by strains isolated during the 1980s, 1990s, and 2000s, respectively, in good agreement with the epidemiological investigation. We also evaluated two MLST methods for V. cholerae. One used seven housekeeping genes (adk, gyrB, metE, pntA, mdh, purM, and pyrC), and all the isolates belonged to ST69; the other used nine housekeeping genes (cat, chi, dnaE, gyrB, lap, pgm, recA, rstA, and gmd). A total of seven sequence types (STs) were found among the strains with the nine-gene method; the rstA gene had five alleles, recA and gmd had two alleles each, and the others had only one allele. The virulence gene sequence typing method (ctxAB, tcpA, and toxR) divided the 70 strains into nine STs; tcpA had six alleles, toxR had five alleles, and ctxAB was identical in all strains. The latter two sequence-based typing methods were also consistent with the epidemiology of the strains. PFGE had higher resolution than the sequence-based typing methods, and MLST with seven housekeeping genes had lower resolution than the nine-housekeeping-gene and virulence-gene methods. These two sequence typing methods could distinguish some epidemiologically distinctive strains in the local area.
SeqRate: sequence-based protein folding type classification and rates prediction
2010-01-01
Background: Protein folding rate is an important property of a protein. Predicting protein folding rate is useful for understanding the protein folding process and guiding protein design. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. Moreover, most methods do not distinguish the different kinetic nature (two-state folding or multi-state folding) of the proteins. Here we developed a method, SeqRate, to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate using sequence length, amino acid composition, contact order, contact number, and secondary structure information predicted from only protein sequence with support vector machines. Results: We systematically studied the contributions of individual features to folding rate prediction. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (s⁻¹) in the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification and its accuracy of fold rate prediction is improved over previous sequence-based methods. Its performance can be further enhanced with additional information, such as structure-based geometric contacts, as inputs. Conclusions: Both the web server and the software for predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold_rate/index.html. PMID:20438647
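The two evaluation measures named above, the Pearson correlation coefficient and the mean absolute difference between predicted and experimental rates on a log10 scale, can be computed as in the sketch below. The function names and the sample values are assumptions made for illustration; they are not SeqRate code or SeqRate data.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def mean_abs_diff(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean absolute difference between paired values."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty samples of equal length")
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Invented log10 folding rates, for illustration only.
predicted = [1.0, 2.0, 3.0]
experimental = [1.5, 2.5, 3.5]
print(pearson(predicted, experimental), mean_abs_diff(predicted, experimental))
```

The example shows why both measures are reported together: a constant offset leaves the correlation perfect while the mean absolute difference is nonzero.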
Development of an ELA-DRA gene typing method based on pyrosequencing technology.
Díaz, S; Echeverría, M G; It, V; Posik, D M; Rogberg-Muñoz, A; Pena, N L; Peral-García, P; Vega-Pla, J L; Giovambattista, G
2008-11-01
The polymorphism of the equine lymphocyte antigen (ELA) class II DRA gene had been detected by polymerase chain reaction-single-strand conformational polymorphism (PCR-SSCP) and reference strand-mediated conformation analysis. These methodologies allowed the identification of 11 ELA-DRA exon 2 sequences, three of which are widely distributed among domestic horse breeds. Herein, we describe the development of a pyrosequencing-based method applicable to ELA-DRA typing, by screening samples from eight different horse breeds previously typed by PCR-SSCP. This sequence-based method would be useful in high-throughput genotyping of major histocompatibility complex genes in horses and other animal species, making this system interesting as a rapid screening method for animal genotyping of immune-related genes.
Naidu, Hariprasad; Subramanian, B Mohana; Chinchkar, Shankar Ramchandra; Sriraman, Rajan; Rana, Samir Kumar; Srinivasan, V A
2012-05-01
The antigenic types of canine parvovirus (CPV) are defined based on differences in the amino acids of the major capsid protein VP2. Type specificity is conferred by a limited number of amino acid changes and in particular by few nucleotide substitutions. PCR based methods are not particularly suitable for typing circulating variants which differ in a few specific nucleotide substitutions. Assays for determining SNPs can detect efficiently nucleotide substitutions and can thus be adapted to identify CPV types. In the present study, CPV typing was performed by single nucleotide extension using the mini-sequencing technique. A mini-sequencing signature was established for all the four CPV types (CPV2, 2a, 2b and 2c) and feline panleukopenia virus. The CPV typing using the mini-sequencing reaction was performed for 13 CPV field isolates and the two vaccine strains available in our repository. All the isolates had been typed earlier by full-length sequencing of the VP2 gene. The typing results obtained from mini-sequencing matched completely with that of sequencing. Typing could be achieved with less than 100 copies of standard plasmid DNA constructs or ≤10¹ FAID₅₀ of virus by mini-sequencing technique. The technique was also efficient for detecting multiple types in mixed infections. Copyright © 2012 Elsevier B.V. All rights reserved.
Fukunaga, Kenji; Ichitani, Katsuyuki; Taura, Satoru; Sato, Muneharu; Kawase, Makoto
2005-02-01
We determined the sequence of ribosomal DNA (rDNA) intergenic spacer (IGS) of foxtail millet isolated in our previous study, and identified subrepeats in the polymorphic region. We also developed a PCR-based method for identifying rDNA types based on sequence information and assessed 153 accessions of foxtail millet. Results were congruent with our previous works. This study provides new findings regarding the geographical distribution of rDNA variants. This new method facilitates analyses of numerous foxtail millet accessions. It is helpful for typing of foxtail millet germplasms and elucidating the evolution of this millet.
spa typing for epidemiological surveillance of Staphylococcus aureus.
Hallin, Marie; Friedrich, Alexander W; Struelens, Marc J
2009-01-01
The spa typing method is based on sequencing of the polymorphic X region of the protein A gene (spa), present in all strains of Staphylococcus aureus. The X region is constituted of a variable number of 24-bp repeats flanked by well-conserved regions. This single-locus sequence-based typing method combines a number of technical advantages, such as rapidity, reproducibility, and portability. Moreover, due to its repeat structure, the spa locus simultaneously indexes micro- and macrovariations, enabling the use of spa typing in both local and global epidemiological studies. These studies are facilitated by the establishment of standardized spa type nomenclature and Internet shared databases.
Urabe, N; Ishii, Y; Hyodo, Y; Aoki, K; Yoshizawa, S; Saga, T; Murayama, S Y; Sakai, K; Homma, S; Tateda, K
2016-04-01
Between 18 November and 3 December 2011, five renal transplant patients at the Department of Nephrology, Toho University Omori Medical Centre, Tokyo, were diagnosed with Pneumocystis pneumonia (PCP). We used molecular epidemiologic methods to determine whether the patients were infected with the same strain of Pneumocystis jirovecii. DNA extracted from the residual bronchoalveolar lavage fluid from the five outbreak cases and from another 20 cases of PCP between 2007 and 2014 were used for multilocus sequence typing to compare the genetic similarity of the P. jirovecii. DNA base sequencing by the Sanger method showed some regions where two bases overlapped and could not be defined. A next-generation sequencer was used to analyse the types and ratios of these overlapping bases. DNA base sequences of P. jirovecii in the bronchoalveolar lavage fluid from four of the five PCP patients in the 2011 outbreak and from another two renal transplant patients who developed PCP in 2013 were highly homologous. The Sanger method revealed 14 genomic regions where two differing DNA bases overlapped and could not be identified. Analyses of the overlapping bases by a next-generation sequencer revealed that the differing types of base were present in almost identical ratios. There is a strong possibility that the PCP outbreak at the Toho University Omori Medical Centre was caused by the same strain of P. jirovecii. Two different types of base present in some regions may be due to P. jirovecii's being a diploid species. Copyright © 2015 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
Lager, Malin; Mernelius, Sara; Löfgren, Sture; Söderman, Jan
2016-01-01
Healthcare-associated infections caused by Escherichia coli and antibiotic resistance due to extended-spectrum beta-lactamase (ESBL) production constitute a threat against patient safety. To identify, track, and control outbreaks and to detect emerging virulent clones, typing tools of sufficient discriminatory power that generate reproducible and unambiguous data are needed. A probe based real-time PCR method targeting multiple single nucleotide polymorphisms (SNP) was developed. The method was based on the multi locus sequence typing scheme of Institute Pasteur and by adaptation of previously described typing assays. An 8 SNP-panel that reached a Simpson's diversity index of 0.95 was established, based on analysis of sporadic E. coli cases (ESBL n = 27 and non-ESBL n = 53). This multi-SNP assay was used to identify the sequence type 131 (ST131) complex according to the Achtman's multi locus sequence typing scheme. However, it did not fully discriminate within the complex but provided a diagnostic signature that outperformed a previously described detection assay. Pulsed-field gel electrophoresis typing of isolates from a presumed outbreak (n = 22) identified two outbreaks (ST127 and ST131) and three different non-outbreak-related isolates. Multi-SNP typing generated congruent data except for one non-outbreak-related ST131 isolate. We consider multi-SNP real-time PCR typing an accessible primary generic E. coli typing tool for rapid and uniform type identification.
Chen, M J; Chu, C C; Shyr, M H; Lin, C L; Lin, P Y; Yang, K L
2010-02-01
HLA-B*5214, a novel, rare HLA-B*52 variant allele, was found in a Taiwanese volunteer bone marrow donor by a sequence-based typing method. The sequence of B*5214 is identical to that of B*520101 in exon 2 but differs from B*520101 in exon 3 at nucleotide positions 419 A-->T and 435 A-->G. Alteration of these two nucleotides resulted in an amino acid substitution at residue 116 Y-->F (TAC-->TTC) and a silent exchange at residue 121 K-->K (AAA-->AAG).
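The two codon changes reported above can be checked against the standard genetic code. The sketch below hard-codes only the four codons mentioned in the abstract; the function name is an assumption chosen for illustration.

```python
# Subset of the standard genetic code covering the codons in the abstract.
CODON_TO_AA = {
    "TAC": "Y",  # tyrosine
    "TTC": "F",  # phenylalanine
    "AAA": "K",  # lysine
    "AAG": "K",  # lysine
}


def describe_change(ref_codon: str, alt_codon: str) -> str:
    """Translate both codons and label the change as silent or not."""
    ref_aa = CODON_TO_AA[ref_codon.upper()]
    alt_aa = CODON_TO_AA[alt_codon.upper()]
    kind = "silent" if ref_aa == alt_aa else "substitution"
    return f"{ref_aa}->{alt_aa} ({kind})"


print(describe_change("TAC", "TTC"))  # residue 116
print(describe_change("AAA", "AAG"))  # residue 121
```

The first change alters the encoded amino acid, while the second swaps one lysine codon for another and is therefore silent.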
Automated typing of red blood cell and platelet antigens: a whole-genome sequencing study.
Lane, William J; Westhoff, Connie M; Gleadall, Nicholas S; Aguad, Maria; Smeland-Wagman, Robin; Vege, Sunitha; Simmons, Daimon P; Mah, Helen H; Lebo, Matthew S; Walter, Klaudia; Soranzo, Nicole; Di Angelantonio, Emanuele; Danesh, John; Roberts, David J; Watkins, Nick A; Ouwehand, Willem H; Butterworth, Adam S; Kaufman, Richard M; Rehm, Heidi L; Silberstein, Leslie E; Green, Robert C
2018-06-01
There are more than 300 known red blood cell (RBC) antigens and 33 platelet antigens that differ between individuals. Sensitisation to antigens is a serious complication that can occur in prenatal medicine and after blood transfusion, particularly for patients who require multiple transfusions. Although pre-transfusion compatibility testing largely relies on serological methods, reagents are not available for many antigens. Methods based on single-nucleotide polymorphism (SNP) arrays have been used, but typing for ABO and Rh-the most important blood groups-cannot be done with SNP typing alone. We aimed to develop a novel method based on whole-genome sequencing to identify RBC and platelet antigens. This whole-genome sequencing study is a subanalysis of data from patients in the whole-genome sequencing arm of the MedSeq Project randomised controlled trial (NCT01736566) with no measured patient outcomes. We created a database of molecular changes in RBC and platelet antigens and developed an automated antigen-typing algorithm based on whole-genome sequencing (bloodTyper). This algorithm was iteratively improved to address cis-trans haplotype ambiguities and homologous gene alignments. Whole-genome sequencing data from 110 MedSeq participants (30 × depth) were used to initially validate bloodTyper through comparison with conventional serology and SNP methods for typing of 38 RBC antigens in 12 blood-group systems and 22 human platelet antigens. bloodTyper was further validated with whole-genome sequencing data from 200 INTERVAL trial participants (15 × depth) with serological comparisons. We iteratively improved bloodTyper by comparing its typing results with conventional serological and SNP typing in three rounds of testing. The initial whole-genome sequencing typing algorithm was 99·5% concordant across the first 20 MedSeq genomes. Addressing discordances led to development of an improved algorithm that was 99·8% concordant for the remaining 90 MedSeq genomes. 
Additional modifications led to the final algorithm, which was 99·2% concordant across 200 INTERVAL genomes (or 99·9% after adjustment for the lower depth of coverage). By enabling more precise antigen-matching of patients with blood donors, antigen typing based on whole-genome sequencing provides a novel approach to improve transfusion outcomes with the potential to transform the practice of transfusion medicine. Funding: National Human Genome Research Institute, Doris Duke Charitable Foundation, National Health Service Blood and Transplant, National Institute for Health Research, and Wellcome Trust. Copyright © 2018 Elsevier Ltd. All rights reserved.
Daniel, Hubert D-J; David, Joel; Raghuraman, Sukanya; Gnanamony, Manu; Chandy, George M; Sridharan, Gopalan; Abraham, Priya
2017-05-01
Based on genetic heterogeneity, hepatitis C virus (HCV) is classified into seven major genotypes and 64 subtypes. In spite of the sequence heterogeneity, all genotypes share an identical complement of colinear genes within the large open reading frame. The genetic interrelationships between these genes are consistent among genotypes. Due to this property, complete sequencing of the HCV genome is not required. HCV genotypes along with subtypes are critical for planning antiviral therapy. Certain genotypes are also associated with higher progression to liver cirrhosis. In this study, 100 blood samples were collected from individuals who came for routine HCV genotype identification. These samples were used for the comparison of two different genotyping methods (5'NCR PCR-RFLP and HCV core type-specific PCR) with NS5b sequencing. Of the 100 samples genotyped using 5'NCR PCR-RFLP and HCV core type-specific PCR, 90% (κ = 0.913, P < 0.00) and 96% (κ = 0.794, P < 0.00) correlated with NS5b sequencing, respectively. Sixty percent and 75% of discordant samples by 5'NCR PCR-RFLP and HCV core type-specific PCR, respectively, belonged to genotype 6. All the HCV genotype 1 subtypes were classified accurately by both the methods. This study shows that the 5'NCR-based PCR-RFLP and the HCV core type-specific PCR-based assays correctly identified HCV genotypes except genotype 6 from this region. Direct sequencing of the HCV core region was able to identify all the genotype 6 from this region and serves as an alternative to NS5b sequencing. © 2016 Wiley Periodicals, Inc.
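The κ values quoted above express agreement beyond chance between each assay and NS5b sequencing. As a minimal sketch of how such a statistic can be computed, the following implements Cohen's kappa for two lists of categorical calls. The genotype calls are made-up illustration data, not the study's results, and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length lists of categorical calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(calls_a)
    # Observed agreement: fraction of samples given the same call by both methods.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement: product of the marginal frequencies, summed over categories.
    freq_a = Counter(calls_a)
    freq_b = Counter(calls_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both methods used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical genotype calls for eight samples (illustration only).
reference = ["1", "1", "3", "3", "6", "6", "4", "1"]
assay = ["1", "1", "3", "3", "1", "6", "4", "1"]
kappa = cohens_kappa(reference, assay)
print(round(kappa, 3))
```

Here seven of eight calls agree, yet kappa is lower than the raw 87.5% agreement because part of that agreement is expected by chance.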
The multilocus sequence typing network: mlst.net.
Aanensen, David M; Spratt, Brian G
2005-07-01
The unambiguous characterization of strains of a pathogen is crucial for addressing questions relating to its epidemiology, population and evolutionary biology. Multilocus sequence typing (MLST), which defines strains from the sequences at seven house-keeping loci, has become the method of choice for molecular typing of many bacterial and fungal pathogens (and non-pathogens), and MLST schemes and strain databases are available for a growing number of prokaryotic and eukaryotic organisms. Sequence data are ideal for strain characterization as they are unambiguous, meaning strains can readily be compared between laboratories via the Internet. Laboratories undertaking MLST can quickly progress from sequencing the seven gene fragments to characterizing their strains and relating them to those submitted by others and to the population as a whole. We provide the gateway to a number of MLST schemes, each of which contains a set of tools for the initial characterization of strains, and methods for relating query strains to other strains of the species, including clustering based on differences in allelic profiles, phylogenetic trees based on concatenated sequences, and a recently developed method (eBURST) for identifying clonal complexes within a species and displaying the overall structure of the population. This network of MLST websites is available at http://www.mlst.net.
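The idea that a strain is defined by its combination of alleles at seven loci, and that strains can be related by counting differences between allelic profiles, can be sketched as follows. The profile table, the sequence type (ST) numbers and the function names are hypothetical and do not come from any real MLST scheme or from the mlst.net tools.

```python
# Hypothetical allelic-profile table: each key lists the allele numbers at
# seven loci (in a fixed order); each value is the assigned sequence type.
PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): 1,
    (1, 1, 1, 1, 1, 1, 2): 2,
    (3, 2, 1, 4, 1, 5, 2): 3,
}


def sequence_type(profile):
    """Return the ST for a seven-locus profile, or None if the combination is new."""
    if len(profile) != 7:
        raise ValueError("an MLST profile here must list exactly seven alleles")
    return PROFILES.get(tuple(profile))


def locus_differences(profile_a, profile_b):
    """Number of loci at which two allelic profiles carry different alleles."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def single_locus_variants(profile):
    """STs in the table that differ from the query at exactly one locus."""
    return sorted(
        st for known, st in PROFILES.items()
        if locus_differences(profile, known) == 1
    )


query = (1, 1, 1, 1, 1, 1, 1)
print(sequence_type(query), single_locus_variants(query))
```

Grouping profiles that differ at only one locus is the kind of relationship that clustering on allelic profiles builds on; the real clustering tools are considerably more elaborate than this lookup.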
O'Hara, F. Patrick; Suaya, Jose A.; Ray, G. Thomas; Baxter, Roger; Brown, Megan L.; Mera, Robertino M.; Close, Nicole M.; Thomas, Elizabeth; Amrine-Madsen, Heather
2016-01-01
A number of molecular typing methods have been developed for characterization of Staphylococcus aureus isolates. The utility of these systems depends on the nature of the investigation for which they are used. We compared two commonly used methods of molecular typing, multilocus sequence typing (MLST) (and its clustering algorithm, Based Upon Related Sequence Type [BURST]) with the staphylococcal protein A (spa) typing (and its clustering algorithm, Based Upon Repeat Pattern [BURP]), to assess the utility of these methods for macroepidemiology and evolutionary studies of S. aureus in the United States. We typed a total of 366 clinical isolates of S. aureus by these methods and evaluated indices of diversity and concordance values. Our results show that, when combined with the BURP clustering algorithm to delineate clonal lineages, spa typing produces results that are highly comparable with those produced by MLST/BURST. Therefore, spa typing is appropriate for use in macroepidemiology and evolutionary studies and, given its lower implementation cost, this method appears to be more efficient. The findings are robust and are consistent across different settings, patient ages, and specimen sources. Our results also support a model in which the methicillin-resistant S. aureus (MRSA) population in the United States comprises two major lineages (USA300 and USA100), which each consist of closely related variants. PMID:26669861
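The "indices of diversity" mentioned above are not specified in the abstract. One widely used choice for typing methods is Simpson's index of diversity, the probability that two isolates drawn at random without replacement belong to different types. The sketch below assumes that index; the type labels are invented for illustration.

```python
from collections import Counter


def simpsons_index_of_diversity(type_assignments):
    """Probability that two randomly drawn isolates have different types."""
    counts = Counter(type_assignments).values()
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two typed isolates")
    # Ordered pairs of isolates that share a type.
    same_type_pairs = sum(c * (c - 1) for c in counts)
    return 1.0 - same_type_pairs / (n * (n - 1))


# Ten hypothetical isolates assigned to four made-up types.
types = ["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"]
diversity = simpsons_index_of_diversity(types)
print(round(diversity, 3))
```

A value near 1 means the method splits the collection into many small types; a value near 0 means most isolates fall into one type.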
Moser, Aline; Wüthrich, Daniel; Bruggmann, Rémy; Eugster-Meier, Elisabeth; Meile, Leo; Irmler, Stefan
2017-01-01
The advent of massive parallel sequencing technologies has opened up possibilities for the study of the bacterial diversity of ecosystems without the need for enrichment or single strain isolation. By exploiting 78 genome data-sets from Lactobacillus helveticus strains, we found that the slpH locus that encodes a putative surface layer protein displays sufficient genetic heterogeneity to be a suitable target for strain typing. Based on high-throughput slpH gene sequencing and the detection of single-base DNA sequence variations, we established a culture-independent method to assess the biodiversity of the L. helveticus strains present in fermented dairy food. When we applied the method to study the L. helveticus strain composition in 15 natural whey cultures (NWCs) that were collected at different production facilities for Gruyère, a protected designation of origin (PDO) cheese, we detected a total of 10 sequence types (STs). In addition, we monitored the development of a three-strain mix in raclette cheese for 17 weeks. PMID:28775722
Healy, B; Mullane, N; Collin, V; Mailler, S; Iversen, C; Chatellier, S; Storrs, M; Fanning, S
2008-07-01
Enterobacter sakazakii is regarded as a ubiquitous organism that can be isolated from a wide range of foods and environments. Infection in at-risk infants has been epidemiologically linked to the consumption of contaminated powdered infant formula. Preventing the dissemination of this pathogen in a powdered infant formula manufacturing facility is an important step in ensuring consumer confidence in a given brand together with the protection of the health status of a vulnerable population. In this study we report the application of a repetitive sequence-based PCR typing method to subtype a previously well-characterized collection of E. sakazakii isolates of diverse origin. While both methods successfully discriminated among the isolates in the collection, repetitive sequence-based PCR identified 65 types, whereas pulsed-field gel electrophoresis identified 110 types showing ≥95% similarity. The method was quick and easy to perform, and our data demonstrated the utility and value of this approach to monitor in-process contamination, which could potentially contribute to a reduction in the transmission of E. sakazakii.
Herrmann, Luise; Haase, Ilka; Blauhut, Maike; Barz, Nadine; Fischer, Markus
2014-12-17
Two cocoa types, Arriba and CCN-51, are cultivated in Ecuador. Because of its unique aroma, Arriba is considered a fine cocoa type, while CCN-51 is a bulk cocoa because of its weaker aroma. Because Arriba is suspected of being mixed with CCN-51, there is an interest in the analytical differentiation of the two types. Two methods to identify CCN-51 adulterations in Arriba cocoa were developed on the basis of differences in the chloroplast DNA. On the one hand, a different number of repeats of the sequence TAAAG in the inverted repeat region results in a different length of amplicons for the two cocoa types, which can be detected by agarose gel electrophoresis, capillary gel electrophoresis, and denaturing high-performance liquid chromatography. On the other hand, single nucleotide polymorphisms (SNPs) between the CCN-51 and Arriba sequences represent restriction sites, which can be used for restriction fragment length polymorphism analysis. A semi-quantitative analysis based on these SNPs is feasible. A method for an exact quantitation based on these results is not realizable. These sequence variations were confirmed for a comprehensive cultivar collection of Arriba and CCN-51, for both bean and leaf samples.
Vincent, Caroline; Usongo, Valentine; Berry, Chrystal; Tremblay, Denise M; Moineau, Sylvain; Yousfi, Khadidja; Doualla-Bell, Florence; Fournier, Eric; Nadon, Céline; Goodridge, Lawrence; Bekal, Sadjia
2018-08-01
Salmonella enterica serovar Heidelberg (S. Heidelberg) is one of the top serovars causing human salmonellosis. This serovar ranks second and third among serovars that cause human infections in Québec and Canada, respectively, and has been associated with severe infections. Traditional typing methods such as PFGE do not display adequate discrimination required to resolve outbreak investigations due to the low level of genetic diversity of isolates belonging to this serovar. This study evaluates the ability of four whole genome sequence (WGS)-based typing methods to differentiate among 145 S. Heidelberg strains involved in four distinct outbreak events and sporadic cases of salmonellosis that occurred in Québec between 2007 and 2016. Isolates from all outbreaks were indistinguishable by PFGE. The core genome single nucleotide variant (SNV), core genome multilocus sequence typing (MLST) and whole genome MLST approaches were highly discriminatory and separated outbreak strains into four distinct phylogenetic clusters that were concordant with the epidemiological data. The clustered regularly interspaced short palindromic repeats (CRISPR) typing method was less discriminatory. However, CRISPR typing may be used as a secondary method to differentiate isolates of S. Heidelberg that are genetically similar but epidemiologically unrelated to outbreak events. WGS-based typing methods provide a highly discriminatory alternative to PFGE for the laboratory investigation of foodborne outbreaks. Copyright © 2018 Elsevier Ltd. All rights reserved.
Günthard, H F; Wong, J K; Ignacio, C C; Havlir, D V; Richman, D D
1998-07-01
The performance of the high-density oligonucleotide array methodology (GeneChip) in detecting drug resistance mutations in HIV-1 pol was compared with that of automated dideoxynucleotide sequencing (ABI) of clinical samples, viral stocks, and plasmid-derived NL4-3 clones. Sequences from 29 clinical samples (plasma RNA, n = 17; lymph node RNA, n = 5; lymph node DNA, n = 7) from 12 patients, from 6 viral stock RNA samples, and from 13 NL4-3 clones were generated by both methods. Editing was done independently by a different investigator for each method before comparing the sequences. In addition, NL4-3 wild type (WT) and mutants were mixed in varying concentrations and sequenced by both methods. Overall, a concordance of 99.1% was found for a total of 30,865 bases compared. The comparison of clinical samples (plasma RNA and lymph node RNA and DNA) showed a slightly lower match of base calls, 98.8% for 19,831 nucleotides compared (protease region, 99.5%, n = 8272; RT region, 98.3%, n = 11,316), than for viral stocks and NL4-3 clones (protease region, 99.8%; RT region, 99.5%). Artificial mixing experiments showed a bias toward calling wild-type bases by GeneChip. Discordant base calls are most likely due to differential detection of mixtures. The concordance between GeneChip and ABI was high and appeared dependent on the nature of the templates (directly amplified versus cloned) and the complexity of mixes.
Guo, Yahong; Tsuruga, Ayako; Yamaguchi, Shigeharu; Oba, Koji; Iwai, Kasumi; Sekita, Setsuko; Mizukami, Hajime
2006-06-01
The chloroplast chlB gene encoding subunit B of light-independent protochlorophyllide reductase was amplified from herbarium and crude drug specimens of Ephedra sinica, E. intermedia, E. equisetina, and E. przewalskii. Sequence comparison of the chlB gene indicated that all the E. sinica specimens have the same sequence type (Type S) distinctive from other species, while there are two sequence types (Type E1 and Type E2) in E. equisetina. E. intermedia and E. przewalskii revealed an identical sequence type (Type IP). E. sinica was also identified by digesting the chlB fragment with Bcl I. A novel method for DNA authentication of Ephedra Herb based on the sequences of the chloroplast chlB gene and internal transcribed spacer of nuclear rRNA genes was developed and successfully applied for identification of the crude drugs obtained in the Chinese market.
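Identification by restriction digestion works because a sequence difference creates or removes an enzyme recognition site, which changes the fragment lengths seen on a gel. The sketch below simulates this for a linear fragment, assuming the standard Bcl I recognition sequence TGATCA with the cut after the first T. The two input sequences are invented and are not chlB sequences from the study.

```python
def digest(sequence, site="TGATCA", cut_offset=1):
    """Fragment lengths after cutting a linear sequence at every site occurrence."""
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)  # cut falls cut_offset bases into the site
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Invented 42-base fragments that differ by one base inside the site.
with_site = "ACGT" * 5 + "TGATCA" + "GGCC" * 4
without_site = with_site.replace("TGATCA", "TGACCA")

print(digest(with_site))     # two fragments
print(digest(without_site))  # one uncut fragment
```

In practice the fragment pattern, one band versus two, is what distinguishes the sequence types.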
Ochoa, David; García-Gutiérrez, Ponciano; Juan, David; Valencia, Alfonso; Pazos, Florencio
2013-01-27
A widespread family of methods for studying and predicting protein interactions using sequence information is based on co-evolution, quantified as similarity of phylogenetic trees. Part of the co-evolution observed between interacting proteins could be due to co-adaptation caused by inter-protein contacts. In this case, the co-evolution is expected to be more evident when evaluated on the surface of the proteins or the internal layers close to it. In this work we study the effect of incorporating information on predicted solvent accessibility to three methods for predicting protein interactions based on similarity of phylogenetic trees. We evaluate the performance of these methods in predicting different types of protein associations when trees based on positions with different characteristics of predicted accessibility are used as input. We found that predicted accessibility improves the results of two recent versions of the mirrortree methodology in predicting direct binary physical interactions, while it neither improves these methods, nor the original mirrortree method, in predicting other types of interactions. That improvement comes at no cost in terms of applicability since accessibility can be predicted for any sequence. We also found that predictions of protein-protein interactions are improved when multiple sequence alignments with a richer representation of sequences (including paralogs) are incorporated in the accessibility prediction.
V, Pavana Jyothi; S, Akila; Selvan, Malini K; Naidu, Hariprasad; Raghunathan, Shwethaa; Kota, Sathish; Sundaram, R C Raja; Rana, Samir Kumar; Raj, G Dhinakar; Srinivasan, V A; Mohana Subramanian, B
2016-12-01
Canine parvovirus (CPV) is a non-enveloped single stranded DNA virus with an icosahedral capsid. Mini-sequencing based CPV typing was developed earlier to detect and differentiate all the CPV types and FPV in a single reaction. This technique was further evaluated in the present study by performing the mini-sequencing directly from fecal samples, which avoided tedious virus isolation steps in cell culture. Fecal swab samples were collected from 84 dogs with enteritis symptoms suggestive of parvoviral infection from different locations across India. Seventy-six of these samples were positive by PCR; the subsequent mini-sequencing reaction typed 74 of them as type 2a virus, and 2 samples as type 2b. Additionally, 25 of the positive samples were typed by cycle sequencing of PCR products. Direct CPV typing from fecal samples using mini-sequencing showed 100% correlation with CPV typing by cycle sequencing. Moreover, CPV typing was achieved by mini-sequencing even with faintly positive PCR amplicons, which was not possible by cycle sequencing. Therefore, the mini-sequencing technique is recommended for regular epidemiological follow up of CPV types, since it is a rapid, highly sensitive, and high-capacity method for CPV typing. Copyright © 2016. Published by Elsevier B.V.
Just, Rebecca S; Irwin, Jodi A
2018-05-01
Some of the expected advantages of next generation sequencing (NGS) for short tandem repeat (STR) typing include enhanced mixture detection and genotype resolution via sequence variation among non-homologous alleles of the same length. However, at the same time that NGS methods for forensic DNA typing have advanced in recent years, many caseworking laboratories have implemented or are transitioning to probabilistic genotyping to assist the interpretation of complex autosomal STR typing results. Current probabilistic software programs are designed for length-based data, and were not intended to accommodate sequence strings as the product input. Yet to leverage the benefits of NGS for enhanced genotyping and mixture deconvolution, the sequence variation among same-length products must be utilized in some form. Here, we propose use of the longest uninterrupted stretch (LUS) in allele designations as a simple method to represent sequence variation within the STR repeat regions and facilitate, in the near term, probabilistic interpretation of NGS-based typing results. An examination of published population data indicated that a reference LUS region is straightforward to define for most autosomal STR loci, and that using repeat unit plus LUS length as the allele designator can represent greater than 80% of the alleles detected by sequencing. A proof of concept study performed using a freely available probabilistic software demonstrated that the LUS length can be used in allele designations when a program does not require alleles to be integers, and that utilizing sequence information improves interpretation of both single-source and mixed contributor STR typing results as compared to using repeat unit information alone. 
The LUS concept for allele designation maintains the repeat-based allele nomenclature that will permit backward compatibility to extant STR databases, and the LUS lengths themselves will be concordant regardless of the NGS assay or analysis tools employed. Further, these biologically based, easy-to-derive designations uphold clear relationships between parent alleles and their stutter products, enabling analysis in fully continuous probabilistic programs that model stutter while avoiding the algorithmic complexities that come with string based searches. Though using repeat unit plus LUS length as the allele designator does not capture variation that occurs outside of the core repeat regions, this straightforward approach would permit the large majority of known STR sequence variation to be used for mixture deconvolution and, in turn, result in more informative mixture statistics in the near term. Ultimately, the method could bridge the gap from current length-based probabilistic systems to facilitate broader adoption of NGS by forensic DNA testing laboratories. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
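The LUS idea can be illustrated with a short sketch: two alleles with the same number of repeat units can differ in the length of their longest run of identical, back-to-back repeat units. The repeat motif, the example sequences, the function names and the "units_LUS" label format below are assumptions for illustration and are not taken from the paper; partial repeats are not handled.

```python
def longest_uninterrupted_stretch(repeat_region, motif):
    """Largest number of back-to-back copies of motif found anywhere in the region."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    step = len(motif)
    for start in range(len(repeat_region)):
        count = 0
        pos = start
        # Count consecutive copies of the motif beginning at this position.
        while repeat_region.startswith(motif, pos):
            count += 1
            pos += step
        best = max(best, count)
    return best


def allele_designation(repeat_region, motif):
    """Hypothetical label: total repeat units, underscore, LUS length."""
    units = len(repeat_region) // len(motif)
    return f"{units}_{longest_uninterrupted_stretch(repeat_region, motif)}"


# Two invented alleles, both 12 tetranucleotide units long.
allele_1 = "TCTA" * 3 + "TCTG" + "TCTA" * 8
allele_2 = "TCTA" * 5 + "TCTG" + "TCTA" * 6

print(allele_designation(allele_1, "TCTA"))
print(allele_designation(allele_2, "TCTA"))
```

A length-based system would call both alleles "12"; adding the LUS length separates them while keeping the repeat-count part compatible with existing databases.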
SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform.
Lin, Jie; Wei, Jing; Adjeroh, Donald; Jiang, Bing-Hua; Jiang, Yue
2018-05-02
Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. Then, the series of complex numbers formed are transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Using two different types of applications, namely, clustering and classification, we compared SSAW against state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.
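The pipeline described above (k-mers, then complex numbers, then a stationary wavelet transform, then a numeric feature vector) can be sketched in a few lines. This is not the authors' implementation: the base-to-complex mapping, the use of a single-level undecimated Haar transform with circular extension, and the two-number energy feature are all assumptions chosen to keep the example small.

```python
import math

# Assumed mapping of bases to the complex plane (not the mapping used in SSAW).
BASE_TO_COMPLEX = {"A": 1 + 0j, "C": 0 + 1j, "G": -1 + 0j, "T": 0 - 1j}


def kmer_signal(sequence, k=3):
    """Map each overlapping k-mer to one complex number (mean of its base values)."""
    seq = sequence.upper()
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    return [
        sum(BASE_TO_COMPLEX[base] for base in seq[i:i + k]) / k
        for i in range(len(seq) - k + 1)
    ]


def stationary_haar(signal):
    """One level of an undecimated Haar transform with circular extension.

    Unlike the ordinary discrete wavelet transform, nothing is downsampled,
    so both outputs have the same length as the input.
    """
    n = len(signal)
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[(i + 1) % n]) / root2 for i in range(n)]
    detail = [(signal[i] - signal[(i + 1) % n]) / root2 for i in range(n)]
    return approx, detail


def feature_vector(sequence, k=3):
    """Fixed-length feature: mean energy of approximation and detail coefficients."""
    approx, detail = stationary_haar(kmer_signal(sequence, k))
    n = len(approx)
    return (
        sum(abs(c) ** 2 for c in approx) / n,
        sum(abs(c) ** 2 for c in detail) / n,
    )


print(feature_vector("ACGTTGCAAC"))
print(feature_vector("AAAAAA"))
```

Because the feature vector has the same length for any input, sequences of different lengths can be compared with an ordinary distance measure and passed to a clustering or classification routine.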
Faria, Nuno A.; Carrico, João A.; Oliveira, Duarte C.; Ramirez, Mário; de Lencastre, Hermínia
2008-01-01
Sequence-based methods for typing Staphylococcus aureus, such as multilocus sequence typing (MLST) and spa typing, have increased interlaboratory reproducibility, portability, and speed in obtaining results, but pulsed-field gel electrophoresis (PFGE) remains the method of choice in many laboratories due to the extensive experience with this methodology and the large body of data accumulated using the technique. Comparisons between typing methods have been overwhelmingly based on a qualitative assessment of the overall agreement of results and the relative discriminatory indexes. In this study, we quantitatively assess the congruence of the major typing methods for S. aureus, using a diverse collection of 198 S. aureus strains previously characterized by PFGE, spa typing, MLST, and, in the case of methicillin-resistant S. aureus (MRSA), SCCmec typing. The results of most typing methods agree in that MRSA and methicillin-susceptible S. aureus (MSSA) differ in terms of diversity of genetic backgrounds, with MSSA being more diverse. Our results show that spa typing has a very good predictive power over the clonal relationships defined by eBURST, while PFGE is less accurate for that purpose but nevertheless provides better typeability and discriminatory power. The combination of PFGE and spa typing provided even better results. Based on these observations, we suggest the combined use of spa typing and PFGE typing for epidemiological surveillance studies, since this combination provides the ability to infer long-term relationships while maintaining the discriminatory power and typeability needed in short-term studies. PMID:17989188
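The abstract does not name the congruence measures used. One directional measure often applied to typing methods is the Wallace coefficient: the probability that two isolates sharing a type under method A also share a type under method B, which matches the notion of one method's "predictive power" over another. The sketch below assumes that measure; the isolate labels are invented.

```python
from itertools import combinations


def wallace(partition_a, partition_b):
    """Probability that a pair grouped together by A is also grouped together by B."""
    if len(partition_a) != len(partition_b):
        raise ValueError("both methods must type the same isolates")
    same_a = 0
    same_both = 0
    for i, j in combinations(range(len(partition_a)), 2):
        if partition_a[i] == partition_a[j]:
            same_a += 1
            if partition_b[i] == partition_b[j]:
                same_both += 1
    if same_a == 0:
        raise ValueError("method A places no two isolates in the same type")
    return same_both / same_a


# Six hypothetical isolates typed by a fine-grained and a coarse-grained method.
fine = ["s1", "s1", "s1", "s2", "s2", "s3"]
coarse = ["c1", "c1", "c1", "c1", "c1", "c2"]

print(wallace(fine, coarse))  # fine types predict coarse groups
print(wallace(coarse, fine))  # coarse groups predict fine types only partly
```

The asymmetry is the point: a more discriminatory method can predict the groups of a less discriminatory one, but not the reverse.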
Nakamura, Sayaka; Sato, Hiroaki; Tanaka, Reiko; Kusuya, Yoko; Takahashi, Hiroki; Yaguchi, Takashi
2017-04-26
Accurate identification of Aspergillus species is a very important subject. Mass spectral fingerprinting using matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) is generally employed for the rapid identification of fungal isolates. However, the results are based on simple mass spectral pattern-matching, with no peak assignment and no taxonomic input. We propose here a ribosomal subunit protein (RSP) typing technique using MALDI-TOF MS for the identification and discrimination of Aspergillus species. The results are concluded to be phylogenetic in that they reflect the molecular evolution of housekeeping RSPs. The amino acid sequences of RSPs of genome-sequenced strains of Aspergillus species were first verified and compared to compile a reliable biomarker list for the identification of Aspergillus species. In this process, we revealed that many amino acid sequences of RSPs (about 10-60%, depending on strain) registered in the public protein databases needed to be corrected or newly added. The verified RSPs were allocated to RSP types based on their mass. Peak assignments of RSPs of each sample strain as observed by MALDI-TOF MS were then performed to set RSP type profiles, which were then further processed by means of cluster analysis. The resulting dendrogram based on RSP types showed a relatively good concordance with the tree based on β-tubulin gene sequences. RSP typing was able to further discriminate the strains belonging to Aspergillus section Fumigati. The RSP typing method could be applied to identify Aspergillus species, even for species within section Fumigati. The discrimination power of RSP typing appears to be comparable to conventional β-tubulin gene analysis. This method would therefore be suitable for species identification and discrimination at the strain to species level. 
Because RSP typing can characterize the strains within section Fumigati, this method has potential as a powerful and reliable tool in the field of clinical microbiology.
Oh, Yejin; Song, Ik-Chan; Kim, Jimyung; Kwon, Gye Cheol; Koo, Sun Hoe; Kim, Seon Young
2018-05-01
We developed a pyrosequencing-based method for the quantification of CALR mutations and compared the results using Sanger sequencing, fragment length analysis (FLA), digital-droplet PCR (ddPCR), and next-generation sequencing (NGS). Method validation studies were performed using cloned plasmid controls. Samples from 24 patients with myeloproliferative neoplasms were evaluated. Among the 24 patients, 15 had CALR mutations (7 type 1, 2 type 2, and 6 other mutations). The type 1 or type 2 mutation-positive results from pyrosequencing exhibited 100% concordance with the Sanger sequencing results. One novel CALR mutation was not detected by pyrosequencing. The CALR mutation allele burdens measured by pyrosequencing were slightly lower than those measured by FLA but slightly higher than the results obtained using ddPCR. Pyrosequencing exhibited high correlations with both methods. The mutation allele burdens estimated by NGS were significantly lower than those measured by pyrosequencing. An increased CALR mutation allele burden was associated with overt primary myelofibrosis. Patients with >70% mutation allele burdens in myeloid cells had a significantly longer time from diagnosis (P = 0.007), more bone marrow fibrosis (P = 0.010), and lower hemoglobin (P = 0.007). Pyrosequencing was a useful rapid sequencing method to determine the burden of CALR mutations. Copyright © 2018 Elsevier B.V. All rights reserved.
Method for identifying and quantifying nucleic acid sequence aberrations
Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.
1998-01-01
A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.
Method for identifying and quantifying nucleic acid sequence aberrations
Lucas, J.N.; Straume, T.; Bogen, K.T.
1998-07-21
A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.
Mikkelsen, Martin; Frank-Hansen, Rune; Hansen, Anders J; Morling, Niels
2014-09-01
Results of sequencing of whole mitochondrial genome, HV1 and HV2 DNA with the second generation system (SGS) Roche 454 GS Junior were compared with results of Sanger sequencing and SNP typing with SNaPshot single base extension detected with MALDI-TOF and capillary electrophoresis. We investigated the performance of the software analysis of the data, reproducibility, ability to sequence homopolymeric regions, detection of mixtures and heteroplasmy as well as the implications of the depth of coverage. We found full reproducibility between samples sequenced twice with SGS. We found close to full concordance between the mtDNA sequences of 26 samples obtained with (1) the 454 SGS method using a depth of coverage above 100 and (2) Sanger sequencing and SNP typing. The discrepancies were primarily observed in homopolymeric regions. The 454 SGS method was able to sequence 95% of the reads correctly in homopolymers up to 4 bases, and up to 6 bases could be sequenced with similar success if the results were carefully, visually inspected. The 454 technology was able to detect mixtures or heteroplasmy of approximately 10%. We detected previously unreported heteroplasmy in the GM9947A component of the NIST human mitochondrial DNA SRM-2392 standard reference material. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
USDA-ARS's Scientific Manuscript database
A Multilocus Sequence Typing (MLST) method based on allelic variation of 7 chromosomal loci was developed for characterizing genotypes within the genus Bradyrhizobium. With the method 29 distinct multilocus genotypes (GTs) were identified among 191 culture collection soybean strains. The occupancy ...
How-Kit, Alexandre; Tost, Jörg
2015-01-01
A number of molecular diagnostic assays have been developed in recent years for mutation detection. Although these methods have become increasingly sensitive, most of them are incompatible with a sequencing-based readout and require prior knowledge of the mutation present in the sample. Consequently, coamplification at low denaturation (COLD)-PCR-based methods have been developed and combine a high analytical sensitivity due to mutation enrichment in the sample with the identification of known or unknown mutations by downstream sequencing experiments. Among these methods, the recently developed Enhanced-ice-COLD-PCR appeared as the most powerful method as it outperformed the other COLD-PCR-based methods in terms of the mutation enrichment and due to the simplicity of the experimental setup of the assay. Indeed, E-ice-COLD-PCR is very versatile as it can be used on all types of PCR platforms and is applicable to different types of samples including fresh frozen, FFPE, and plasma samples. The technique relies on the incorporation of an LNA-containing blocker probe in the PCR reaction followed by selective heteroduplex denaturation enabling amplification of the mutant allele while amplification of the wild-type allele is prevented. Combined with Pyrosequencing®, which is a very quantitative high-resolution sequencing technology, E-ice-COLD-PCR can detect and identify mutations with a limit of detection down to 0.01%.
Fantin, Yuri S.; Neverov, Alexey D.; Favorov, Alexander V.; Alvarez-Figueroa, Maria V.; Braslavskaya, Svetlana I.; Gordukova, Maria A.; Karandashova, Inga V.; Kuleshov, Konstantin V.; Myznikova, Anna I.; Polishchuk, Maya S.; Reshetov, Denis A.; Voiciehovskaya, Yana A.; Mironov, Andrei A.; Chulanov, Vladimir P.
2013-01-01
Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The traditional solution is to sequence selected clones of PCR products, a complicated, time-consuming, and expensive procedure. Here, we propose the base-calling with vocabulary (BCV) method that computationally deciphers Sanger chromatograms obtained from mixed DNA samples. The inputs to the BCV algorithm are a chromatogram and a dictionary of sequences that are similar to those we expect to obtain. We apply the base-calling function on a test dataset of chromatograms without ambiguous positions, as well as one with 3–14% sequence degeneracy. Furthermore, we use BCV to assemble a consensus sequence for an HIV genome fragment in a sample containing a mixture of viral DNA variants and to determine the positions of the indels. Finally, we detect drug-resistant Mycobacterium tuberculosis strains carrying frameshift mutations mixed with wild-type bacteria in the pncA gene, and roughly characterize bacterial communities in clinical samples by direct 16S rRNA sequencing. PMID:23382983
2013-01-01
Background: Mitochondrial DNA (mtDNA) typing can be a useful aid for identifying people from compromised samples when nuclear DNA is too damaged, degraded or below detection thresholds for routine short tandem repeat (STR)-based analysis. Standard mtDNA typing, focused on PCR amplicon sequencing of the control region (HVS I and HVS II), is limited by the resolving power of this short sequence, which misses up to 70% of the variation present in the mtDNA genome. Methods: We used in-solution hybridisation-based DNA capture (using DNA capture probes prepared from modern human mtDNA) to recover mtDNA from post-mortem human remains in which the majority of DNA is both highly fragmented (<100 base pairs in length) and chemically damaged. The method ‘immortalises’ the finite quantities of DNA in valuable extracts as DNA libraries, which is followed by the targeted enrichment of endogenous mtDNA sequences and characterisation by next-generation sequencing (NGS). Results: We sequenced whole mitochondrial genomes for human identification from samples where standard nuclear STR typing produced only partial profiles or demonstrably failed and/or where standard mtDNA hypervariable region sequences lacked resolving power. Multiple rounds of enrichment can substantially improve coverage and sequencing depth of mtDNA genomes from highly degraded samples. The application of this method has led to the reliable mitochondrial sequencing of human skeletal remains from unidentified World War Two (WWII) casualties approximately 70 years old and from archaeological remains (up to 2,500 years old). Conclusions: This approach has potential applications in forensic science, historical human identification cases, archived medical samples, kinship analysis and population studies. In particular, the methodology can be applied to any case, involving human or non-human species, where whole mitochondrial genome sequences are required to provide the highest level of maternal lineage discrimination.
Multiple rounds of in-solution hybridisation-based DNA capture can retrieve whole mitochondrial genome sequences from even the most challenging samples. PMID:24289217
Cholley, Pascal; Stojanov, Milos; Hocquet, Didier; Thouverez, Michelle; Bertrand, Xavier; Blanc, Dominique S
2015-08-01
Reliable molecular typing methods are necessary to investigate the epidemiology of bacterial pathogens. Reference methods such as multilocus sequence typing (MLST) and pulsed-field gel electrophoresis (PFGE) are costly and time consuming. Here, we compared our newly developed double-locus sequence typing (DLST) method for Pseudomonas aeruginosa to MLST and PFGE on a collection of 281 isolates. DLST was as discriminatory as MLST and was able to recognize "high-risk" epidemic clones. Both methods were highly congruent. Not surprisingly, a higher discriminatory power was observed with PFGE. In conclusion, being a simple method (single-strand sequencing of only 2 loci), DLST is valuable as a first-line typing tool for epidemiological investigations of P. aeruginosa. Coupled to a more discriminant method like PFGE or whole genome sequencing, it might represent an efficient typing strategy to investigate or prevent outbreaks. Copyright © 2015 Elsevier Inc. All rights reserved.
Discovery of a Bovine Enterovirus in Alpaca
McClenahan, Shasta D.; Scherba, Gail; Borst, Luke; Fredrickson, Richard L.; Krause, Philip R.; Uhlenhaut, Christine
2013-01-01
A cytopathic virus was isolated using Madin-Darby bovine kidney (MDBK) cells from lung tissue of alpaca that died of a severe respiratory infection. To identify the virus, the infected cell culture supernatant was enriched for virus particles and a generic, PCR-based method was used to amplify potential viral sequences. Genomic sequence data of the alpaca isolate was obtained and compared with sequences of known viruses. The new alpaca virus sequence was most similar to recently designated Enterovirus species F, previously bovine enterovirus (BEVs), viruses that are globally prevalent in cattle, although they appear not to cause significant disease. Because bovine enteroviruses have not been previously reported in U.S. alpaca, we suspect that this type of infection is fairly rare, and in this case appeared not to spread beyond the original outbreak. The capsid sequence of the detected virus had greatest homology to Enterovirus F type 1 (indicating that the virus should be considered a member of serotype 1), but the virus had greater homology in 2A protease sequence to type 3, suggesting that it may have been a recombinant. Identifying pathogens that infect a new host species for the first time can be challenging. As the disease in a new host species may be quite different from that in the original or natural host, the pathogen may not be suspected based on the clinical presentation, delaying diagnosis. Although this virus replicated in MDBK cells, existing standard culture and molecular methods could not identify it. In this case, a highly sensitive generic PCR-based pathogen-detection method was used to identify this pathogen. PMID:23950875
Gene sequence analyses and other DNA-based methods for yeast species recognition
USDA-ARS's Scientific Manuscript database
DNA sequence analyses, as well as other DNA-based methodologies, have transformed the way in which yeasts are identified. The focus of this chapter will be on the resolution of species using various types of DNA comparisons. In other chapters in this book, Rozpedowska, Piškur and Wolfe discuss mul...
Rafei, Rayane; Dabboussi, Fouad; Hamze, Monzer; Eveillard, Matthieu; Lemarié, Carole; Gaultier, Marie-Pierre; Mallat, Hassan; Moghnieh, Rima; Husni-Samaha, Rola; Joly-Guillou, Marie-Laure; Kempf, Marie
2014-01-01
This study analyzed 42 Acinetobacter baumannii strains collected between 2009 and 2012 from different hospitals in Beyrouth and North Lebanon to better understand the epidemiology and carbapenem resistance mechanisms in our collection and to compare the robustness of pulsed field gel electrophoresis (PFGE), multilocus sequence typing (MLST), repetitive sequence-based PCR (rep-PCR) and blaOXA-51 sequence-based typing (SBT). Among 31 carbapenem-resistant strains, we detected three carbapenem resistance genes: 28 strains carried the blaOXA-23 gene, 1 the blaOXA-24 gene and 2 the blaOXA-58 gene. This is the first detection of blaOXA-23 and blaOXA-24 in Lebanon. PFGE identified 11 types and was the most discriminating technique, followed by rep-PCR (9 types), blaOXA-51 SBT (8 types) and MLST (7 types). The PFGE type A'/ST2 was the dominant genotype in our collection, present in Beyrouth and North Lebanon. The clustering agreement between all techniques was measured by the adjusted Wallace coefficient, and an overall agreement was demonstrated. High values of the adjusted Wallace coefficient were found for the following combinations: PFGE to predict MLST types = 100%, PFGE to predict blaOXA-51 SBT = 100%, blaOXA-51 SBT to predict MLST = 100%, MLST to predict blaOXA-51 SBT = 84.7%, rep-PCR to predict MLST = 81.5%, PFGE to predict rep-PCR = 69% and rep-PCR to predict blaOXA-51 SBT = 67.2%. PFGE and MLST are gold standard methods for outbreak investigation and population structure studies, respectively. However, these two techniques are technically demanding, time-consuming and costly. We recommend the use of blaOXA-51 SBT as a first typing method to screen isolates and assign them to their corresponding clonal lineages. Repetitive sequence-based PCR is a rapid tool to assess outbreaks, but results must always be interpreted carefully.
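The abstract reports adjusted Wallace coefficients but does not spell out how they are computed. The following is a minimal sketch, assuming the commonly used definition in which the Wallace coefficient of method A towards method B is the fraction of isolate pairs grouped together by A that are also grouped together by B, and the adjustment subtracts the agreement expected by chance (one minus Simpson's index of diversity of B). The function names and the toy labels are illustrative and do not come from the study.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence


def wallace(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of isolate pairs sharing a type in A that also share a type in B."""
    if len(a) != len(b):
        raise ValueError("both typings must cover the same isolates")
    same_a = 0      # pairs grouped together by method A
    same_both = 0   # pairs grouped together by both methods
    for i, j in combinations(range(len(a)), 2):
        if a[i] == a[j]:
            same_a += 1
            if b[i] == b[j]:
                same_both += 1
    if same_a == 0:
        raise ValueError("method A groups no pair of isolates together")
    return same_both / same_a


def adjusted_wallace(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Wallace coefficient A -> B corrected for chance agreement (assumed definition)."""
    w = wallace(a, b)
    n = len(b)
    counts = Counter(b)
    # Probability that two isolates drawn at random share a B type,
    # i.e. 1 - Simpson's index of diversity of B.
    expected = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    if expected == 1.0:
        raise ValueError("method B assigns every isolate to a single type")
    return (w - expected) / (1.0 - expected)
```

Under this definition a value of 1 (reported as 100% above) means that knowing the type from the first method fully predicts the type from the second, while a value near 0 means no better than chance. The coefficient is directional, which is why the abstract lists "PFGE to predict MLST" separately from the reverse comparison.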
Efficient engineering of a bacteriophage genome using the type I-E CRISPR-Cas system.
Kiro, Ruth; Shitrit, Dror; Qimron, Udi
2014-01-01
The clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) system has recently been used to engineer genomes of various organisms, but surprisingly, not those of bacteriophages (phages). Here we present a method to genetically engineer the Escherichia coli phage T7 using the type I-E CRISPR-Cas system. T7 phage genome is edited by homologous recombination with a DNA sequence flanked by sequences homologous to the desired location. Non-edited genomes are targeted by the CRISPR-Cas system, thus enabling isolation of the desired recombinant phages. This method broadens CRISPR Cas-based editing to phages and uses a CRISPR-Cas type other than type II. The method may be adjusted to genetically engineer any bacteriophage genome.
2017-01-01
The Gram-positive, anaerobic bacterium Propionibacterium acnes forms part of the normal microbiota on human skin and mucosal surfaces. While normally associated with skin health, P. acnes is also an opportunistic pathogen linked with a range of human infections and clinical conditions. Over the last decade, our knowledge of the intraspecies phylogenetics and taxonomy of this bacterium has increased tremendously due to the introduction of DNA typing schemes based on single and multiple gene loci, as well as whole genomes. Furthermore, this work has led to the identification of specific lineages associated with skin health and human disease. In this review we will look back at the introduction of DNA sequence typing of P. acnes based on recA and tly loci, and then describe how these methods provided a basic understanding of the population genetic structure of the bacterium, and even helped characterize the grapevine-associated lineage of P. acnes, known as P. acnes type Zappae, which appears to have undergone a host switch from humans to plants. Particular limitations of recA and tly sequence typing will also be presented, as well as a detailed discussion of more recent, higher resolution, DNA-based methods to type P. acnes and investigate its evolutionary history in greater detail. PMID:29267255
Old, M O; Logan, L H; Maldonado, Y A
1997-11-01
Sabin type 3 polio vaccine virus is the most common cause of poliovaccine associated paralytic poliomyelitis. Vaccine associated paralytic poliomyelitis cases have been associated with Sabin type 3 revertants containing a single U to C substitution at bp 472 of Sabin type 3. A rapid method of identification of Sabin type 3 bp 472 mutants is described. An enterovirus group-specific probe for use in a chemiluminescent dot blot hybridization assay was developed to identify enterovirus positive viral lysates. A reverse transcription-polymerase chain reaction (RT-PCR) assay producing a 319 bp PCR product containing the Sabin type 3 bp 472 mutation site was then employed to identify Sabin type 3 isolates. Chemiluminescent nucleic acid cycle sequencing of the purified 319 bp PCR product was then employed to identify nucleic acid sequences at bp 472. The enterovirus group probe hybridization procedure and isolation of the Sabin type 3 PCR product were highly sensitive and specific; nucleic acid cycle sequencing corresponded to the known sequence of stock Sabin type 3 isolates. These methods will be used to identify the Sabin type 3 reversion rate from sequential stool samples of infants obtained after the first and second doses of oral poliovirus vaccine.
Tong, Steven Y C; Xie, Shirley; Richardson, Leisha J; Ballard, Susan A; Dakh, Farshid; Grabsch, Elizabeth A; Grayson, M Lindsay; Howden, Benjamin P; Johnson, Paul D R; Giffard, Philip M
2011-01-01
We have developed a single nucleotide polymorphism (SNP) nucleated high-resolution melting (HRM) technique to genotype Enterococcus faecium. Eight SNPs were derived from the E. faecium multilocus sequence typing (MLST) database and amplified fragments containing these SNPs were interrogated by HRM. We tested the HRM genotyping scheme on 85 E. faecium bloodstream isolates and compared the results with MLST, pulsed-field gel electrophoresis (PFGE) and an allele specific real-time PCR (AS kinetic PCR) SNP typing method. In silico analysis based on predicted HRM curves according to the G+C content of each fragment for all 567 sequence types (STs) in the MLST database together with empiric data from the 85 isolates demonstrated that HRM analysis resolves E. faecium into 231 "melting types" (MelTs) and provides a Simpson's Index of Diversity (D) of 0.991 with respect to MLST. This is a significant improvement on the AS kinetic PCR SNP typing scheme that resolves 61 SNP types with D of 0.95. The MelTs were concordant with the known ST of the isolates. For the 85 isolates, there were 13 PFGE patterns, 17 STs, 14 MelTs and eight SNP types. There was excellent concordance between PFGE, MLST and MelTs with Adjusted Rand Indices of PFGE to MelT 0.936 and ST to MelT 0.973. In conclusion, this HRM based method appears rapid and reproducible. The results are concordant with MLST and the MLST based population structure.
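The discriminatory power figures quoted above (D of 0.991 and 0.95) are Simpson's Index of Diversity values. The abstract does not give the formula; the sketch below assumes the form generally used for typing methods, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), where n_j is the number of isolates of type j and N the total number of isolates. The function name and the example labels are illustrative only.

```python
from collections import Counter
from typing import Hashable, Iterable


def simpsons_diversity(types: Iterable[Hashable]) -> float:
    """Probability that two isolates drawn without replacement have different types."""
    counts = Counter(types)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two isolates")
    # Number of ordered pairs of isolates that share a type.
    same = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same / (n * (n - 1))


# Toy example: four isolates split evenly between two types.
print(round(simpsons_diversity(["MelT1", "MelT1", "MelT2", "MelT2"]), 4))
```

A value close to 1 means the method almost always separates two unrelated isolates, which is why resolving 231 melting types gives a higher D than resolving 61 SNP types.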
Yoshida, Catherine E; Kruczkiewicz, Peter; Laing, Chad R; Lingohr, Erika J; Gannon, Victor P J; Nash, John H E; Taboada, Eduardo N
2016-01-01
For nearly 100 years serotyping has been the gold standard for the identification of Salmonella serovars. Despite the increasing adoption of DNA-based subtyping approaches, serotype information remains a cornerstone in food safety and public health activities aimed at reducing the burden of salmonellosis. At the same time, recent advances in whole-genome sequencing (WGS) promise to revolutionize our ability to perform advanced pathogen characterization in support of improved source attribution and outbreak analysis. We present the Salmonella In Silico Typing Resource (SISTR), a bioinformatics platform for rapidly performing simultaneous in silico analyses for several leading subtyping methods on draft Salmonella genome assemblies. In addition to performing serovar prediction by genoserotyping, this resource integrates sequence-based typing analyses for: Multi-Locus Sequence Typing (MLST), ribosomal MLST (rMLST), and core genome MLST (cgMLST). We show how phylogenetic context from cgMLST analysis can supplement the genoserotyping analysis and increase the accuracy of in silico serovar prediction to over 94.6% on a dataset comprised of 4,188 finished genomes and WGS draft assemblies. In addition to allowing analysis of user-uploaded whole-genome assemblies, the SISTR platform incorporates a database comprising over 4,000 publicly available genomes, allowing users to place their isolates in a broader phylogenetic and epidemiological context. The resource incorporates several metadata driven visualizations to examine the phylogenetic, geospatial and temporal distribution of genome-sequenced isolates. As sequencing of Salmonella isolates at public health laboratories around the world becomes increasingly common, rapid in silico analysis of minimally processed draft genome assemblies provides a powerful approach for molecular epidemiology in support of public health investigations. 
Moreover, this type of integrated analysis using multiple sequence-based methods of sub-typing allows for continuity with historical serotyping data as we transition towards the increasing adoption of genomic analyses in epidemiology. The SISTR platform is freely available on the web at https://lfz.corefacility.ca/sistr-app/.
Young, Brian; King, Jonathan L; Budowle, Bruce; Armogida, Luigi
2017-01-01
Amplicon (targeted) sequencing by massively parallel sequencing (PCR-MPS) is a potential method for use in forensic DNA analyses. In this application, PCR-MPS may supplement or replace other instrumental analysis methods such as capillary electrophoresis and Sanger sequencing for STR and mitochondrial DNA typing, respectively. PCR-MPS also may enable the expansion of forensic DNA analysis methods to include new marker systems such as single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) that currently are assayable using various instrumental analysis methods including microarray and quantitative PCR. Acceptance of PCR-MPS as a forensic method will depend in part upon developing protocols and criteria that define the limitations of a method, including a defensible analytical threshold or method detection limit. This paper describes an approach to establish objective analytical thresholds suitable for multiplexed PCR-MPS methods. A definition is proposed for PCR-MPS method background noise, and an analytical threshold based on background noise is described. PMID:28542338
Multistage morphological segmentation of bright-field and fluorescent microscopy images
NASA Astrophysics Data System (ADS)
Korzyńska, A.; Iwanowski, M.
2012-06-01
This paper describes the multistage morphological segmentation method (MSMA) for microscopic cell images. The proposed method enables the study of cell behaviour using a sequence of two types of microscopic images: bright-field images and/or fluorescent images. It relies on two types of information: the cell texture from the bright-field images and the intensity of light emitted by fluorescent markers. The method is dedicated to the segmentation of image sequences and is based on mathematical morphology methods supported by other image processing techniques. It detects cells in an image independently of their degree of flattening and of the presence of structures that produce the texture. It uses information from the fluorescent light emission image as supporting information. The MSMA method has been applied to images acquired during experiments on neural stem cells as well as to artificial images. To validate the method, two types of errors were considered: the error of cell area detection and the error of cell position, using artificial images as the "gold standard".
Development of a Multiplex Single Base Extension Assay for Mitochondrial DNA Haplogroup Typing
Nelson, Tahnee M.; Just, Rebecca S.; Loreille, Odile; Schanfield, Moses S.; Podini, Daniele
2007-01-01
Aim: To provide a screening tool to reduce time and sample consumption when attempting mtDNA haplogroup typing. Methods: A single base primer extension assay was developed to enable typing, in a single reaction, of twelve mtDNA haplogroup specific polymorphisms. For validation purposes a total of 147 samples were tested including 73 samples successfully haplogroup typed using mtDNA control region (CR) sequence data, 21 samples inconclusively haplogroup typed by CR data, 20 samples previously haplogroup typed using restriction fragment length polymorphism (RFLP) analysis, and 31 samples of known ancestral origin without previous haplogroup typing. Additionally, two highly degraded human bones embalmed and buried in the early 1950s were analyzed using the single nucleotide polymorphisms (SNP) multiplex. Results: When the SNP multiplex was used to type the 96 previously CR sequenced specimens, an increase in haplogroup or macrohaplogroup assignment relative to conventional CR sequence analysis was observed. The single base extension assay was also successfully used to assign a haplogroup to decades-old, embalmed skeletal remains dating to World War II. Conclusion: The SNP multiplex was successfully used to obtain haplogroup status of highly degraded human bones, and demonstrated the ability to eliminate possible contributors. The SNP multiplex provides a low-cost, high throughput method for typing of mtDNA haplogroups A, B, C, D, E, F, G, H, L1/L2, L3, M, and N that could be useful for screening purposes for human identification efforts and anthropological studies. PMID:17696300
Zhang, Wei; Zhang, Xiaolong; Qiang, Yan; Tian, Qi; Tang, Xiaoxian
2017-01-01
The fast and accurate segmentation of lung nodule image sequences is the basis of subsequent processing and diagnostic analyses. However, previous research investigating nodule segmentation algorithms cannot entirely segment cavitary nodules, and the segmentation of juxta-vascular nodules is inaccurate and inefficient. To solve these problems, we propose a new method for the segmentation of lung nodule image sequences based on superpixels and density-based spatial clustering of applications with noise (DBSCAN). First, our method uses three-dimensional computed tomography image features of the average intensity projection combined with multi-scale dot enhancement for preprocessing. Hexagonal clustering and morphological optimized sequential linear iterative clustering (HMSLIC) for sequence image oversegmentation is then proposed to obtain superpixel blocks. The adaptive weight coefficient is then constructed to calculate the distance required between superpixels to achieve precise lung nodules positioning and to obtain the subsequent clustering starting block. Moreover, by fitting the distance and detecting the change in slope, an accurate clustering threshold is obtained. Thereafter, a fast DBSCAN superpixel sequence clustering algorithm, which is optimized by the strategy of only clustering the lung nodules and adaptive threshold, is then used to obtain lung nodule mask sequences. Finally, the lung nodule image sequences are obtained. The experimental results show that our method rapidly, completely and accurately segments various types of lung nodule image sequences. PMID:28880916
Wang, Tao; Li, Hua; Wang, Hua; Su, Jing
2015-04-16
The present study established a typing method with NotI-based pulsed-field gel electrophoresis (PFGE) and stress response gene schemed multilocus sequence typing (MLST) for 55 Oenococcus oeni strains isolated from six individual regions in China and two model strains PSU-1 (CP000411) and ATCC BAA-1163 (AAUV00000000). Seven stress response genes, cfa, clpL, clpP, ctsR, mleA, mleP and omrA, were selected for MLST testing, and positive selective pressure was detected for these genes. Furthermore, both methods separated the strains into two clusters. The PFGE clusters are correlated with the region, whereas the sequence types (STs) formed by the MLST confirm the two clusters identified by PFGE. In addition, the population structure was a mixture of evolutionary pathways, and the strains exhibited both clonal and panmictic characteristics. Copyright © 2015 Elsevier B.V. All rights reserved.
Buckley, Mike
2016-03-24
Collagen is one of the most ubiquitous proteins in the animal kingdom and the dominant protein in extracellular tissues such as bone, skin and other connective tissues in which it acts primarily as a supporting scaffold. It has been widely investigated scientifically, not only as a biomedical material for regenerative medicine, but also for its role as a food source for both humans and livestock. Due to the long-term stability of collagen, as well as its abundance in bone, it has been proposed as a source of biomarkers for species identification not only for heat- and pressure-rendered animal feed but also in ancient archaeological and palaeontological specimens, typically carried out by peptide mass fingerprinting (PMF) as well as in-depth liquid chromatography (LC)-based tandem mass spectrometric methods. Through the analysis of the three most common domesticates species, cow, sheep, and pig, this research investigates the advantages of each approach over the other, investigating sites of sequence variation with known functional properties of the collagen molecule. Results indicate that the previously identified species biomarkers through PMF analysis are not among the most variable type 1 collagen peptides present in these tissues, the latter of which can be detected by LC-based methods. However, it is clear that the highly repetitive sequence motif of collagen throughout the molecule, combined with the variability of the sites and relative abundance levels of hydroxylation, can result in high scoring false positive peptide matches using these LC-based methods. Additionally, the greater alpha 2(I) chain sequence variation, in comparison to the alpha 1(I) chain, did not appear to be specific to any particular functional properties, implying that intra-chain functional constraints on sequence variation are not as great as inter-chain constraints. 
However, although some of the most variable peptides were only observed in LC-based methods, until the range of publicly available collagen sequences improves, the simplicity of the PMF approach and suitable range of peptide sequence variation observed makes it the ideal method for initial taxonomic identification prior to further analysis by LC-based methods only when required.
2006-01-01
isolated using a routine salting-out method (DNA E-Z Prepkit, Orchid Diagnostics Europe, St Katelijne Waver, Belgium). Sequence based typing In...electrophoresis using ethidiumbromide to show the single 2 KB band before sequencing. Next, sequencing reactions were performed separately for exons 2, 3...Multiplex reverse transcription-polymerase chain reaction for simultaneous screening of 29 translocations and chromosomal aberrations in acute
Adamiak, Paul; Vanderkooi, Otto G; Kellner, James D; Schryvers, Anthony B; Bettinger, Julie A; Alcantara, Joenel
2014-06-03
Multi-locus sequence typing (MLST) is a portable, broadly applicable method for classifying bacterial isolates at an intra-species level. This methodology provides clinical and scientific investigators with a standardized means of monitoring evolution within bacterial populations. MLST uses the DNA sequences from a set of genes such that each unique combination of sequences defines an isolate's sequence type. In order to reliably determine the sequence of a typing gene, matching sequence reads for both strands of the gene must be obtained. This study assesses the ability of both the standard, and an alternative set of, Streptococcus pneumoniae MLST primers to completely sequence, in both directions, the required typing alleles. The results demonstrated that for five (aroE, recP, spi, xpt, ddl) of the seven S. pneumoniae typing alleles, the standard primers were unable to obtain the complete forward and reverse sequences. This is due to the standard primers annealing too closely to the target regions, and current sequencing technology failing to sequence the bases that are too close to the primer. The alternative primer set described here, which includes a combination of primers proposed by the CDC and several designed as part of this study, addresses this limitation by annealing to highly conserved segments further from the target region. This primer set was subsequently employed to sequence type 105 S. pneumoniae isolates collected by the Canadian Immunization Monitoring Program ACTive (IMPACT) over a period of 18 years. The inability of several of the standard S. pneumoniae MLST primers to fully sequence the required region was consistently observed and is the result of a shift in sequencing technology occurring after the original primers were designed. 
The results presented here introduce clear documentation describing this phenomenon into the literature, and provide additional guidance, through the introduction of a widely validated set of alternative primers, to research groups seeking to undertake S. pneumoniae MLST based studies.
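The statement that each unique combination of allele sequences defines a sequence type can be illustrated with a small lookup. This is a sketch only: the locus list is the seven-gene S. pneumoniae scheme (the abstract names five of the loci; gdh and gki are the remaining two), while the profile table, allele numbers and sequence type numbers are invented for illustration and are not real database entries.

```python
from typing import Dict, Optional, Tuple

# The seven loci of the S. pneumoniae MLST scheme, in a fixed order.
LOCI = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")

# Hypothetical profile table: tuple of allele numbers -> sequence type (ST).
# All numbers below are made up for this example.
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1, 1, 1, 1, 1): 9001,
    (1, 1, 2, 1, 1, 1, 1): 9002,
}


def sequence_type(alleles: Dict[str, int]) -> Optional[int]:
    """Return the ST for a complete allele profile, or None if the profile is unknown."""
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        # An ST cannot be assigned unless every locus was fully sequenced.
        raise ValueError(f"missing alleles for: {', '.join(missing)}")
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)
```

The explicit check for missing loci mirrors the point made in the abstract: if a primer pair fails to give complete forward and reverse reads for even one locus, the allele cannot be called reliably and no sequence type can be assigned.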
NASA Astrophysics Data System (ADS)
Jian, Le; Cao, Wang; Jintao, Yang; Yinge, Wang
2018-04-01
This paper describes the design of a dynamic voltage restorer (DVR) that can simultaneously protect several sensitive loads from voltage sags in a region of an MV distribution network. A novel reference voltage calculation method based on zero-sequence voltage optimisation is proposed for this DVR to optimise cost-effectiveness in compensation of voltage sags with different characteristics in an ungrounded neutral system. Based on a detailed analysis of the characteristics of voltage sags caused by different types of faults and the effect of the wiring mode of the transformer on these characteristics, the optimisation target of the reference voltage calculation is presented with several constraints. The reference voltages under all types of voltage sags are calculated by optimising the zero-sequence component, which can reduce the degree of swell in the phase-to-ground voltage after compensation to the maximum extent and can improve the symmetry degree of the output voltages of the DVR, thereby effectively increasing the compensation ability. The validity and effectiveness of the proposed method are verified by simulation and experimental results.
Smith, J K; Parry, J D; Day, J G; Smith, R J
1998-10-01
The use of primers based on the Hip1 sequence as a typing technique for cyanobacteria has been investigated. The discovery of short repetitive sequence structures in bacterial DNA during the last decade has led to the development of PCR-based methods for typing, i.e., distinguishing and identifying, bacterial species and strains. An octameric palindromic sequence known as Hip1 has been shown to be present in the chromosomal DNA of many species of cyanobacteria as a highly repetitious interspersed sequence. PCR primers were constructed that extended the Hip1 sequence at the 3' end by two bases. Five of the 16 possible extended primers were tested. Each of the five primers produced a different set of products when used to prime PCR from cyanobacterial genomic DNA. Each primer produced a distinct set of products for each of the 15 cyanobacterial species tested. The ability of Hip1-based PCR to resolve taxonomic differences was assessed by analysis of independent isolates of Anabaena flos-aquae and Nostoc ellipsosporum obtained from the CCAP (Culture Collection of Algae and Protozoa, IFE, Cumbria, UK). A PCR-based RFLP analysis of products amplified from the 23S-16S rDNA intergenic region was used to characterize the isolates and to compare with the Hip1 typing data. The RFLP and Hip1 typing yielded similar results and both techniques were able to distinguish different strains. On the basis of these results it is suggested that the Hip1 PCR technique may assist in distinguishing cyanobacterial species and strains.
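The count of 16 possible extended primers follows from appending two bases (4 x 4 choices) to the 3' end of the octamer. A minimal sketch is given below; it assumes the Hip1 octamer is 5'-GCGATCGC-3', which the abstract does not spell out, and it does not indicate which five primers the authors actually tested.

```python
from itertools import product
from typing import List

HIP1 = "GCGATCGC"  # assumed Hip1 octamer, written 5'->3'


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def extended_primers(core: str = HIP1, extra: int = 2) -> List[str]:
    """All primers made by appending `extra` bases to the 3' end of `core`."""
    return [core + "".join(tail) for tail in product("ACGT", repeat=extra)]
```

Calling `extended_primers()` yields the 16 candidates, and `reverse_complement(HIP1) == HIP1` confirms that the assumed octamer is palindromic.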
ERIC Educational Resources Information Center
Soffree-Cady, Flore
To provide a writing pedagogy grounded in theory, a teaching method was developed which sequenced certain types of assignments. The classification of types and the organizational structure of the sequences were based on a teaching model that draws upon theories from various disciplines. Although the teaching activities are not new in themselves,…
Goldstone, Robert J.; McLuckie, Joyce; Smith, David G. E.
2015-01-01
Typing of Mycobacterium avium subspecies paratuberculosis strains presents a challenge, since they are genetically monomorphic and traditional molecular techniques have limited discriminatory power. The recent advances and availability of whole-genome sequencing have extended possibilities for the characterization of Mycobacterium avium subspecies paratuberculosis, and whole-genome sequencing can provide a phylogenetic context to facilitate global epidemiology studies. In this study, we developed a single nucleotide polymorphism (SNP) assay based on PCR and restriction enzyme digestion or sequencing of the amplified product. The SNP analysis was performed using genome sequence data from 133 Mycobacterium avium subspecies paratuberculosis isolates with different genotypes from 8 different host species and 17 distinct geographic regions around the world. A total of 28,402 SNPs were identified among all of the isolates. The minimum number of SNPs required to distinguish between all of the 133 genomes was 93, and the minimum required to distinguish between only the type C isolates was 41. To reduce the number of SNPs and PCRs required, we adopted an approach based on sequential detection of SNPs and a decision tree. By the analysis of 14 SNPs, Mycobacterium avium subspecies paratuberculosis isolates can be assigned to 14 phylogenetic groups with a higher discriminatory power than the mycobacterial interspersed repetitive unit–variable number tandem repeat assay and other typing methods. Continuous updating of genome sequences is needed in order to better characterize new phylogenetic groups and SNP profiles. The novel SNP assay is a discriminative, simple, reproducible method and requires only basic laboratory equipment for the large-scale global typing of Mycobacterium avium subspecies paratuberculosis isolates. PMID:26677250
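The notion of a small SNP set that still separates every genome can be illustrated with a greedy selection. This is a generic heuristic chosen for illustration, not the authors' procedure: it is not guaranteed to find the true minimum, and the input layout (one allele string per isolate) is an assumption.

```python
from itertools import combinations
from typing import Dict, List


def greedy_snp_panel(genotypes: Dict[str, str]) -> List[int]:
    """Pick SNP positions until every pair of isolates differs at a chosen one.

    `genotypes` maps an isolate name to a string of alleles, one character
    per SNP position; all strings must have the same length.
    """
    names = sorted(genotypes)
    n_sites = len(genotypes[names[0]])
    unresolved = set(combinations(names, 2))
    chosen: List[int] = []
    while unresolved:
        def gain(site: int) -> int:
            # Number of still-unresolved pairs that this site would separate.
            return sum(genotypes[a][site] != genotypes[b][site]
                       for a, b in unresolved)

        best = max(range(n_sites), key=gain)
        if gain(best) == 0:
            break  # the remaining pairs are identical at every site
        chosen.append(best)
        unresolved = {(a, b) for a, b in unresolved
                      if genotypes[a][best] == genotypes[b][best]}
    return chosen
```

Each chosen position plays the role of one test in a sequential scheme: isolates are split by the first SNP, and later SNPs are only needed to separate the groups that remain mixed.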
NASA Astrophysics Data System (ADS)
Esteban, Pere; Beck, Christoph; Philipp, Andreas
2010-05-01
Using data associated with accidents or damages caused by snow avalanches over the eastern Pyrenees (Andorra and Catalonia) several atmospheric circulation type catalogues have been obtained. For this purpose, different circulation type classification methods based on Principal Component Analysis (T-mode and S-mode using the extreme scores) and on optimization procedures (Improved K-means and SANDRA) were applied. Considering the characteristics of the phenomena studied, not only single-day circulation patterns were taken into account but also sequences of circulation types of varying length. Thus different classifications with different numbers of types and for different sequence lengths were obtained using the different classification methods. Simple between-type variability, within-type variability, and outlier detection procedures have been applied for selecting the best result concerning snow avalanche type classifications. Furthermore, days without occurrence of the hazards were also related to the avalanche centroids using pattern correlations, facilitating the calculation of the anomalies between hazardous and non-hazardous days, and also of the frequencies of occurrence of hazardous events for each circulation type. Finally, the catalogues statistically considered the best are evaluated using the expert knowledge of avalanche forecasters. A consistent explanation of snow avalanche occurrence by means of circulation sequences is obtained, but always considering results from classifications with different sequence lengths. This work has been developed in the framework of the COST Action 733 (Harmonisation and Applications of Weather Type Classifications for European regions).
Wang, Li; Yokoyama, Koji; Miyaji, Makoto; Nishimura, Kazuko
2001-01-01
We analyzed a 402-bp sequence of the mitochondrial cytochrome b gene of 34 strains of Exophiala jeanselmei and 16 strains representing 12 related species. The strains of E. jeanselmei were classified into 20 DNA types and 17 amino acid types. The differences between these strains were found in 1 to 60 nucleotides and 1 to 17 amino acids. On the basis of the identities and similarities of nucleotide and amino acid sequences, some strains were reidentified: i.e., two strains of E. jeanselmei var. heteromorpha and one strain of E. castellanii as E. dermatitidis (including the type strain), three strains of E. jeanselmei as E. jeanselmei var. lecanii-corni (including the type strain), three strains of E. jeanselmei as E. bergeri (including the type strain), seven strains of E. jeanselmei as E. pisciphila (including the type strain), seven strains of E. jeanselmei as E. jeanselmei var. jeanselmei (including the type strain), one strain of E. jeanselmei as Fonsecaea pedrosoi (including the type strain), and one strain of E. jeanselmei as E. spinifera (including the type strain). Some E. jeanselmei strains showed distinct nucleotide and amino acid sequences. The amino-acid-based UPGMA (unweighted pair group method with the arithmetic mean) tree exhibited nearly the same topology as those of the DNA-based trees obtained by neighbor joining, maximum parsimony, and maximum likelihood methods. PMID:11724862
Meisal, Roger; Rounge, Trine Ballestad; Christiansen, Irene Kraus; Eieland, Alexander Kirkeby; Worren, Merete Molton; Molden, Tor Faksvaag; Kommedal, Øyvind; Hovig, Eivind; Leegaard, Truls Michael
2017-01-01
Sensitive and specific genotyping of human papillomaviruses (HPVs) is important for population-based surveillance of carcinogenic HPV types and for monitoring vaccine effectiveness. Here we compare HPV genotyping by Next Generation Sequencing (NGS) to an established DNA hybridization method. In DNA isolated from urine, the overall analytical sensitivity of NGS was found to be 22% higher than that of hybridization. NGS was also found to be the most specific method and expanded the detection repertoire beyond the 37 types of the DNA hybridization assay. Furthermore, NGS provided an increased resolution by identifying genetic variants of individual HPV types. The same Modified General Primers (MGP)-amplicon was used in both methods. The NGS method is described in detail to facilitate implementation in the clinical microbiology laboratory and includes suggestions for new standards for detection and calling of types and variants with improved resolution. PMID:28045981
Meisal, Roger; Rounge, Trine Ballestad; Christiansen, Irene Kraus; Eieland, Alexander Kirkeby; Worren, Merete Molton; Molden, Tor Faksvaag; Kommedal, Øyvind; Hovig, Eivind; Leegaard, Truls Michael; Ambur, Ole Herman
2017-01-01
Sensitive and specific genotyping of human papillomaviruses (HPVs) is important for population-based surveillance of carcinogenic HPV types and for monitoring vaccine effectiveness. Here we compare HPV genotyping by Next Generation Sequencing (NGS) to an established DNA hybridization method. In DNA isolated from urine, the overall analytical sensitivity of NGS was found to be 22% higher than that of hybridization. NGS was also found to be the most specific method and expanded the detection repertoire beyond the 37 types of the DNA hybridization assay. Furthermore, NGS provided an increased resolution by identifying genetic variants of individual HPV types. The same Modified General Primers (MGP)-amplicon was used in both methods. The NGS method is described in detail to facilitate implementation in the clinical microbiology laboratory and includes suggestions for new standards for detection and calling of types and variants with improved resolution.
Petzold, Markus; Prior, Karola; Moran-Gilad, Jacob; Harmsen, Dag; Lück, Christian
2017-01-01
Introduction: Whole genome sequencing (WGS) is increasingly used in Legionnaires’ disease (LD) outbreak investigations, owing to its higher resolution than sequence-based typing, the gold standard typing method for Legionella pneumophila, in the analysis of endemic strains. Recently, a gene-by-gene typing approach based on 1,521 core genes called core genome multilocus sequence typing (cgMLST) was described that enables a robust and standardised typing of L. pneumophila. Methods: We applied this cgMLST scheme to isolates obtained during the largest outbreak of LD reported so far in Germany. In this outbreak, the epidemic clone ST345 had been isolated from patients and four different environmental sources. In total 42 clinical and environmental isolates were retrospectively typed. Results: Epidemiologically unrelated ST345 isolates were clearly distinguishable from the epidemic clone. Remarkably, epidemic isolates split up into two distinct clusters, ST345-A and ST345-B, each respectively containing a mix of clinical and epidemiologically-related environmental samples. Discussion/conclusion: The outbreak was therefore likely caused by both variants of the single sequence type, which pre-existed in the environmental reservoirs. The two clusters differed by 40 alleles located in two neighbouring genomic regions of ca 42 and 26 kb. Additional analysis supported horizontal gene transfer of the two regions as responsible for the difference between the variants. Both regions comprise virulence genes and have previously been reported to be involved in recombination events. This corroborates the notion that genomic outbreak investigations should always take epidemiological information into consideration when making inferences. Overall, cgMLST proved helpful in disentangling the complex genomic epidemiology of the outbreak. PMID:29162202
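A statement such as "the two clusters differed by 40 alleles" rests on comparing cgMLST profiles locus by locus. The sketch below shows one way such an allele distance could be counted; the profile layout and the choice to ignore loci that are missing in either isolate are assumptions made for illustration, not details taken from the study.

```python
from typing import Dict, Optional

# Locus name -> allele number; None marks a locus that could not be called.
Profile = Dict[str, Optional[int]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci with different alleles, skipping loci missing in either profile."""
    shared = (locus for locus in a if locus in b)
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )
```

For example, two hypothetical profiles `{"locus1": 3, "locus2": 7}` and `{"locus1": 3, "locus2": 8}` are one allele apart; applied to all 1,521 core loci, the same count would give the pairwise distances on which clusters are defined.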
Petzold, Markus; Prior, Karola; Moran-Gilad, Jacob; Harmsen, Dag; Lück, Christian
2017-11-01
Introduction: Whole genome sequencing (WGS) is increasingly used in Legionnaires' disease (LD) outbreak investigations, owing to its higher resolution than sequence-based typing, the gold standard typing method for Legionella pneumophila, in the analysis of endemic strains. Recently, a gene-by-gene typing approach based on 1,521 core genes called core genome multilocus sequence typing (cgMLST) was described that enables a robust and standardised typing of L. pneumophila. Methods: We applied this cgMLST scheme to isolates obtained during the largest outbreak of LD reported so far in Germany. In this outbreak, the epidemic clone ST345 had been isolated from patients and four different environmental sources. In total 42 clinical and environmental isolates were retrospectively typed. Results: Epidemiologically unrelated ST345 isolates were clearly distinguishable from the epidemic clone. Remarkably, epidemic isolates split up into two distinct clusters, ST345-A and ST345-B, each respectively containing a mix of clinical and epidemiologically-related environmental samples. Discussion/conclusion: The outbreak was therefore likely caused by both variants of the single sequence type, which pre-existed in the environmental reservoirs. The two clusters differed by 40 alleles located in two neighbouring genomic regions of ca 42 and 26 kb. Additional analysis supported horizontal gene transfer of the two regions as responsible for the difference between the variants. Both regions comprise virulence genes and have previously been reported to be involved in recombination events. This corroborates the notion that genomic outbreak investigations should always take epidemiological information into consideration when making inferences. Overall, cgMLST proved helpful in disentangling the complex genomic epidemiology of the outbreak.
Identifying metabolic enzymes with multiple types of association evidence
Kharchenko, Peter; Chen, Lifeng; Freund, Yoav; Vitkup, Dennis; Church, George M
2006-01-01
Background: Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions which cannot be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes. Results: We present a novel method for identifying genes encoding a specific metabolic function based on the local structure of the metabolic network and multiple types of functional association evidence, including clustering of genes on the chromosome, similarity of phylogenetic profiles, gene expression, protein fusion events and others. Using E. coli and S. cerevisiae metabolic networks, we illustrate the predictive ability of each individual type of association evidence and show that significantly better predictions can be obtained based on the combination of all data. In this way our method is able to predict 60% of enzyme-encoding genes of E. coli metabolism within the top 10 (out of 3551) candidates for their enzymatic function, and as the top candidate in 43% of the cases. Conclusion: We illustrate that a combination of genome context and other functional association evidence is effective in predicting genes encoding metabolic enzymes. Our approach does not rely on direct sequence homology to known enzyme-encoding genes, and can be used in conjunction with traditional homology-based metabolic reconstruction methods. The method can also be used to target orphan metabolic activities. PMID:16571130
Liu, Fenyun; Kariyawasam, Subhashinie; Jayarao, Bhushan M; Barrangou, Rodolphe; Gerner-Smidt, Peter; Ribot, Efrain M; Knabel, Stephen J; Dudley, Edward G
2011-07-01
Salmonella enterica subsp. enterica serovar Enteritidis is a major cause of food-borne salmonellosis in the United States. Two major food vehicles for S. Enteritidis are contaminated eggs and chicken meat. Improved subtyping methods are needed to accurately track specific strains of S. Enteritidis related to human salmonellosis throughout the chicken and egg food system. A sequence typing scheme based on virulence genes (fimH and sseL) and clustered regularly interspaced short palindromic repeats (CRISPRs), termed CRISPR-including multi-virulence-locus sequence typing (designated CRISPR-MVLST), was used to characterize 35 human clinical isolates, 46 chicken isolates, 24 egg isolates, and 63 hen house environment isolates of S. Enteritidis. A total of 27 sequence types (STs) were identified among the 167 isolates. CRISPR-MVLST identified three persistent and predominant STs circulating among U.S. human clinical isolates and chicken, egg, and hen house environmental isolates in Pennsylvania, and an ST that was found only in eggs and humans. It also identified a potential environment-specific sequence type. Moreover, cluster analysis based on fimH and sseL identified a number of clusters, of which several were found in more than one outbreak, as well as 11 singletons. Further research is needed to determine if CRISPR-MVLST might help identify the ecological origins of S. Enteritidis strains that contaminate chickens and eggs.
Whiley, David M; Jacob, Kevin; Nakos, Jennifer; Bletchly, Cheryl; Nimmo, Graeme R; Nissen, Michael D; Sloots, Theo P
2012-06-01
Numerous real-time PCR assays have been described for detection of the influenza A H275Y alteration. However, the performance of these methods can be undermined by sequence variation in the regions flanking the codon of interest. This is a problem encountered more broadly in microbial diagnostics. In this study, we developed a modification of hybridization probe-based melting curve analysis, whereby primers are used to mask proximal mutations in the sequence targets of hybridization probes, so as to limit the potential for sequence variation to interfere with typing. The approach was applied to the H275Y alteration of the influenza A (H1N1) 2009 strain, as well as a Neisseria gonorrhoeae mutation associated with antimicrobial resistance. Assay performances were assessed using influenza A and N. gonorrhoeae strains characterized by DNA sequencing. The modified hybridization probe-based approach proved successful in limiting the effects of proximal mutations, with the results of melting curve analyses being 100% consistent with the results of DNA sequencing for all influenza A and N. gonorrhoeae strains tested. Notably, these included influenza A and N. gonorrhoeae strains exhibiting additional mutations in hybridization probe targets. Of particular interest was that the H275Y assay correctly typed influenza A strains harbouring a T822C nucleotide substitution, previously shown to interfere with H275Y typing methods. Overall our modified hybridization probe-based approach provides a simple means of circumventing problems caused by sequence variation, and offers improved detection of the influenza A H275Y alteration and potentially other resistance mechanisms.
Xu, Weijia; Ozer, Stuart; Gutell, Robin R
2009-01-01
With an increasingly large amount of sequences properly aligned, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large scale alignment and less effective with the sequences from diversified phylogenetic classifications. We propose a new approach that utilizes coevolutional rates among pairs of nucleotide positions using phylogenetic and evolutionary relationships of the organisms of aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with a Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times bigger and 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure.
Xu, Weijia; Ozer, Stuart; Gutell, Robin R.
2010-01-01
With an increasingly large amount of sequences properly aligned, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large scale alignment and less effective with the sequences from diversified phylogenetic classifications. We propose a new approach that utilizes coevolutional rates among pairs of nucleotide positions using phylogenetic and evolutionary relationships of the organisms of aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with a Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times bigger and 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure. PMID:20502534
Kono, H; Saven, J G
2001-02-23
Combinatorial experiments provide new ways to probe the determinants of protein folding and to identify novel folding amino acid sequences. These types of experiments, however, are complicated both by enormous conformational complexity and by large numbers of possible sequences. Therefore, a quantitative computational theory would be helpful in designing and interpreting these types of experiments. Here, we present and apply a statistically based, computational approach for identifying the properties of sequences compatible with a given main-chain structure. Protein side-chain conformations are included in an atom-based fashion. Calculations are performed for a variety of similar backbone structures to identify sequence properties that are robust with respect to minor changes in main-chain structure. Rather than specific sequences, the method yields the likelihood of each of the amino acids at preselected positions in a given protein structure. The theory may be used to quantify the characteristics of sequence space for a chosen structure without explicitly tabulating sequences. To account for hydrophobic effects, we introduce an environmental energy that is consistent with other simple hydrophobicity scales and show that it is effective for side-chain modeling. We apply the method to calculate the identity probabilities of selected positions of the immunoglobulin light chain-binding domain of protein L, for which many variant folding sequences are available. The calculations compare favorably with the experimentally observed identity probabilities.
An efficient study design to test parent-of-origin effects in family trios.
Yu, Xiaobo; Chen, Gao; Feng, Rui
2017-11-01
Increasing evidence has shown that genes may cause prenatal, neonatal, and pediatric diseases depending on their parental origins. Statistical models that incorporate parent-of-origin effects (POEs) can improve the power of detecting disease-associated genes and help explain the missing heritability of diseases. In many studies, children have been sequenced for genome-wide association testing, but it may be unaffordable to also sequence their parents and evaluate POEs. Motivated by this reality, we proposed a budget-friendly study design of sequencing children and only genotyping their parents through single nucleotide polymorphism array. We developed a powerful likelihood-based method, which takes into account both sequence reads and linkage disequilibrium to infer the parental origins of children's alleles and estimate their POEs on the outcome. We evaluated the performance of our proposed method and compared it with an existing method using only genotypes, through extensive simulations. Our method showed higher power than the genotype-based method. When either the mean read depth or the pair-end length was reasonably large, our method achieved ideal power. When single parents' genotypes were unavailable or parental genotypes at the testing locus were not typed, both methods lost power compared with when complete data were available, but the power loss from our method was smaller than that of the genotype-based method. We also extended our method to accommodate mixed genotype, low-, and high-coverage sequence data from children and their parents. In the presence of sequence errors, low-coverage parental sequence data may lead to lower power than parental genotype data. © 2017 WILEY PERIODICALS, INC.
Mohanta, Uday Kumar; Ichikawa-Seki, Madoka; Shoriki, Takuya; Katakura, Ken; Itagaki, Tadashi
2014-07-01
This study aimed to precisely discriminate Fasciola spp. based on DNA sequences of nuclear internal transcribed spacer 1 (ITS1) and mitochondrial nicotinamide adenine dinucleotide (NADH) dehydrogenase subunit 1 (nad1) gene. We collected 150 adult flukes from the bile ducts of cattle, buffaloes, sheep, and goats from six different regions of Bangladesh. Spermatogenic status was determined by analyzing stained seminal vesicles. The ITS1 types were analyzed using the polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) method. The nad1 haplotypes were identified based on PCR and direct sequencing and analyzed phylogenetically by comparing with nad1 haplotypes of Fasciola spp. from other Asian countries. Of the 127 aspermic flukes, 98 were identified as Fg type in ITS1, whereas 29 were identified as Fh/Fg type, indicating a combination of ITS1 sequences of Fasciola hepatica and Fasciola gigantica. All the 127 aspermic flukes showed Fsp-NDI-Bd11 in nad1 haplotype with nucleotide sequences identical to aspermic Fasciola sp. from Asian countries. Further, 20 spermic flukes were identified as F. gigantica based on their spermatogenic status and Fg type in ITS1. F. gigantica population was thought to be introduced into Bangladesh considerably earlier than the aspermic Fasciola sp. because 11 haplotypes with high haplotype diversity were detected from the F. gigantica population. However, three flukes from Bangladesh could not be precisely identified, because their spermatogenic status, ITS1 types, and nad1 haplotypes were ambiguous. Therefore, developing a robust method to distinguish aspermic Fasciola sp. from other Fasciola species is necessary in the future.
Method for isolating chromosomal DNA in preparation for hybridization in suspension
Lucas, Joe N.
2000-01-01
A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. Chromosomal DNA in a sample containing cell debris is prepared for hybridization in suspension by treating the mixture with RNase. The treated DNA can also be fixed prior to hybridization.
Quero, Sara; García-Núñez, Marian; Párraga-Niño, Noemí; Barrabeig, Irene; Pedro-Botet, Maria L; de Simon, Mercè; Sopena, Nieves; Sabrià, Miquel
2016-06-01
The aim was to compare the discriminatory power of pulsed-field gel electrophoresis (PFGE) and sequence-based typing (SBT) in Legionella outbreaks for determining the infection source. Twenty-five investigations of Legionnaires' disease were analyzed by PFGE, SBT and Dresden monoclonal antibody. The results suggested that monoclonal antibody could reduce the number of Legionella isolates to be characterized by molecular methods. The epidemiological concordance between PFGE and SBT was 100%, while the molecular concordance was 64%. The adjusted Wallace index (AW) showed that PFGE has better discriminatory power than SBT (AW(SBT→PFGE) = 0.767; AW(PFGE→SBT) = 1). The discrepancies appeared mostly in sequence type (ST) 1, a worldwide distributed ST for which PFGE discriminated different profiles. SBT discriminatory power was not sufficient for verifying the infection source, especially in worldwide distributed STs, which were classified into different PFGE patterns.
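The adjusted Wallace values quoted above compare two partitions of the same isolates in one direction at a time. A minimal sketch of the usual definition is given below, assuming AW(A→B) = (W − Wi) / (1 − Wi), where W is the Wallace coefficient and Wi is the agreement expected by chance (one minus Simpson's index of diversity of method B). The function name and input layout are illustrative, and no data from the study are reproduced.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def adjusted_wallace(a: Sequence[str], b: Sequence[str]) -> float:
    """Adjusted Wallace coefficient AW(A->B) for two typings of the same isolates."""
    if len(a) != len(b):
        raise ValueError("both typings must cover the same isolates")
    n = len(a)
    same_a = [(i, j) for i, j in combinations(range(n), 2) if a[i] == a[j]]
    if not same_a:
        raise ValueError("no pair of isolates shares a type in method A")
    # Wallace coefficient: P(same type in B | same type in A).
    wallace = sum(b[i] == b[j] for i, j in same_a) / len(same_a)
    # Chance agreement: 1 - Simpson's index of diversity of method B.
    expected = sum(c * (c - 1) for c in Counter(b).values()) / (n * (n - 1))
    if expected == 1:
        raise ValueError("method B assigns every isolate the same type")
    return (wallace - expected) / (1 - expected)
```

A value of 1 in one direction, as reported for PFGE→SBT, means that isolates sharing a type under the first method always share a type under the second, so the first method subdivides the second method's types rather than cutting across them.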
Single-Molecule Electrical Random Resequencing of DNA and RNA
NASA Astrophysics Data System (ADS)
Ohshiro, Takahito; Matsubara, Kazuki; Tsutsui, Makusu; Furuhashi, Masayuki; Taniguchi, Masateru; Kawai, Tomoji
2012-07-01
Two paradigm shifts in DNA sequencing technologies--from bulk to single molecules and from optical to electrical detection--are expected to realize label-free, low-cost DNA sequencing that does not require PCR amplification. It will lead to development of high-throughput third-generation sequencing technologies for personalized medicine. Although nanopore devices have been proposed as third-generation DNA-sequencing devices, a significant milestone in these technologies has been attained by demonstrating a novel technique for resequencing DNA using electrical signals. Here we report single-molecule electrical resequencing of DNA and RNA using a hybrid method of identifying single-base molecules via tunneling currents and random sequencing. Our method reads sequences of nine types of DNA oligomers. The complete sequence of 5'-UGAGGUA-3' from the let-7 microRNA family was also identified by creating a composite of overlapping fragment sequences, which was randomly determined using tunneling current conducted by single-base molecules as they passed between a pair of nanoelectrodes.
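Building a full sequence from a composite of overlapping fragment reads can be illustrated with a greedy suffix-prefix merge. This is a generic technique substituted here for illustration; the abstract does not describe the authors' algorithm, and the fragment reads in the usage note are invented to spell out the let-7 heptamer.

```python
from typing import List


def overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def composite(fragments: List[str], min_overlap: int = 2) -> List[str]:
    """Greedily merge the pair of reads with the largest overlap until none qualify."""
    # Drop duplicates and reads fully contained in another read.
    reads = [f for f in dict.fromkeys(fragments)
             if not any(f != g and f in g for g in fragments)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j and overlap(a, b) > best_len:
                    best_len, best_i, best_j = overlap(a, b), i, j
        if best_len < min_overlap:
            break  # no remaining pair overlaps enough to merge safely
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads
```

With the hypothetical reads `["GGUA", "UGAG", "GAGGU"]`, `composite` returns a single contig spelling UGAGGUA. Greedy merging can be misled by repeated motifs, so real resequencing data would typically need read redundancy and error handling that this sketch omits.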
Lucas, J.N.; Straume, T.; Bogen, K.T.
1998-03-24
A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.
Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.
1998-01-01
A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.
Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing?
Robins-Browne, Roy M.; Holt, Kathryn E.; Ingle, Danielle J.; Hocking, Dianna M.; Yang, Ji; Tauschek, Marija
2016-01-01
The empirical and pragmatic nature of diagnostic microbiology has given rise to several different schemes to subtype E. coli, including biotyping, serotyping, and pathotyping. These schemes have proved invaluable in identifying and tracking outbreaks, and for prognostication in individual cases of infection, but they are imprecise and potentially misleading due to the malleability and continuous evolution of E. coli. Whole genome sequencing can be used to accurately determine E. coli subtypes that are based on allelic variation or differences in gene content, such as serotyping and pathotyping. Whole genome sequencing also provides information about single nucleotide polymorphisms in the core genome of E. coli, which form the basis of sequence typing, and is more reliable than other systems for tracking the evolution and spread of individual strains. A typing scheme for E. coli based on genome sequences that includes elements of both the core and accessory genomes should reduce typing anomalies and promote understanding of how different varieties of E. coli spread and cause disease. Such a scheme could also define pathotypes more precisely than current methods. PMID:27917373
Method for sequencing nucleic acid molecules
Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu
2006-06-06
The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
Method for sequencing nucleic acid molecules
Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu
2006-05-30
The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
Changing practice: red blood cell typing by molecular methods for patients with sickle cell disease.
Casas, Jessica; Friedman, David F; Jackson, Tannoa; Vege, Sunitha; Westhoff, Connie M; Chou, Stella T
2015-06-01
Extended red blood cell (RBC) antigen matching is recommended to limit alloimmunization in patients with sickle cell disease (SCD). DNA-based testing to predict blood group phenotypes has enhanced availability of antigen-negative donor units and improved typing of transfused patients, but replacement of routine serologic typing for non-ABO antigens with molecular typing for patients has not been reported. This study compared the historical RBC antigen phenotypes obtained by hemagglutination methods with genotype predictions in 494 patients with SCD. For discrepant results, repeat serologic testing was performed and/or investigated by gene sequencing for silent or variant alleles. Seventy-one typing discrepancies were identified among 6360 antigen comparisons (1.1%). New specimens for repeat serologic testing were obtained for 66 discrepancies and retyping agreed with the genotype in 64 cases. One repeat Jk(b-) serologic phenotype, predicted Jk(b+) by genotype, was found by direct sequencing of JK to be a silenced allele, and one N typing discrepancy remains under investigation. Fifteen false-negative serologic results were associated with alleles encoding weak antigens or single-dose Fy(b) expression. DNA-based RBC typing provided improved accuracy and expanded information on RBC antigens compared to hemagglutination methods, leading to its implementation as the primary method for extended RBC typing for patients with SCD at our institution. © 2015 AABB.
Bovine Papillomavirus in Brazil: Detection of Coinfection of Unusual Types by a PCR-RFLP Method
Carvalho, R. F.; Sakata, S. T.; Giovanni, D. N. S.; Mori, E.; Brandão, P. E.; Richtzenhain, L. J.; Pozzi, C. R.; Arcaro, J. R. P.; Miranda, M. S.; Mazzuchelli-de-Souza, J.; Melo, T. C.; Comenale, G.; Assaf, S. L. M. R.; Beçak, W.; Stocco, R. C.
2013-01-01
Bovine papillomavirus (BPV) is recognized as a causal agent of benign and malignant tumors in cattle. Thirteen types of BPV are currently characterized and classified into three distinct genera, associated with different pathological outcomes. The described BPV types, as well as other putative ones, have been demonstrated by molecular biology methods, mainly by the use of degenerate PCR primers. Specifically, divergences in the nucleotide sequence of the L1 gene are useful for the identification and classification of new papillomavirus types. In the present work, a method based on the PCR-RFLP technique and DNA sequencing was evaluated as a screening tool, allowing for the detection of two relatively rare types of BPV in lesion samples from a six-year-old Holstein dairy cow chronically affected with cutaneous papillomatosis. These findings point to the dissemination of BPVs with unclear pathogenic potential, since two relatively rare, newly described BPV types, which were first characterized in Japan, were also detected in Brazil. PMID:23865043
Liao, Xiaolei; Zhao, Juanjuan; Jiao, Cheng; Lei, Lei; Qiang, Yan; Cui, Qiang
2016-01-01
Background: Lung parenchyma segmentation is often performed as an important pre-processing step in the computer-aided diagnosis of lung nodules based on CT image sequences. However, existing lung parenchyma image segmentation methods cannot fully segment all lung parenchyma images and have a slow processing speed, particularly for images in the top and bottom of the lung and the images that contain lung nodules. Method: Our proposed method first uses the position of the lung parenchyma image features to obtain lung parenchyma ROI image sequences. A gradient and sequential linear iterative clustering algorithm (GSLIC) for sequence image segmentation is then proposed to segment the ROI image sequences and obtain superpixel samples. The SGNF, which is optimized by a genetic algorithm (GA), is then utilized for superpixel clustering. Finally, the grey and geometric features of the superpixel samples are used to identify and segment all of the lung parenchyma image sequences. Results: Our proposed method achieves higher segmentation precision and greater accuracy in less time. It has an average processing time of 42.21 seconds for each dataset and an average volume pixel overlap ratio of 92.22 ± 4.02% for four types of lung parenchyma image sequences. PMID:27532214
Gurtler, Volker; Grando, Danilla; Mayall, Barrie C; Wang, Jenny; Ghaly-Derias, Shahbano
2012-09-01
In order to develop a typing and identification method for van gene-containing Enterococcus faecium, two multiplex PCR reactions were developed for use in HRM-PCR (High Resolution Melt-PCR): (i) vanA, vanB, vanC, vanC23 to detect van genes from different Enterococcus species; (ii) ISR (intergenic spacer region between the 16S and 23S rRNA genes) to detect all Enterococcus species and obtain species and isolate specific HRM curves. To test and validate the method three groups of isolates were tested: (i) 1672 Enterococcus species isolates from January 2009 to December 2009; (ii) 71 isolates previously identified and typed by PFGE (pulsed-field gel electrophoresis) and MLST (multi-locus sequence typing); and (iii) 18 of the isolates from (i) for which ISR sequencing was done. As well as successfully identifying 2 common genotypes by HRM from the Austin Hospital clinical isolates, this study analysed the sequences of all the vanB genes deposited in GenBank and developed a numerical classification scheme for the standardised naming of these vanB genotypes. The identification of Enterococcus faecalis from E. faecium was reliable and stable using ISR PCR. The typing of E. faecium by ISR PCR: (i) detected two variable peaks corresponding to different copy numbers of insertion sequences I and II corresponding to peak I and II respectively; (ii) produced 7 melt profiles for E. faecium with variable copy numbers of sequences I and II; (iii) demonstrated stability and instability of peak heights with equal frequency within the patient sample (36.4±4.5 days and 38.6±5.8 days respectively for 192 patients); (iv) detected ISR-HRM types with as much discrimination as PFGE and more than MLST; and (v) detected ISR-HRM types that differentiated some isolates that were identical by PFGE and MLST. In conjunction with the rapid and accurate van genotyping method described here, this ISR-HRM typing and identification method can be used as a stable identification and typing method with predictable instability based on recombination and concerted evolution of the rrn operon that will complement existing typing methods. Crown Copyright © 2012. Published by Elsevier B.V. All rights reserved.
G-CNV: A GPU-Based Tool for Preparing Data to Detect CNVs with Read-Depth Methods.
Manconi, Andrea; Manca, Emanuele; Moscatelli, Marco; Gnocchi, Matteo; Orro, Alessandro; Armano, Giuliano; Milanesi, Luciano
2015-01-01
Copy number variations (CNVs) are the most prevalent types of structural variations (SVs) in the human genome and are involved in a wide range of common human diseases. Different computational methods have been devised to detect this type of SVs and to study how they are implicated in human diseases. Recently, computational methods based on high-throughput sequencing (HTS) are increasingly used. The majority of these methods focus on mapping short-read sequences generated from a donor against a reference genome to detect signatures distinctive of CNVs. In particular, read-depth based methods detect CNVs by analyzing genomic regions with significantly different read-depth from the other ones. The pipeline analysis of these methods consists of four main stages: (i) data preparation, (ii) data normalization, (iii) CNV regions identification, and (iv) copy number estimation. However, available tools do not support most of the operations required at the first two stages of this pipeline. Typically, they start the analysis by building the read-depth signal from pre-processed alignments. Therefore, third-party tools must be used to perform most of the preliminary operations required to build the read-depth signal. These data-intensive operations can be efficiently parallelized on graphics processing units (GPUs). In this article, we present G-CNV, a GPU-based tool devised to perform the common operations required at the first two stages of the analysis pipeline. G-CNV is able to filter low-quality read sequences, to mask low-quality nucleotides, to remove adapter sequences, to remove duplicated read sequences, to map the short-reads, to resolve multiple mapping ambiguities, to build the read-depth signal, and to normalize it. G-CNV can be efficiently used as a third-party tool able to prepare data for the subsequent read-depth signal generation and analysis. Moreover, it can also be integrated in CNV detection tools to generate read-depth signals.
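The read-depth idea described in this abstract (count mapped reads per genomic region, then normalize so that unusually deep or shallow regions stand out) can be illustrated with a minimal Python sketch. This is not G-CNV's implementation, which runs on GPUs and performs many more preparation steps. The fixed window size, the median-based normalization, the function names `read_depth_signal` and `normalize_by_median`, and the toy read positions are all assumptions made for illustration.

```python
from statistics import median

def read_depth_signal(read_starts, genome_length, window):
    """Count reads whose start position falls in each fixed-size window."""
    n_windows = (genome_length + window - 1) // window  # ceiling division
    depth = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < genome_length:  # ignore positions outside the genome
            depth[pos // window] += 1
    return depth

def normalize_by_median(depth):
    """Divide each window count by the median count (hypothetical, simple normalization)."""
    med = median(depth)
    if med == 0:
        raise ValueError("median depth is zero; cannot normalize")
    return [d / med for d in depth]

# Toy data (assumed): the window covering positions 20-29 holds twice as many reads.
reads = [1, 5, 11, 15, 21, 22, 25, 28, 31, 35]
signal = read_depth_signal(reads, genome_length=40, window=10)
ratios = normalize_by_median(signal)
```

In this toy case a normalized ratio near 2 in one window is the kind of signature a read-depth caller would examine further; real data would also need corrections (for example for GC content and mappability) that this sketch omits.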
Zhang, Yiming; Jin, Quan; Wang, Shuting; Ren, Ren
2011-05-01
The mobile behavior of 1481 peptides in ion mobility spectrometry (IMS), which are generated by protease digestion of the Drosophila melanogaster proteome, is modeled and predicted based on two different types of characterization methods, i.e. a sequence-based approach and a structure-based approach. In this procedure, the sequence-based approach considers both the amino acid composition of a peptide and the local environment profile of each amino acid in the peptide; the structure-based approach is performed with the CODESSA protocol, which regards a peptide as a common organic compound and generates more than 200 statistically significant variables to characterize the whole structure profile of a peptide molecule. Subsequently, the nonlinear support vector machine (SVM) and Gaussian process (GP) as well as linear partial least squares (PLS) regression are employed to correlate the structural parameters of the characterizations with the IMS drift times of these peptides. The obtained quantitative structure-spectrum relationship (QSSR) models are evaluated rigorously and investigated systematically via both one-deep and two-deep cross-validations as well as the rigorous Monte Carlo cross-validation (MCCV). We also give a comprehensive comparison of the resulting statistics arising from the different combinations of variable types with modeling methods and find that the sequence-based approach can give QSSR models with better fitting ability and predictive power but worse interpretability than the structure-based approach. In addition, because QSSR modeling using the sequence-based approach does not require preparing minimized structures of the peptides before the modeling, it is considerably more efficient than modeling using the structure-based approach. Copyright © 2011 Elsevier Ltd. All rights reserved.
Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y
2016-11-01
High-throughput sequencing has been used to study posttranscriptional regulations, where the identification of protein-RNA binding is a major and fast-developing sub-area, which in turn benefits from the sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alters the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulations. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Lee, Sejoon; Lee, Soohyun; Ouellette, Scott; Park, Woong-Yang; Lee, Eunjung A; Park, Peter J
2017-06-20
In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. https://github.com/parklab/NGSCheckMate. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
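The core comparison mentioned in this abstract, matching samples by their allele read fractions at known SNPs, can be sketched in a few lines of Python. This is a simplified illustration and not NGSCheckMate's model: the tool is described as accounting for depth-dependent behavior, whereas the sketch below simply applies a fixed minimum depth and a plain Pearson correlation. The function names, the `min_depth` parameter, the SNP identifiers, and the read counts are all hypothetical.

```python
from math import sqrt

def allele_fractions(counts, min_depth=1):
    """Map SNP id -> alternate-allele read fraction, skipping low-depth sites."""
    vaf = {}
    for snp, (ref_reads, alt_reads) in counts.items():
        depth = ref_reads + alt_reads
        if depth >= min_depth:
            vaf[snp] = alt_reads / depth
    return vaf

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("no variation in allele fractions")
    return cov / sqrt(vx * vy)

def vaf_correlation(counts_a, counts_b, min_depth=1):
    """Correlate allele fractions over SNPs covered in both samples."""
    a = allele_fractions(counts_a, min_depth)
    b = allele_fractions(counts_b, min_depth)
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        raise ValueError("need at least two shared SNPs")
    return pearson([a[s] for s in shared], [b[s] for s in shared])

# Toy (ref_reads, alt_reads) counts; all values are made up for illustration.
subject1_deep = {"snp1": (10, 0), "snp2": (5, 5), "snp3": (0, 10), "snp4": (10, 0)}
subject1_shallow = {"snp1": (4, 0), "snp2": (2, 2), "snp3": (0, 4), "snp4": (4, 0)}
subject2 = {"snp1": (0, 8), "snp2": (8, 0), "snp3": (4, 4), "snp4": (0, 8)}

same = vaf_correlation(subject1_deep, subject1_shallow)
different = vaf_correlation(subject1_deep, subject2)
```

With these toy counts the two files from the same subject correlate strongly and the unrelated pair does not. At low depth, allele fractions become noisy, which is presumably why a depth-aware threshold is needed in practice.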
[Molecular typing methods for Pasteurella multocida-A review].
Peng, Zhong; Liang, Wan; Wu, Bin
2016-10-04
Pasteurella multocida is an important gram-negative pathogenic bacterium that can infect a wide range of animals. Humans can also be infected by P. multocida via animal bites or scratches. Current typing methods for P. multocida include serological typing methods and molecular typing methods. Serological typing methods are based on immunological assays, which are too complicated for clinical bacteriological studies. By contrast, molecular methods, including multiple PCRs and multilocus sequence typing (MLST), are more suitable for clinical bacteriological studies of P. multocida: compared with the traditional serological typing methods, they are simple to operate, highly efficient, and accurate, and they are therefore widely used. In the current review, we briefly describe the molecular typing methods for P. multocida. Our aim is to provide a knowledge foundation for clinical bacteriological investigation, especially molecular investigation, of P. multocida.
Abras, Alba; Gállego, Montserrat; Muñoz, Carmen; Juiz, Natalia A; Ramírez, Juan Carlos; Cura, Carolina I; Tebar, Silvia; Fernández-Arévalo, Anna; Pinazo, María-Jesús; de la Torre, Leonardo; Posada, Elizabeth; Navarro, Ferran; Espinal, Paula; Ballart, Cristina; Portús, Montserrat; Gascón, Joaquim; Schijman, Alejandro G
2017-04-01
Trypanosoma cruzi, the causative agent of Chagas disease, is divided into six Discrete Typing Units (DTUs): TcI-TcVI. We aimed to identify T. cruzi DTUs in Latin-American migrants in the Barcelona area (Spain) and to assess different molecular typing approaches for the characterization of T. cruzi genotypes. Seventy-five peripheral blood samples were analyzed by two real-time PCR methods (qPCR) based on satellite DNA (SatDNA) and kinetoplastid DNA (kDNA). The 20 samples testing positive in both methods, all belonging to Bolivian individuals, were submitted to DTU characterization using two PCR-based flowcharts: multiplex qPCR using TaqMan probes (MTq-PCR), and conventional PCR. These samples were also studied by sequencing the SatDNA and classified as type I (TcI/III), type II (TcII/IV) and type I/II hybrid (TcV/VI). Ten out of the 20 samples gave positive results in the flowcharts: TcV (5 samples), TcII/V/VI (3) and mixed infections by TcV plus TcII (1) and TcV plus TcII/VI (1). By SatDNA sequencing, we classified the 20 samples, 19 as type I/II and one as type I. The most frequent DTU identified by both flowcharts, and suggested by SatDNA sequencing in the remaining samples with low parasitic loads, TcV, is common in Bolivia and predominant in peripheral blood. The mixed infection by TcV-TcII was detected for the first time simultaneously in Bolivian migrants. PCR-based flowcharts are very useful to characterize DTUs during acute infection. SatDNA sequence analysis cannot discriminate T. cruzi populations at the level of a single DTU but it enabled us to increase the number of characterized cases in chronically infected patients. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
He, Yan; Caporaso, J Gregory; Jiang, Xiao-Tao; Sheng, Hua-Fang; Huse, Susan M; Rideout, Jai Ram; Edgar, Robert C; Kopylova, Evguenia; Walters, William A; Knight, Rob; Zhou, Hong-Wei
2015-01-01
The operational taxonomic unit (OTU) is widely used in microbial ecology. Reproducibility in microbial ecology research depends on the reliability of OTU-based 16S ribosomal subunit RNA (rRNA) analyses. Here, we report that many hierarchical and greedy clustering methods produce unstable OTUs, with membership that depends on the number of sequences clustered. If OTUs are regenerated with additional sequences or samples, sequences originally assigned to a given OTU can be split into different OTUs. Alternatively, sequences assigned to different OTUs can be merged into a single OTU. This OTU instability affects alpha-diversity analyses such as rarefaction curves, beta-diversity analyses such as distance-based ordination (for example, Principal Coordinate Analysis (PCoA)), and the identification of differentially represented OTUs. Our results show that the proportion of unstable OTUs varies for different clustering methods. We found that the closed-reference method is the only one that produces completely stable OTUs, with the caveat that sequences that do not match a pre-existing reference sequence collection are discarded. As a compromise to the factors listed above, we propose using an open-reference method to enhance OTU stability. This type of method clusters sequences against a database and includes unmatched sequences by clustering them via a relatively stable de novo clustering method. OTU stability is an important consideration when analyzing microbial diversity and is a feature that should be taken into account during the development of novel OTU clustering methods.
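The open-reference strategy described in this abstract (match sequences against a reference collection first, then cluster whatever is left de novo) can be sketched as follows. This is a toy illustration under strong assumptions, not the pipeline evaluated by the authors: identity is computed position by position on equal-length strings, the de novo step is a simple greedy centroid pass, and the function names, the 0.9 threshold, and the sequences are all hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; unequal-length or empty inputs score 0."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def open_reference_otus(seqs, references, threshold=0.9):
    """Assign sequences to reference OTUs first, then cluster the rest de novo."""
    assignments = {}
    centroids = []  # de novo centroids, in order of creation
    for seq in seqs:
        # Closed-reference step: the result depends only on seq and the references.
        best = max(references, key=lambda name: identity(seq, references[name]),
                   default=None)
        if best is not None and identity(seq, references[best]) >= threshold:
            assignments[seq] = best
            continue
        # De novo step: greedy match against centroids created so far.
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                assignments[seq] = f"denovo{i}"
                break
        else:
            centroids.append(seq)
            assignments[seq] = f"denovo{len(centroids) - 1}"
    return assignments

# Toy data (assumed): one reference OTU and four reads.
refs = {"ref_A": "AAAAAAAAAA"}
reads = ["AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG", "GGGGGGGGGG"]
otus = open_reference_otus(reads, refs)
```

The sketch makes the stability argument visible: a reference assignment never changes when more reads are added, whereas the de novo labels depend on which reads were seen earlier and in what order.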
Trembizki, Ella; Smith, Helen; Lahra, Monica M; Chen, Marcus; Donovan, Basil; Fairley, Christopher K; Guy, Rebecca; Kaldor, John; Regan, David; Ward, James; Nissen, Michael D; Sloots, Theo P; Whiley, David M
2014-06-01
Neisseria gonorrhoeae antimicrobial resistance (AMR) is a global problem heightened by emerging resistance to ceftriaxone. Appropriate molecular typing methods are important for understanding the emergence and spread of N. gonorrhoeae AMR. We report on the development, validation and testing of a Sequenom MassARRAY iPLEX method for multilocus sequence typing (MLST)-style genotyping of N. gonorrhoeae isolates. An iPLEX MassARRAY method (iPLEX14SNP) was developed targeting 14 informative gonococcal single nucleotide polymorphisms (SNPs) previously shown to predict MLST types. The method was initially validated using 24 N. gonorrhoeae control isolates and was then applied to 397 test isolates collected throughout Queensland, Australia in the first half of 2012. The iPLEX14SNP method provided 100% accuracy for the control isolates, correctly identifying all 14 SNPs for all 24 isolates (336/336). For the 397 test isolates, the iPLEX14SNP assigned results for 5461 of the possible 5558 SNPs (SNP call rate 98.25%), with complete 14 SNP profiles obtained for 364 isolates. Based on the complete SNP profile data, there were 49 different sequence types identified in Queensland, with 11 of the 49 SNP profiles accounting for the majority (n = 280; 77%) of isolates. AMR was dominated by several geographically clustered sequence types. Using the iPLEX14SNP method, up to 384 isolates could be tested within 1 working day for less than Aus$10 per isolate. The iPLEX14SNP offers an accurate and high-throughput method for the MLST-style genotyping of N. gonorrhoeae and may prove particularly useful for large-scale studies investigating the emergence and spread of gonococcal AMR. © The Author 2014. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
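The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The snippet below is only a worked check of the reported numbers; the function name `call_rate` is a hypothetical helper, and the 77% figure is assumed to be relative to the 364 isolates with complete profiles.

```python
def call_rate(called, isolates, snps_per_isolate):
    """Return (number of possible SNP calls, percentage actually called)."""
    possible = isolates * snps_per_isolate
    return possible, 100 * called / possible

# Control isolates: 24 isolates x 14 SNPs, all called.
control_possible, control_rate = call_rate(336, 24, 14)

# Test isolates: 5461 calls out of 397 isolates x 14 SNPs.
test_possible, test_rate = call_rate(5461, 397, 14)

# Share of complete-profile isolates covered by the 11 most common profiles.
top_profile_share = 100 * 280 / 364
```

Rounded to two decimals, `test_rate` reproduces the reported 98.25% call rate, and `top_profile_share` rounds to the reported 77%.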
The Evolution of Strain Typing in the Mycobacterium tuberculosis Complex.
Merker, Matthias; Kohl, Thomas A; Niemann, Stefan; Supply, Philip
2017-01-01
Tuberculosis (TB) is a contagious disease with a complex epidemiology. Therefore, molecular typing (genotyping) of Mycobacterium tuberculosis complex (MTBC) strains is of primary importance to effectively guide outbreak investigations, define transmission dynamics and assist global epidemiological surveillance of the disease. Large-scale genotyping is also needed to get better insights into the biological diversity and the evolution of the pathogen. Thanks to its shorter turnaround and simple numerical nomenclature system, mycobacterial interspersed repetitive unit-variable-number tandem repeat (MIRU-VNTR) typing, based on 24 standardized plus 4 hypervariable loci, optionally combined with spoligotyping, has replaced IS6110 DNA fingerprinting over the last decade as a gold standard among classical strain typing methods for many applications. With the continuous progress and decreasing costs of next-generation sequencing (NGS) technologies, typing based on whole genome sequencing (WGS) is now increasingly performed for near complete exploitation of the available genetic information. However, some important challenges remain, such as the lack of standardization of WGS analysis pipelines, the need for databases for sharing WGS data at a global level, and the need for a better understanding of the relevant genomic distances for defining clusters of recent TB transmission in different epidemiological contexts. This chapter provides an overview of the evolution of genotyping methods over the last three decades, which culminated with the development of WGS-based methods. It addresses the relative advantages and limitations of these techniques, indicates current challenges and potential directions for facilitating standardization of WGS-based typing, and provides suggestions on what method to use depending on the specific research question.
Nguyen, David; Valenzuela, Nicole; Takemura, Ping; Bolon, Yung-Tsi; Springer, Brianna; Saito, Katsuyuki; Zheng, Ying; Hague, Tim; Pasztor, Agnes; Horvath, Gyorgy; Rigo, Krisztina; Reed, Elaine F.; Zhang, Qiuheng
2016-01-01
Background: Unambiguous HLA typing is important in hematopoietic stem cell transplantation (HSCT), HLA disease association studies, and solid organ transplantation. However, current molecular typing methods only interrogate the antigen recognition site (ARS) of HLA genes, resulting in many cis-trans ambiguities that require additional typing methods to resolve. Here we report high-resolution HLA typing of 10,063 National Marrow Donor Program (NMDP) registry donors using a long-range PCR by next generation sequencing (NGS) approach on buccal swab DNA. Methods: Multiplex long-range PCR primers amplified the full length of HLA class I genes (A, B, C) from promoter to 3’ UTR. Class II genes (DRB1, DQB1) were amplified from exon 2 through part of exon 4. PCR amplicons were pooled and sheared using Covaris fragmentation. Library preparation was performed using the Illumina TruSeq Nano kit on the Beckman FX automated platform. Each sample was tagged with a unique barcode, followed by 2×250 bp paired-end sequencing on the Illumina MiSeq. HLA typing was assigned using Omixon Twin software that combines two independent computational algorithms to ensure high confidence in allele calling. Consensus sequence and typing results were reported in Histoimmunogenetics Markup Language (HML) format. All homozygous alleles were confirmed by Luminex SSO typing and exon novelties were confirmed by Sanger sequencing. Results: Using this automated workflow, over 10,063 NMDP registry donors were successfully typed under high-resolution by NGS. Despite known challenges of nucleic acid degradation and low DNA concentration commonly associated with buccal-based specimens, 97.8% of samples were successfully amplified using long-range PCR. Among these, 98.2% were successfully reported by NGS, with an accuracy rate of 99.84% in an independent blind Quality Control audit performed by the NMDP. In this study, NGS-HLA typing identified 23 null alleles (0.023%), 92 rare alleles (0.091%) and 42 exon novelties (0.042%). Conclusion: Long-range, unambiguous HLA genotyping is achievable on clinical buccal swab-extracted DNA. Importantly, full-length gene sequencing and the ability to curate full sequence data will permit future interrogation of the impact of introns, expanded exons, and other gene regulatory sequences on clinical outcomes in transplantation. PMID:27798706
Identifying disease polymorphisms from case-control genetic association data.
Park, L
2010-12-01
In case-control association studies, it is typical to observe several associated polymorphisms in a gene region. Often the most significantly associated polymorphism is considered to be the disease polymorphism; however, it is not clear whether it is the disease polymorphism or there is more than one disease polymorphism in the gene region. Currently, there is no method that can handle these problems based on the linkage disequilibrium (LD) relationship between polymorphisms. To distinguish real disease polymorphisms from markers in LD, a method that can detect disease polymorphisms in a gene region has been developed. Relying on the LD between polymorphisms in controls, the proposed method utilizes model-based likelihood ratio tests to find disease polymorphisms. This method shows reliable Type I and Type II error rates when sample sizes are large enough, and works better with re-sequenced data. Applying this method to fine mapping using re-sequencing or dense genotyping data would provide important information regarding the genetic architecture of complex traits.
Gherardi, Giovanni; Creti, Roberta; Pompilio, Arianna; Di Bonaventura, Giovanni
2015-03-01
Typing of bacterial isolates has been used for decades to study local outbreaks as well as in national and international surveillance for monitoring newly emerging resistant clones. Although Stenotrophomonas maltophilia is recognized as a nosocomial pathogen, its precise modes of transmission in health care settings are unknown. Due to the high genetic diversity observed among S. maltophilia clinical isolates, the typing results might be better interpreted if environmental strains were also included. This could help to identify preventative measures to be designed and implemented for decreasing the possibility of outbreaks and nosocomial infections. In this review, we attempt to provide an overview of the most common typing methods used for clinical epidemiology of S. maltophilia strains, such as PCR-based fingerprinting analyses, pulsed-field gel electrophoresis, multilocus variable number tandem repeat analysis, and multilocus sequence typing. Application of proteomic-based mass spectrometry by matrix-assisted laser desorption ionization-time of flight is also described. Typing methods already in use have to be improved to facilitate S. maltophilia infection control at any level. In the near future, when novel Web-based platforms for rapid data processing and analysis become available, whole genome sequencing technologies will likely become a highly powerful tool for outbreak investigations and surveillance studies in routine clinical practice. Copyright © 2015 Elsevier Inc. All rights reserved.
Yin, Li; Yao, Jiqiang; Gardner, Brent P; Chang, Kaifen; Yu, Fahong; Goodenow, Maureen M
2012-01-01
Next generation sequencing (NGS) applied to human papilloma viruses (HPV) can provide sensitive methods to investigate the molecular epidemiology of multiple-type HPV infection. Currently, a genotyping system with a comprehensive collection of updated HPV reference sequences and a capacity to handle NGS data sets is lacking. HPV-QUEST was developed as an automated and rapid HPV genotyping system. The web-based HPV-QUEST subtyping algorithm was developed using HTML, PHP, the Perl scripting language, and MySQL as the database backend. HPV-QUEST includes a database of annotated HPV reference sequences with updated nomenclature covering 5 genera, 14 species and 150 mucosal and cutaneous types to genotype blasted query sequences. HPV-QUEST processes up to 10 megabases of sequences within 1 to 2 minutes. Results are reported in HTML, text and Excel formats and display e-value, blast score, and local and coverage identities; provide genus, species, type, infection site and risk for the best matched reference HPV sequence; and produce results ready for additional analyses.
Shaukat, Shahzad; Angez, Mehar; Alam, Muhammad Masroor; Jebbink, Maarten F; Deijs, Martin; Canuti, Marta; Sharif, Salmaan; de Vries, Michel; Khurshid, Adnan; Mahmood, Tariq; van der Hoek, Lia; Zaidi, Syed Sohail Zahoor
2014-08-12
The use of sequence-independent methods combined with next generation sequencing for identification purposes in clinical samples appears promising, and exciting results have been achieved in understanding unexplained infections. One sequence-independent method, Virus Discovery based on cDNA Amplified Fragment Length Polymorphism (VIDISCA), is capable of identifying viruses that would have remained unidentified in standard diagnostics or cell cultures. VIDISCA is normally combined with next generation sequencing; however, we set up a simplified VIDISCA which can be used in case next generation sequencing is not possible. Stool samples of 10 patients with unexplained acute flaccid paralysis showing cytopathic effect in rhabdomyosarcoma cells and/or mouse cells were used to test the efficiency of this method. To further characterize the viruses, VIDISCA-positive samples were amplified and sequenced with gene-specific primers. Simplified VIDISCA detected seven viruses (70%) and the proportion of eukaryotic viral sequences from each sample ranged from 8.3 to 45.8%. Human enterovirus EV-B97, EV-B100, echovirus-9 and echovirus-21, human parechovirus type-3, human astrovirus probably a type-3/5 recombinant, and tetnovirus-1 were identified. Phylogenetic analysis based on the VP1 region demonstrated that the human enteroviruses are more divergent isolates circulating in the community. Our data support that a simplified VIDISCA protocol can efficiently identify unrecognized viruses grown in cell culture at low cost and in limited time, without the need for advanced technical expertise. Complex data interpretation is also avoided, so the method can be used as a powerful diagnostic tool in limited-resource settings. Redesigning the routine diagnostics might lead to additional detection of previously undiagnosed viruses in clinical samples of patients.
Zhang, Yun; Baheti, Saurabh; Sun, Zhifu
2018-05-01
High-throughput bisulfite methylation sequencing such as reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing is commonly used for base-resolution methylome research. These data are represented either by the ratio of methylated cytosines to total coverage at a CpG site or by the numbers of methylated and unmethylated cytosines. Multiple statistical methods can be used to detect differentially methylated CpGs (DMCs) between conditions, and these methods are often the basis for the next step of differentially methylated region identification. The ratio data have the flexibility to fit many linear models, but the raw count data take coverage information into account. There is an array of options in each data type for DMC detection; however, it is not clear which is the optimal statistical method. In this study, we systematically evaluated four statistical methods on methylation ratio data and four methods on count-based data and compared their performances with regard to type I error control, sensitivity and specificity of DMC detection and computational resource demands using real RRBS data along with simulation. Our results show that the ratio-based tests are generally more conservative (less sensitive) than the count-based tests. However, some count-based methods have high false-positive rates and should be avoided. The beta-binomial model gives a good balance between sensitivity and specificity and is the preferred method. Selection of methods in different settings, signal versus noise and sample size estimation are also discussed.
High-Resolution Melting Analysis for Rapid Detection of Sequence Type 131 Escherichia coli.
Harrison, Lucas B; Hanson, Nancy D
2017-06-01
Escherichia coli isolates belonging to the sequence type 131 (ST131) clonal complex have been associated with the global distribution of fluoroquinolone and β-lactam resistance. Whole-genome sequencing and multilocus sequence typing identify sequence type but are expensive when evaluating large numbers of samples. This study was designed to develop a cost-effective screening tool using high-resolution melting (HRM) analysis to differentiate ST131 from non-ST131 E. coli in large sample populations in the absence of sequence analysis. The method was optimized using DNA from 12 E. coli isolates. Singleplex PCR was performed using 10 ng of DNA, Type-it HRM buffer, and multilocus sequence typing primers and was followed by multiplex PCR. The amplicon sizes ranged from 630 to 737 bp. Melt temperature peaks were determined by performing HRM analysis at 0.1°C resolution from 50 to 95°C on a Rotor-Gene Q 5-plex HRM system. Derivative melt curves were compared between sequence types and analyzed by principal component analysis. A blinded study of 191 E. coli isolates of ST131 and unknown sequence types validated this methodology. This methodology returned 99.2% specificity (124 true negatives and 1 false positive) and 100% sensitivity (66 true positives and 0 false negatives). This HRM methodology distinguishes ST131 from non-ST131 E. coli without sequence analysis. The analysis can be accomplished in about 3 h in any laboratory with an HRM-capable instrument and principal component analysis software. Therefore, this assay is a fast and cost-effective alternative to sequencing-based ST131 identification. Copyright © 2017 Harrison and Hanson.
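The sensitivity and specificity reported above follow directly from the four counts given in the abstract. As a small worked check (the function and variable names are illustrative and not from the paper):

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true ST131 isolates that the assay calls ST131."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-ST131 isolates that the assay calls non-ST131."""
    return tn / (tn + fp)


# Counts as stated in the abstract for the blinded study.
tp, fn = 66, 0
tn, fp = 124, 1

total_isolates = tp + fn + tn + fp   # matches the 191 isolates tested
sens = sensitivity(tp, fn)           # 66 / 66
spec = specificity(tn, fp)           # 124 / 125

print(f"n = {total_isolates}, sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

The four counts sum to the 191 isolates in the blinded study, and 124/125 rounds to the reported 99.2% specificity.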
Benchmarking of Methods for Genomic Taxonomy
Larsen, Mette V.; Cosentino, Salvatore; Lukjancenko, Oksana; ...
2014-02-26
One of the first issues that emerges when a prokaryotic organism of interest is encountered is the question of what it is, that is, which species it is. The 16S rRNA gene formed the basis of the first method for sequence-based taxonomy and has had a tremendous impact on the field of microbiology. Nevertheless, the method has been found to have a number of shortcomings. In this paper, we trained and benchmarked five methods for whole-genome sequence-based prokaryotic species identification on a common data set of complete genomes: (i) SpeciesFinder, which is based on the complete 16S rRNA gene; (ii) Reads2Type, which searches for species-specific 50-mers in either the 16S rRNA gene or the gyrB gene (for the Enterobacteriaceae family); (iii) the ribosomal multilocus sequence typing (rMLST) method, which samples up to 53 ribosomal genes; (iv) TaxonomyFinder, which is based on species-specific functional protein domain profiles; and finally (v) KmerFinder, which examines the number of co-occurring k-mers (substrings of k nucleotides in DNA sequence data). The performances of the methods were subsequently evaluated on three data sets of short sequence reads or draft genomes from public databases. In total, the evaluation sets constituted sequence data from more than 11,000 isolates covering 159 genera and 243 species. Our results indicate that methods that sample only chromosomal, core genes have difficulties in distinguishing closely related species which only recently diverged. Finally, the KmerFinder method had the overall highest accuracy and correctly identified from 93% to 97% of the isolates in the evaluation sets.
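The idea of identifying a species by counting co-occurring k-mers can be illustrated with a toy sketch. This is not KmerFinder's implementation or scoring scheme; the function names, the tiny sequences and the choice of k = 4 are assumptions made only for illustration.

```python
def kmers(sequence: str, k: int) -> set:
    """Return the set of all substrings of length k in the sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def best_match(query: str, references: dict, k: int = 4):
    """Score each reference by the number of k-mers it shares with the query."""
    query_kmers = kmers(query, k)
    scores = {
        name: len(query_kmers & kmers(ref_seq, k))
        for name, ref_seq in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy references; real genomes are millions of bases long.
references = {
    "species_A": "ATGCGTACGTTAGC",
    "species_B": "TTTTGGGGCCCCAAAA",
}
query = "GCGTACGTTA"

best, scores = best_match(query, references, k=4)
print(best, scores)
```

In this toy case all seven 4-mers of the query occur in `species_A` and none occur in `species_B`, so `species_A` is returned as the best match.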
A glow of HLA typing in organ transplantation
2013-01-01
The transplantation of organs and tissues is one of the greatest curative achievements of this century. In organ transplantation, adaptive immunity is considered the main response to the transplanted tissue, since the main target of the immune response is the MHC (major histocompatibility complex) molecules expressed on the surface of donor cells. Cell surface molecules that induce an antigenic stimulus cause the immune rejection response to the grafted tissue or organ. A wide variety of transplantation antigens have been described, including the major histocompatibility molecules, minor histocompatibility antigens, ABO blood group antigens and endothelial cell antigens. Sensitization to MHC antigens may be caused by transfusions, pregnancy, or failed previous grafts, leading to the development of anti-human leukocyte antigen (HLA) antibodies, which are an important factor in graft rejection in solid organ transplantation and play a role in post-transfusion complications. Anti-HLA antibodies may also be present in healthy individuals. Methods for HLA typing are described, including serological methods and the molecular techniques of sequence-specific priming (SSP), sequence-specific oligonucleotide probing (SSOP), sequence-based typing (SBT) and reference strand-based conformation analysis (RSCA). Problems with organ transplantation include the limited supply of organs and the need for immunosuppressive treatments that decrease the rate of rejection with fewer side effects and complications. PMID:23432791
Shimizu, Eri; Kato, Hisashi; Nakagawa, Yuki; Kodama, Takashi; Futo, Satoshi; Minegishi, Yasutaka; Watanabe, Takahiro; Akiyama, Hiroshi; Teshima, Reiko; Furui, Satoshi; Hino, Akihiro; Kitta, Kazumi
2008-07-23
A novel type of quantitative competitive polymerase chain reaction (QC-PCR) system for the detection and quantification of the Roundup Ready soybean (RRS) was developed. This system was designed to build on the advantages of a fully validated real-time PCR method used for the quantification of RRS in Japan. A plasmid was constructed as a competitor plasmid for the detection and quantification of genetically modified soy, RRS. The plasmid contained the construct-specific sequence of RRS and the taxon-specific sequence of lectin1 (Le1), and both had a 21 bp oligonucleotide insertion in the sequences. The plasmid DNA was used as a reference molecule instead of ground seeds, which enabled us to precisely and stably adjust the copy number of targets. The present study demonstrated that the novel plasmid-based QC-PCR method could be a simple and feasible alternative to the real-time PCR method used for the quantification of genetically modified organism contents.
Petersen, Bent; Lundegaard, Claus; Petersen, Thomas Nordahl
2010-01-01
β-turns are the most common type of non-repetitive structures, and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP, for prediction of two-class β-turns and prediction of the individual β-turn types, by use of evolutionary information and predicted protein sequence features. It has been evaluated against a commonly used dataset BT426, and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on a two-class prediction of β-turn and not-β-turn. Furthermore NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not and individual β-turn types from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset comprised of all β-turn types. The performance is evaluated using a golden set of non-homologous sequences known as BT426. Our two-class prediction method achieves a performance of: MCC = 0.50, Qtotal = 82.1%, sensitivity = 75.6%, PPV = 68.8% and AUC = 0.864. We have compared our performance to eleven other prediction methods that obtain Matthews correlation coefficients in the range of 0.17 – 0.47. For the type specific β-turn predictions, only type I and II can be predicted with reasonable Matthews correlation coefficients, where we obtain performance values of 0.36 and 0.31, respectively. Conclusion The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences. PMID:21152409
Petersen, Bent; Lundegaard, Claus; Petersen, Thomas Nordahl
2010-11-30
β-turns are the most common type of non-repetitive structures, and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP, for prediction of two-class β-turns and prediction of the individual β-turn types, by use of evolutionary information and predicted protein sequence features. It has been evaluated against a commonly used dataset BT426, and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on a two-class prediction of β-turn and not-β-turn. Furthermore NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not and individual β-turn types from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset comprised of all β-turn types. The performance is evaluated using a golden set of non-homologous sequences known as BT426. Our two-class prediction method achieves a performance of: MCC=0.50, Qtotal=82.1%, sensitivity=75.6%, PPV=68.8% and AUC=0.864. We have compared our performance to eleven other prediction methods that obtain Matthews correlation coefficients in the range of 0.17-0.47. For the type specific β-turn predictions, only type I and II can be predicted with reasonable Matthews correlation coefficients, where we obtain performance values of 0.36 and 0.31, respectively. The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences.
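The performance measures quoted in the NetTurnP abstract (MCC, sensitivity, PPV and overall accuracy) are all functions of a two-class confusion matrix. The sketch below shows how they are computed; the counts are hypothetical and were chosen for easy arithmetic, so they do not reproduce the published values.

```python
import math


def matthews_cc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient for a two-class prediction."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, not taken from the paper.
tp, fp, fn, tn = 90, 10, 10, 90

mcc = matthews_cc(tp, fp, fn, tn)
sens = tp / (tp + fn)                       # sensitivity (recall)
ppv = tp / (tp + fp)                        # positive predictive value
q_total = (tp + tn) / (tp + fp + fn + tn)   # overall accuracy

print(f"MCC = {mcc:.2f}, sensitivity = {sens:.1%}, PPV = {ppv:.1%}, Qtotal = {q_total:.1%}")
```

Unlike overall accuracy, the MCC stays informative when the classes are unbalanced, which matters here because the abstract states that β-turn residues make up only about a quarter of all residues.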
Ogrodzki, Pauline; Forsythe, Stephen J.
2017-01-01
The Cronobacter genus is composed of seven species, within which a number of pathovars have been described. The most notable infections by Cronobacter spp. are of infants through the consumption of contaminated infant formula. The description of the genus has greatly improved in recent years through DNA sequencing techniques, and this has led to a robust means of identification. However, some species are highly clonal and this limits the ability to discriminate between unrelated strains by some methods of genotyping. This article updates the application of three genotyping methods across the Cronobacter genus. The three genotyping methods were multilocus sequence typing (MLST), capsular profiling of the K-antigen and colanic acid (CA) biosynthesis regions, and CRISPR-cas array profiling. A total of 1654 MLST profiled and 286 whole genome sequenced strains, available by open access at the PubMLST Cronobacter database, were used in this analysis. The predominance of C. sakazakii and C. malonaticus in clinical infections was confirmed, with the majority of clinical strains being in the C. sakazakii clonal complexes (CC) 1 and 4, sequence types (ST) 8 and 12 and C. malonaticus ST7. The capsular profile K2:CA2, previously proposed as being strongly associated with C. sakazakii and C. malonaticus isolates from severe neonatal infections, was also found in C. turicensis, C. dublinensis and C. universalis. The most common CRISPR-cas type across the genus was the I-E (Ecoli) type. Some strains of C. dublinensis and C. muytjensii encoded the I-F (Ypseudo) type, and others lacked the cas gene loci. The expanded profiling will be of benefit to researchers as well as governmental and industrial risk assessors. PMID:29033918
Suzuki, Y; Matsushita, S; Kubota, H; Kobayashi, M; Murauchi, K; Higuchi, Y; Kato, R; Hirai, A; Sadamasu, K
2016-09-01
Staphylocoagulase, an extracellular protein secreted by Staphylococcus aureus, has been used as an epidemiological marker. At least 12 serotypes and 24 genotypes subdivided on the basis of nucleotide sequence have been reported to date. In this study, we identified a novel staphylocoagulase nucleotide sequence, coa310, from staphylococcal food poisoning isolates that had the ability to coagulate plasma, but could not be typed using the conventional method. The protein encoded by coa310 contained the six fundamental conserved domains of staphylocoagulase. The full-length nucleotide sequence of coa310 shared the highest similarity (77·5%) with that of staphylocoagulase-type (SCT) XIa. The sequence of the D1 region, which would be responsible for the determination of SCT, shared the highest similarity (91·8%) with that of SCT XIa. These results suggest that coa310 is a novel variant of SCT XI. Moreover, we demonstrated that coa310 encodes a functioning coagulase, by confirming the coagulating activity of the recombinant protein expressed from coa310. This is the first study to directly demonstrate that Coa310, a putative SCT XI, has coagulating activity. These findings may be useful for the improvement of the staphylocoagulase-typing method, including serotyping and genotyping. This is the first study to identify a novel variant of staphylocoagulase type XI based on its nucleotide sequence and to demonstrate coagulating activity in the variant using a recombinant protein. Elucidation of the variety of staphylocoagulases will provide suggestions for further improvement of the staphylocoagulase-typing method and contribute to our understanding of the epidemiologic characterization of Staphylococcus aureus. © 2016 The Society for Applied Microbiology.
Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.
Thompson, Jason D; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre
2012-01-01
Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.
Minim typing--a rapid and low cost MLST based typing tool for Klebsiella pneumoniae.
Andersson, Patiyan; Tong, Steven Y C; Bell, Jan M; Turnidge, John D; Giffard, Philip M
2012-01-01
Here we report a single nucleotide polymorphism (SNP) based genotyping method for Klebsiella pneumoniae utilising high-resolution melting (HRM) analysis of fragments within the multilocus sequence typing (MLST) loci. The approach is termed mini-MLST or Minim typing and it has previously been applied to Streptococcus pyogenes, Staphylococcus aureus and Enterococcus faecium. Six SNPs were derived from concatenated MLST sequences on the basis of maximisation of the Simpsons Index of Diversity (D). DNA fragments incorporating these SNPs and predicted to be suitable for HRM analysis were designed. Using the assumption that HRM alleles are defined by G+C content, Minim typing using six fragments was predicted to provide a D = 0.979 against known STs. The method was tested against 202 K. pneumoniae using a blinded approach in which the MLST analyses were performed after the HRM analyses. The HRM-based alleles were indeed in accordance with G+C content, and the Minim typing identified known STs and flagged new STs. The tonB MLST locus was determined to be very diverse, and the two Minim fragments located herein contribute greatly to the resolving power. However these fragments are refractory to amplification in a minority of isolates. Therefore, we assessed the performance of two additional formats: one using only the four fragments located outside the tonB gene (D = 0.929), and the other using HRM data from these four fragments in conjunction with sequencing of the tonB MLST fragment (D = 0.995). The HRM assays were developed on the Rotorgene 6000, and the method was shown to also be robust on the LightCycler 480, allowing a 384-well high through-put format. The assay provides rapid, robust and low-cost typing with fully portable results that can directly be related to current MLST data. Minim typing in combination with molecular screening for antibiotic resistance markers can be a powerful surveillance tool kit.
Minim Typing – A Rapid and Low Cost MLST Based Typing Tool for Klebsiella pneumoniae
Andersson, Patiyan; Tong, Steven Y. C.; Bell, Jan M.; Turnidge, John D.; Giffard, Philip M.
2012-01-01
Here we report a single nucleotide polymorphism (SNP) based genotyping method for Klebsiella pneumoniae utilising high-resolution melting (HRM) analysis of fragments within the multilocus sequence typing (MLST) loci. The approach is termed mini-MLST or Minim typing and it has previously been applied to Streptococcus pyogenes, Staphylococcus aureus and Enterococcus faecium. Six SNPs were derived from concatenated MLST sequences on the basis of maximisation of the Simpsons Index of Diversity (D). DNA fragments incorporating these SNPs and predicted to be suitable for HRM analysis were designed. Using the assumption that HRM alleles are defined by G+C content, Minim typing using six fragments was predicted to provide a D = 0.979 against known STs. The method was tested against 202 K. pneumoniae using a blinded approach in which the MLST analyses were performed after the HRM analyses. The HRM-based alleles were indeed in accordance with G+C content, and the Minim typing identified known STs and flagged new STs. The tonB MLST locus was determined to be very diverse, and the two Minim fragments located herein contribute greatly to the resolving power. However these fragments are refractory to amplification in a minority of isolates. Therefore, we assessed the performance of two additional formats: one using only the four fragments located outside the tonB gene (D = 0.929), and the other using HRM data from these four fragments in conjunction with sequencing of the tonB MLST fragment (D = 0.995). The HRM assays were developed on the Rotorgene 6000, and the method was shown to also be robust on the LightCycler 480, allowing a 384-well high through-put format. The assay provides rapid, robust and low-cost typing with fully portable results that can directly be related to current MLST data. Minim typing in combination with molecular screening for antibiotic resistance markers can be a powerful surveillance tool kit. PMID:22428067
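The Simpson's Index of Diversity (D) used above to choose SNPs can be computed from the typing result of each isolate. The sketch below uses the Hunter-Gaston form of the index commonly applied to typing schemes, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)); whether the authors used exactly this form is an assumption, and the labels are made up.

```python
from collections import Counter


def simpsons_diversity(type_labels: list) -> float:
    """Probability that two isolates drawn without replacement have different types."""
    n_total = len(type_labels)
    if n_total < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(type_labels)
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1 - same_type_pairs / (n_total * (n_total - 1))


# Hypothetical typing results for four isolates.
labels = ["type_A", "type_A", "type_B", "type_C"]
d_value = simpsons_diversity(labels)
print(f"D = {d_value:.3f}")
```

A value of D close to 1, such as the 0.979 predicted for the six-fragment format, means that two randomly chosen isolates will almost always be assigned different types.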
Reiman, Anne; Pandey, Sarojini; Lloyd, Kate L; Dyer, Nigel; Khan, Mike; Crockard, Martin; Latten, Mark J; Watson, Tracey L; Cree, Ian A; Grammatopoulos, Dimitris K
2016-11-01
Background Detection of disease-associated mutations in patients with familial hypercholesterolaemia is crucial for early interventions to reduce the risk of cardiovascular disease. Screening for these mutations represents a methodological challenge since more than 1200 different causal mutations in the low-density lipoprotein receptor have been identified. A number of methodological approaches have been developed for screening by clinical diagnostic laboratories. Methods Using primers targeting the low-density lipoprotein receptor, apolipoprotein B, and proprotein convertase subtilisin/kexin type 9, we developed a novel Ion Torrent-based targeted re-sequencing method. We validated this in a small West Midlands (UK) cohort of 58 patients screened in parallel with other mutation-targeting methods, such as multiplex polymerase chain reaction (Elucigene FH20), oligonucleotide arrays (Randox familial hypercholesterolaemia array) or the Illumina next-generation sequencing platform. Results In this small cohort, the next-generation sequencing method achieved excellent analytical performance characteristics and showed 100% and 89% concordance with the Randox array and the Elucigene FH20 assay, respectively. Investigation of the discrepant results identified two cases of mutation misclassification by the Elucigene FH20 multiplex polymerase chain reaction assay. A number of novel mutations not previously reported were also identified by the next-generation sequencing method. Conclusions Ion Torrent-based next-generation sequencing can deliver a suitable alternative for the molecular investigation of familial hypercholesterolaemia patients, especially when comprehensive mutation screening for rare or unknown mutations is required.
Dan, Tong; Liu, Wenjun; Song, Yuqin; Xu, Haiyan; Menghe, Bilige; Zhang, Heping; Sun, Zhihong
2015-05-20
Lactobacillus fermentum is economically important in the production and preservation of fermented foods. A repeatable and discriminative typing method was devised to characterize L. fermentum at the molecular level. The multilocus sequence typing (MLST) scheme developed was based on analysis of the internal sequence of 11 housekeeping gene fragments (clpX, dnaA, dnaK, groEL, murC, murE, pepX, pyrG, recA, rpoB, and uvrC). MLST analysis of 203 isolates of L. fermentum from Mongolia and seven provinces/ autonomous regions in China identified 57 sequence types (ST), 27 of which were represented by only a single isolate, indicating high genetic diversity. Phylogenetic analyses based on the sequence of the 11 housekeeping gene fragments indicated that the L. fermentum isolates analyzed belonged to two major groups. A standardized index of association (I A (S)) indicated a weak clonal population structure in L. fermentum. Split decomposition analysis indicated that recombination played an important role in generating the genetic diversity observed in L. fermentum. The results from the minimum spanning tree strongly suggested that evolution of L. fermentum STs was not correlated with geography or food-type. The MLST scheme developed will be valuable for further studies on the evolution and population structure of L. fermentum isolates used in food products.
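In an MLST scheme such as the one described above, a sequence type (ST) is defined by the combination of allele numbers across all loci, so isolates with identical profiles share an ST. The following minimal sketch uses the 11 loci named in the abstract; the allele numbers, isolate names and the numbering of STs in order of first appearance are hypothetical.

```python
from collections import Counter

# The 11 housekeeping loci named in the abstract.
LOCI = ("clpX", "dnaA", "dnaK", "groEL", "murC", "murE",
        "pepX", "pyrG", "recA", "rpoB", "uvrC")


def assign_sequence_types(isolate_alleles: dict) -> dict:
    """Give each distinct allele profile its own ST, numbered by first appearance."""
    st_by_profile = {}
    st_by_isolate = {}
    for isolate, alleles in isolate_alleles.items():
        profile = tuple(alleles[locus] for locus in LOCI)
        if profile not in st_by_profile:
            st_by_profile[profile] = len(st_by_profile) + 1
        st_by_isolate[isolate] = st_by_profile[profile]
    return st_by_isolate


# Hypothetical allele numbers: isolate_3 differs from the others only at dnaA.
isolates = {
    "isolate_1": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))),
    "isolate_2": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))),
    "isolate_3": dict(zip(LOCI, (1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1))),
}

sts = assign_sequence_types(isolates)
singleton_sts = [st for st, n in Counter(sts.values()).items() if n == 1]
print(sts, singleton_sts)
```

Counting STs that occur only once, as in `singleton_sts`, corresponds to the abstract's observation that 27 of the 57 STs were represented by a single isolate.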
High speed nucleic acid sequencing
Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY
2011-05-17
The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.
McIntyre, Chloe L.; Knowles, Nick J.
2013-01-01
Human rhinoviruses (HRVs) frequently cause mild upper respiratory tract infections and more severe disease manifestations such as bronchiolitis and asthma exacerbations. HRV is classified into three species within the genus Enterovirus of the family Picornaviridae. HRV species A and B contain 75 and 25 serotypes identified by cross-neutralization assays, although the use of such assays for routine HRV typing is hampered by the large number of serotypes, replacement of virus isolation by molecular methods in HRV diagnosis and the poor or absent replication of HRV species C in cell culture. To address these problems, we propose an alternative, genotypic classification of HRV based on genetic relatedness, analogous to that used for enteroviruses. Nucleotide distances between 384 complete VP1 sequences of currently assigned HRV (sero)types identified divergence thresholds of 13, 12 and 13 % for species A, B and C, respectively, that divided inter- and intra-type comparisons. These were paralleled by 10, 9.5 and 10 % thresholds in the larger dataset of >3800 VP4 region sequences. Assignments based on VP1 sequences led to minor revisions of existing type designations (such as the reclassification of serotype pairs, e.g. A8/A95 and A29/A44, as single serotypes) and the designation of new HRV types A101–106, B101–103 and C34–C51. A protocol for assignment and numbering of new HRV types using VP1 sequences and the restriction of VP4 sequence comparisons to type identification and provisional type assignments is proposed. Genotypic assignment and identification of HRV types will be of considerable value in the future investigation of type-associated differences in disease outcomes, transmission and epidemiology. PMID:23677786
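The threshold-based assignment described above can be sketched as a pairwise distance comparison. The example assumes an uncorrected proportion of differing sites (p-distance) over pre-aligned sequences and a strict "below threshold" rule; the abstract does not specify the distance measure or the handling of ties, and the short sequences are invented.

```python
# VP1 divergence thresholds per species, as stated in the abstract.
VP1_THRESHOLDS = {"A": 0.13, "B": 0.12, "C": 0.13}


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def same_type(seq_a: str, seq_b: str, species: str) -> bool:
    """True if the pairwise distance is below the species threshold (assumed rule)."""
    return p_distance(seq_a, seq_b) < VP1_THRESHOLDS[species]


# Hypothetical 10-nt fragments; real VP1 sequences are several hundred nt long.
reference = "ATGCATGCAT"
close_variant = "ATGCATGCAA"    # 1 difference in 10 sites
distant_variant = "ATGGATCCAA"  # 3 differences in 10 sites

print(same_type(reference, close_variant, "A"),
      same_type(reference, distant_variant, "A"))
```

With these toy fragments the close variant falls below the 13 % threshold for species A and the distant variant does not.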
Protein sequences clustering of herpes virus by using Tribe Markov clustering (Tribe-MCL)
NASA Astrophysics Data System (ADS)
Bustamam, A.; Siswantining, T.; Febriyani, N. L.; Novitasari, I. D.; Cahyaningrum, R. D.
2017-07-01
Herpes viruses are widespread, and one of their important characteristics is the ability to cause both acute and chronic infection, which can lead to severe complications. The herpes virus is composed of DNA and protein and is wrapped by glycoproteins. In this work, the herpes virus family is classified and analyzed by clustering its protein sequences using the Tribe Markov Clustering (Tribe-MCL) algorithm. Tribe-MCL is an efficient clustering method, based on the theory of Markov chains, that classifies protein sequences into families using pre-computed sequence similarity information. We implement the Tribe-MCL algorithm in the open-source program R. We select 24 herpes virus protein sequences obtained from the NCBI database. The dataset consists of three types of glycoprotein (B, F, and H), each represented by eight herpes viruses that infect humans. Simulations with inflation factors r = 1.5, 2 and 3 yield different numbers of clusters: the greater the inflation factor, the greater the number of clusters. Proteins of the same type are grouped together.
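The core Markov clustering loop (alternating expansion and inflation) can be sketched as follows. This is a minimal NumPy illustration, not the authors' R implementation; the function name `mcl`, the convergence tolerance and the way clusters are read off the final matrix are assumptions, and the similarity matrix is assumed to be precomputed (for example from pairwise sequence comparisons).

```python
import numpy as np


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-6):
    """Minimal Markov clustering sketch on a symmetric similarity matrix.

    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    m = np.array(similarity, dtype=float)
    np.fill_diagonal(m, 1.0)                 # add self-loops
    m /= m.sum(axis=0, keepdims=True)        # make columns sum to 1
    for _ in range(max_iter):
        previous = m
        m = m @ m                            # expansion: two-step random walk
        m = m ** inflation                   # inflation: favour strong flows
        m /= m.sum(axis=0, keepdims=True)    # renormalise columns
        if np.abs(m - previous).max() < tol:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(np.flatnonzero(row > 1e-8).tolist())
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)
```

The `inflation` argument plays the role of the inflation factor r in the abstract; larger values sharpen the flow matrix more aggressively, which tends to produce more and smaller clusters.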
Yan, Qiongqiong; Fanning, Séamus
2015-01-01
Cronobacter is a genus of emerging opportunistic food-borne pathogens comprising seven species: C. sakazakii, C. malonaticus, C. muytjensii, C. turicensis, C. dublinensis, C. universalis, and C. condimenti. These organisms can cause severe clinical infections, including necrotizing enterocolitis, septicemia, and meningitis, predominantly among neonates <4 weeks of age. Cronobacter species can be isolated from various foods and their surrounding environments; however, powdered infant formula (PIF) is the food source most frequently linked with Cronobacter infection. This review aims to provide a summary of laboratory-based strategies that can be used to identify and trace Cronobacter species. The identification of Cronobacter species using conventional culture methods and immuno-based detection protocols is presented first. Molecular detection and identification at the genus and species level, along with molecular serogrouping approaches, are then described, followed by molecular sub-typing methods, in particular pulsed-field gel electrophoresis and multi-locus sequence typing. Next-generation sequencing approaches, including whole genome sequencing, DNA microarray, and high-throughput whole-transcriptome sequencing, are also highlighted. Appropriate application of these strategies would help to reduce the risk of Cronobacter contamination in PIF and production environments, thereby improving food safety and protecting public health. PMID:26000266
Liu, Wenjun; Yu, Jie; Sun, Zhihong; Song, Yuqin; Wang, Xueni; Wang, Hongmei; Wuren, Tuoya; Zha, Musu; Menghe, Bilige; Heping, Zhang
2016-01-01
Lactobacillus delbrueckii ssp. bulgaricus (L. bulgaricus) is well known for its worldwide application in yogurt production. Flavor production and acid production are considered the most important characteristics for starter culture screening. To our knowledge, this is the first study applying multilocus sequence typing of functional genes to predict the fermentation and flavor-producing characteristics of yogurt-producing bacteria. In the present study, phenotypic characteristics of 35 L. bulgaricus strains were quantified during the fermentation of milk to yogurt and during its subsequent storage; these included fermentation time, acidification rate, pH, titratable acidity, and flavor characteristics (acetaldehyde concentration). Furthermore, multilocus sequence typing analysis of 7 functional genes associated with fermentation time, acid production, and flavor formation was performed to elucidate the phylogeny and genetic evolution of the same L. bulgaricus isolates. The results showed that strains differed significantly in fermentation time, acidification rate, and acetaldehyde production. Combining functional gene sequence analysis with phenotypic characteristics demonstrated that groups of strains established using genotype data were consistent with groups identified based on their phenotypic traits. This study has established an efficient and rapid molecular genotyping method to identify strains with good fermentation traits; this has the potential to replace time-consuming conventional methods based on direct measurement of phenotypic traits. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
UV-Visible Spectroscopy-Based Quantification of Unlabeled DNA Bound to Gold Nanoparticles.
Baldock, Brandi L; Hutchison, James E
2016-12-20
DNA-functionalized gold nanoparticles have been increasingly applied as sensitive and selective analytical probes and biosensors. The DNA ligands bound to a nanoparticle dictate its reactivity, making it essential to know the type and number of DNA strands bound to the nanoparticle surface. Existing methods used to determine the number of DNA strands per gold nanoparticle (AuNP) require that the sequences be fluorophore-labeled, which may affect the DNA surface coverage and reactivity of the nanoparticle and/or require specialized equipment and other fluorophore-containing reagents. We report a UV-visible-based method to conveniently and inexpensively determine the number of DNA strands attached to AuNPs of different core sizes. When this method is used in tandem with a fluorescence dye assay, it is possible to determine the ratio of two unlabeled sequences of different lengths bound to AuNPs. Two sizes of citrate-stabilized AuNPs (5 and 12 nm) were functionalized with mixtures of short (5 base) and long (32 base) disulfide-terminated DNA sequences, and the ratios of sequences bound to the AuNPs were determined using the new method. The long DNA sequence was present as a lower proportion of the ligand shell than in the ligand exchange mixture, suggesting it had a lower propensity to bind the AuNPs than the short DNA sequence. The ratio of DNA sequences bound to the AuNPs was not the same for the large and small AuNPs, which suggests that the radius of curvature had a significant influence on the assembly of DNA strands onto the AuNPs.
de Souza Godinho, Fernanda Marques; Bock, Hugo; Gheno, Tailise Conte; Saraiva-Pereira, Maria Luiza
2012-12-01
Spinal muscular atrophy (SMA) is an autosomal recessive inherited disorder caused by alterations in the survival motor neuron 1 (SMN1) gene. SMA patients are classified as type I-IV based on severity of symptoms and age of onset. About 95% of SMA cases are caused by the homozygous absence of SMN1 due to gene deletion or conversion into SMN2. PCR-based methods have been widely used in genetic testing for SMA. In this work, we introduce a new approach based on TaqMan® real-time PCR for research and diagnostic settings. DNA samples from 100 individuals with clinical signs and symptoms suggestive of SMA were analyzed. Mutant DNA samples as well as controls were confirmed by DNA sequencing. We detected 58 SMA cases (58.0%) by showing deletion of SMN1 exon 7. Considering clinical information available from 56 of them, the patient distribution was 26 (46.4%) SMA type I, 16 (28.6%) SMA type II and 14 (25.0%) SMA type III. Results generated by the new method were confirmed by PCR-RFLP and, when required, by DNA sequencing. In conclusion, a protocol based on real-time PCR was shown to be effective and specific for molecular analysis of SMA patients.
Mapping Base Modifications in DNA by Transverse-Current Sequencing
NASA Astrophysics Data System (ADS)
Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.
2018-02-01
Sequencing DNA modifications and lesions, such as methylation of cytosine and oxidation of guanine, is even more important and challenging than sequencing the genome itself. The traditional methods for detecting DNA modifications are either insensitive to these modifications or require additional processing steps to identify a particular type of modification. Transverse-current sequencing in nanopores can potentially identify the canonical bases and base modifications in the same run. In this work, we demonstrate that the most common DNA epigenetic modifications and lesions can be detected with any predefined accuracy based on their tunneling current signature. Our results are based on simulations of the nanopore tunneling current through DNA molecules, calculated using nonequilibrium electron-transport methodology within an effective multiorbital model derived from first-principles calculations, followed by a base-calling algorithm accounting for neighbor current-current correlations. This methodology can be integrated with existing experimental techniques to improve base-calling fidelity.
Qin, T; Zhou, H; Ren, H; Shi, W; Jin, H; Jiang, X; Xu, Y; Zhou, M; Li, J; Wang, J; Shao, Z; Xu, X
2016-07-01
Legionnaires' disease (LD) is a globally distributed systemic infectious disease. The burden of LD in many regions is still unclear, especially in Asian countries including China. A survey of Legionella infection using real-time PCR and nested sequence-based typing (SBT) was performed in two hospitals in Shanghai, China. A total of 265 bronchoalveolar lavage fluid (BALF) specimens were collected from hospital A between January 2012 and December 2013, and 359 sputum specimens were collected from hospital B throughout 2012. A total of 71 specimens were positive for Legionella according to real-time PCR focusing on the 5S rRNA gene. Seventy of these specimens were identified as Legionella pneumophila as a result of real-time PCR amplification of the dotA gene. Results of nested SBT revealed high genetic polymorphism in these L. pneumophila and ST1 was the predominant sequence type. These data revealed that the burden of LD in China is much greater than that recognized previously, and real-time PCR may be a suitable monitoring technology for LD in large sample surveys in regions lacking the economic and technical resources to perform other methods, such as urinary antigen tests and culture methods.
Goldberg, Tony L; Gillespie, Thomas R; Singer, Randall S
2006-09-01
Repetitive-element PCR (rep-PCR) is a method for genotyping bacteria based on the selective amplification of repetitive genetic elements dispersed throughout bacterial chromosomes. The method has great potential for large-scale epidemiological studies because of its speed and simplicity; however, objective guidelines for inferring relationships among bacterial isolates from rep-PCR data are lacking. We used multilocus sequence typing (MLST) as a "gold standard" to optimize the analytical parameters for inferring relationships among Escherichia coli isolates from rep-PCR data. We chose 12 isolates from a large database to represent a wide range of pairwise genetic distances, based on the initial evaluation of their rep-PCR fingerprints. We conducted MLST with these same isolates and systematically varied the analytical parameters to maximize the correspondence between the relationships inferred from rep-PCR and those inferred from MLST. Methods that compared the shapes of densitometric profiles ("curve-based" methods) yielded consistently higher correspondence values between data types than did methods that calculated indices of similarity based on shared and different bands (maximum correspondences of 84.5% and 80.3%, respectively). Curve-based methods were also markedly more robust in accommodating variations in user-specified analytical parameter values than were "band-sharing coefficient" methods, and they enhanced the reproducibility of rep-PCR. Phylogenetic analyses of rep-PCR data yielded trees with high topological correspondence to trees based on MLST and high statistical support for major clades. These results indicate that rep-PCR yields accurate information for inferring relationships among E. coli isolates and that accuracy can be enhanced with the use of analytical methods that consider the shapes of densitometric profiles.
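The two families of fingerprint comparison contrasted in this abstract can be sketched in Python. The abstract does not name the specific coefficients, so the Dice coefficient (for band sharing) and the Pearson correlation (for densitometric curves) are used here as common representatives; both function names are hypothetical.

```python
import math


def dice_band_sharing(bands_a, bands_b):
    """Band-based similarity: shared bands relative to total bands scored.

    Bands are given as already-matched position labels (e.g. size bins).
    """
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        raise ValueError("both fingerprints are empty")
    return 2 * len(a & b) / (len(a) + len(b))


def pearson_curve_similarity(curve_a, curve_b):
    """Curve-based similarity: Pearson correlation of two intensity profiles
    sampled at the same positions along the lane."""
    if len(curve_a) != len(curve_b) or len(curve_a) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(curve_a)
    mean_a = sum(curve_a) / n
    mean_b = sum(curve_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(curve_a, curve_b))
    var_a = sum((x - mean_a) ** 2 for x in curve_a)
    var_b = sum((y - mean_b) ** 2 for y in curve_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a flat profile has no defined correlation")
    return cov / math.sqrt(var_a * var_b)
```

The band-based measure depends on a prior decision about which peaks count as bands and which bands match, whereas the curve-based measure uses the whole profile. That difference is one plausible reason for the greater robustness to user-specified parameters that the abstract reports for curve-based methods.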
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-01-01
Background: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences belongs to the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction. Results: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. Conclusion: SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve the performance of modern fold classification methods. PMID:18452616
High-resolution typing of Chlamydia trachomatis: epidemiological and clinical uses.
de Vries, Henry J C; Schim van der Loeff, Maarten F; Bruisten, Sylvia M
2015-02-01
A state-of-the-art overview of molecular Chlamydia trachomatis typing methods that are used for routine diagnostics and scientific studies. Molecular epidemiology uses high-resolution typing techniques such as multilocus sequence typing, multilocus variable number of tandem repeats analysis, and whole-genome sequencing to identify strains based on their DNA sequence. These data can be used for cluster, network and phylogenetic analyses, and are used to unveil transmission networks, risk groups, and evolutionary pathways. High-resolution typing of C. trachomatis strains is applied to monitor treatment efficacy and re-infections, and to study the recent emergence of lymphogranuloma venereum (LGV) amongst men who have sex with men in high-income countries. Chlamydia strain typing has clinical relevance in disease management, as LGV needs longer treatment than non-LGV C. trachomatis. It has also led to the discovery of a new variant Chlamydia strain in Sweden, which was not detected by some commercial C. trachomatis diagnostic platforms. After a brief history and comparison of the various Chlamydia typing methods, the applications of the current techniques are described and future endeavors to extend scientific understanding are formulated. High-resolution typing will likely help to further unravel the pathophysiological mechanisms behind the wide clinical spectrum of chlamydial disease.
Kuhn, Alexandre; Ong, Yao Min; Quake, Stephen R; Burkholder, William F
2015-07-08
Like other structural variants, transposable element insertions can be highly polymorphic across individuals. Their functional impact, however, remains poorly understood. Current genome-wide approaches for genotyping insertion-site polymorphisms based on targeted or whole-genome sequencing remain very expensive and can lack accuracy, hence new large-scale genotyping methods are needed. We describe a high-throughput method for genotyping transposable element insertions and other types of structural variants that can be assayed by breakpoint PCR. The method relies on next-generation sequencing of multiplex, site-specific PCR amplification products and read count-based genotype calls. We show that this method is flexible, efficient (it does not require rounds of optimization), cost-effective and highly accurate. This method can benefit a wide range of applications from the routine genotyping of animal and plant populations to the functional study of structural variants in humans.
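A read count-based genotype call of the kind mentioned in this abstract might look like the following Python sketch. The abstract gives no thresholds or calling rules, so the minimum depth, the allele-fraction cut-offs and the function name are all hypothetical placeholders chosen for illustration only.

```python
def call_genotype(insertion_reads, empty_site_reads,
                  min_depth=20, het_low=0.2, het_high=0.8):
    """Call a diploid genotype at one insertion site from amplicon read counts.

    insertion_reads  -- reads from the breakpoint (insertion-present) amplicon
    empty_site_reads -- reads from the insertion-absent amplicon
    All thresholds are illustrative assumptions, not values from the paper.
    """
    depth = insertion_reads + empty_site_reads
    if depth < min_depth:
        return "no_call"                 # too few reads to decide
    fraction = insertion_reads / depth   # share of reads supporting insertion
    if fraction < het_low:
        return "0/0"                     # homozygous, insertion absent
    if fraction > het_high:
        return "1/1"                     # homozygous, insertion present
    return "0/1"                         # heterozygous
```

In practice, amplification bias between the two amplicons would likely need to be calibrated per site before fixed cut-offs of this kind could be trusted.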
Maluping, R P; Ravelo, C; Lavilla-Pitogo, C R; Krovacek, K; Romalde, J L
2005-01-01
The main aim of the present study was to use three PCR-based techniques to analyse genetic variability among Vibrio parahaemolyticus strains isolated from the Philippines. Seventeen strains of V. parahaemolyticus isolated from shrimps (Penaeus monodon) and from the environments where these shrimps are cultivated were analysed by random amplified polymorphic DNA PCR (RAPD-PCR), enterobacterial repetitive intergenic consensus sequence PCR (ERIC-PCR) and repetitive extragenic palindromic PCR (REP-PCR). The results demonstrate genetic variability among the V. parahaemolyticus strains isolated from the Philippines. RAPD, ERIC and REP-PCR are suitable rapid typing methods for V. parahaemolyticus: all three have good discriminative ability and can be used as a rapid means of comparing V. parahaemolyticus strains in epidemiological investigations. However, REP-PCR is inferior to RAPD and ERIC-PCR because it is less reproducible. Moreover, the REP-PCR analysis yielded a relatively small number of products, which may suggest that REP sequences are not widely distributed in the V. parahaemolyticus genome. The presence of ERIC and REP sequences in the genome of this bacterial species was confirmed. To our knowledge, this is the first study of this kind carried out on V. parahaemolyticus strains isolated from the Philippines.
Several Families of Sequences with Low Correlation and Large Linear Span
NASA Astrophysics Data System (ADS)
Zeng, Fanxin; Zhang, Zhenyu
In DS-CDMA systems and DS-UWB radios, low correlation of spreading sequences can greatly help to minimize multiple access interference (MAI) and large linear span of spreading sequences can reduce their predictability. In this letter, new sequence sets with low correlation and large linear span are proposed. Based on the construction $\mathrm{Tr}_1^m[\mathrm{Tr}_m^n(\alpha^{bt}+\gamma_i\alpha^{dt})]^r$ for generating $p$-ary sequences of period $p^n-1$, where $n=2m$, $d=up^m\pm v$, $b=u\pm v$, $\gamma_i\in GF(p^n)$, and $p$ is an arbitrary prime number, several methods to choose the parameter $d$ are provided. The obtained sequences with family size $p^n$ are of four-valued, five-valued, six-valued or seven-valued correlation and the maximum nontrivial correlation value is $(u+v-1)p^m-1$. Computer simulation shows that the linear span of the new sequences is larger than that of the sequences with Niho-type and Welch-type decimations, and similar to that of [10].
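For readers unfamiliar with the notation, the trace maps that appear in constructions of this kind have a standard definition, given here as a reference rather than as part of the letter itself. For $m$ dividing $n$, the trace from $GF(p^n)$ to $GF(p^m)$ is:

```latex
\mathrm{Tr}_m^n(x) = \sum_{i=0}^{n/m-1} x^{p^{mi}}, \qquad x \in GF(p^n).
% With n = 2m, as in the abstract, the outer sum has two terms:
\mathrm{Tr}_m^{2m}(x) = x + x^{p^m},
% and the trace from GF(p^m) down to the prime field GF(p) is
\mathrm{Tr}_1^m(y) = \sum_{i=0}^{m-1} y^{p^i}, \qquad y \in GF(p^m).
```

Composing the two maps takes an element of $GF(p^n)$ to a symbol in $GF(p)$, which is how a $p$-ary sequence indexed by $t$ is obtained from powers of the element $\alpha$.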
High resolution identity testing of inactivated poliovirus vaccines
Mee, Edward T.; Minor, Philip D.; Martin, Javier
2015-01-01
Background Definitive identification of poliovirus strains in vaccines is essential for quality control, particularly where multiple wild-type and Sabin strains are produced in the same facility. Sequence-based identification provides the ultimate in identity testing and would offer several advantages over serological methods. Methods We employed random RT-PCR and high throughput sequencing to recover full-length genome sequences from monovalent and trivalent poliovirus vaccine products at various stages of the manufacturing process. Results All expected strains were detected in previously characterised products and the method permitted identification of strains comprising as little as 0.1% of sequence reads. Highly similar Mahoney and Sabin 1 strains were readily discriminated on the basis of specific variant positions. Analysis of a product known to contain incorrect strains demonstrated that the method correctly identified the contaminants. Conclusion Random RT-PCR and shotgun sequencing provided high resolution identification of vaccine components. In addition to the recovery of full-length genome sequences, the method could also be easily adapted to the characterisation of minor variant frequencies and distinction of closely related products on the basis of distinguishing consensus and low frequency polymorphisms. PMID:26049003
A simple method for MR elastography: a gradient-echo type multi-echo sequence.
Numano, Tomokazu; Mizuhara, Kazuyuki; Hata, Junichi; Washio, Toshikatsu; Homma, Kazuhiro
2015-01-01
To demonstrate the feasibility of a novel MR elastography (MRE) technique based on a conventional gradient-echo type multi-echo MR sequence which does not need additional bipolar magnetic field gradients (motion encoding gradient: MEG), yet is sensitive to vibration. In a gradient-echo type multi-echo MR sequence, several images are produced from each echo of the train with different echo times (TEs). If these echoes are synchronized with the vibration, each readout's gradient lobes achieve a MEG-like effect, and the later generated echo causes a greater MEG-like effect. The sequence was tested for the tissue-mimicking agarose gel phantoms and the psoas major muscles of healthy volunteers. It was confirmed that the readout gradient lobes caused an MEG-like effect and the later TE images had higher sensitivity to vibrations. The magnitude image of later generated echo suffered the T2 decay and the susceptibility artifacts, but the wave image and elastogram of later generated echo were unaffected by these effects. In in vivo experiments, this method was able to measure the mean shear modulus of the psoas major muscle. From the results of phantom experiments and volunteer studies, it was shown that this method has clinical application potential. Copyright © 2014 Elsevier Inc. All rights reserved.
Johnson, Lucas B; Gintner, Lucas P; Park, Sehoo; Snow, Christopher D
2015-08-01
Accuracy of current computational protein design (CPD) methods is limited by inherent approximations in energy potentials and sampling. These limitations are often used to qualitatively explain design failures; however, relatively few studies provide specific examples or quantitative details that can be used to improve future CPD methods. Expanding the design method to include a library of sequences provides data that is well suited for discriminating between stabilizing and destabilizing design elements. Using thermophilic endoglucanase E1 from Acidothermus cellulolyticus as a model enzyme, we computationally designed a sequence with 60 mutations. The design sequence was rationally divided into structural blocks and recombined with the wild-type sequence. Resulting chimeras were assessed for activity and thermostability. Surprisingly, unlike previous chimera libraries, regression analysis based on one- and two-body effects was not sufficient for predicting chimera stability. Analysis of molecular dynamics simulations proved helpful in distinguishing stabilizing and destabilizing mutations. Reverting to the wild-type amino acid at destabilized sites partially regained design stability, and introducing predicted stabilizing mutations in wild-type E1 significantly enhanced thermostability. The ability to isolate stabilizing and destabilizing elements in computational design offers an opportunity to interpret previous design failures and improve future CPD methods. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Besaratinia, Ahmad; Li, Haiqing; Yoon, Jae-In; Zheng, Albert; Gao, Hanlin; Tommasi, Stella
2012-01-01
Many carcinogens leave a unique mutational fingerprint in the human genome. These mutational fingerprints manifest as specific types of mutations often clustering at certain genomic loci in tumor genomes from carcinogen-exposed individuals. To develop a high-throughput method for detecting the mutational fingerprint of carcinogens, we have devised a cost-, time- and labor-effective strategy, in which the widely used transgenic Big Blue® mouse mutation detection assay is made compatible with the Roche/454 Genome Sequencer FLX Titanium next-generation sequencing technology. As proof of principle, we have used this novel method to establish the mutational fingerprints of three prominent carcinogens with varying mutagenic potencies, including sunlight ultraviolet radiation, 4-aminobiphenyl and secondhand smoke that are known to be strong, moderate and weak mutagens, respectively. For verification purposes, we have compared the mutational fingerprints of these carcinogens obtained by our newly developed method with those obtained by parallel analyses using the conventional low-throughput approach, that is, standard mutation detection assay followed by direct DNA sequencing using a capillary DNA sequencer. We demonstrate that this high-throughput next-generation sequencing-based method is highly specific and sensitive to detect the mutational fingerprints of the tested carcinogens. The method is reproducible, and its accuracy is comparable with that of the currently available low-throughput method. In conclusion, this novel method has the potential to move the field of carcinogenesis forward by allowing high-throughput analysis of mutations induced by endogenous and/or exogenous genotoxic agents. PMID:22735701
Li, Zhirong; Liu, Xiaolei; Zhao, Jianhong; Xu, Kaiyue; Tian, Tiantian; Yang, Jing; Qiang, Cuixin; Shi, Dongyan; Wei, Honglian; Sun, Suju; Cui, Qingqing; Li, Ruxin; Niu, Yanan; Huang, Bixing
2018-04-01
Clostridium difficile is the causative pathogen of antibiotic-related nosocomial diarrhea. For epidemiological study and identification of virulent clones, a new binary typing method was developed for C. difficile in this study. The usefulness of this newly developed, optimized 10-loci binary typing method was compared with that of two widely used methods, ribotyping and multilocus sequence typing (MLST), in 189 C. difficile samples. Binary typing, ribotyping and MLST typed the samples into 53 binary types (BTs), 26 ribotypes (RTs), and 33 MLST sequence types (STs), respectively. The typing ability of the binary method was better than that of either ribotyping or MLST, with Simpson's index (SI) values of 0.937, 0.892 and 0.859, respectively. The ease of testing, portability and cost-effectiveness of the new binary typing method would make it a useful typing alternative for outbreak investigations within healthcare facilities and for epidemiological research. Copyright © 2018 Elsevier B.V. All rights reserved.
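The discriminatory power of a typing method is commonly summarised with Simpson's index of diversity. The abstract does not state which formulation was used; the Python sketch below assumes the finite-sample form often applied to typing schemes, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), and the function name is hypothetical.

```python
from collections import Counter


def simpsons_index_of_diversity(type_labels):
    """Probability that two isolates drawn without replacement
    are assigned to different types.

    type_labels -- one type designation per isolate (e.g. an ST or ribotype)
    """
    counts = Counter(type_labels)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two isolates are required")
    # Number of ordered isolate pairs that share a type.
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - same_type_pairs / (n * (n - 1))
```

A value of 1.0 means every isolate received a distinct type, and 0.0 means all isolates share one type, so a higher index indicates finer discrimination among the same set of isolates.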
Hou, X-L; Cao, Q-Y; Jia, H-Y; Chen, Z
2008-07-01
Pathogens causing acute diarrhea include a large variety of species from Enterobacteriaceae and Vibrionaceae. A method based on pyrosequencing was used here to differentiate bacteria commonly associated with diarrhea in China; the method is targeted to a partial amplicon of the gyrB gene, which encodes the B subunit of DNA gyrase. Twenty-eight specific polymorphic positions were identified from sequence alignment of a large sequence dataset and targeted using 17 sequencing primers. Of 95 isolates tested, belonging to 13 species within 7 genera, most could be identified to the species level; O157 type could be differentiated from other E. coli types; Salmonella enterica subsp. enterica could be identified at the serotype level; the genus Shigella, except for S. boydii and S. dysenteriae, could also be identified. All these isolates were also subjected to conventional sequencing of a relatively long (approximately 1.2 kb) region of gyrB DNA; these results confirmed those with pyrosequencing. Twenty-two fecal samples were surveyed, the results of which were concordant with culture-based bacterial identification, and the pathogen detection limit with simulated stool specimens was 10^4 CFU/ml. DNA from different pathogens was also mixed to simulate a case of multibacterial infection, and the generated signals correlated well with the mix ratio. In summary, the gyrB-based pyrosequencing approach proved to have significant reliability and discriminatory power for enteropathogenic bacterial identification and provided a fast and effective method for clinical diagnosis.
Winnowing DNA for Rare Sequences: Highly Specific Sequence and Methylation Based Enrichment
Thompson, Jason D.; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre
2012-01-01
Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue. PMID:22355378
Kim, Kyunghee; Lee, Sang-Choon; Lee, Junki; Yu, Yeisoo; Yang, Kiwoung; Choi, Beom-Soon; Koh, Hee-Jong; Waminal, Nomar Espinosa; Choi, Hong-Il; Kim, Nam-Hoon; Jang, Woojong; Park, Hyun-Seung; Lee, Jonghoon; Lee, Hyun Oh; Joh, Ho Jun; Lee, Hyeon Ju; Park, Jee Young; Perumal, Sampath; Jayakodi, Murukarthick; Lee, Yun Sun; Kim, Backki; Copetti, Dario; Kim, Soonok; Kim, Sunggil; Lim, Ki-Byung; Kim, Young-Dong; Lee, Jungho; Cho, Kwang-Su; Park, Beom-Seok; Wing, Rod A.; Yang, Tae-Jin
2015-01-01
Cytoplasmic chloroplast (cp) genomes and nuclear ribosomal DNA (nR) are the primary sequences used to understand plant diversity and evolution. We introduce a high-throughput method to simultaneously obtain complete cp and nR sequences using Illumina platform whole-genome sequence. We applied the method to 30 rice specimens belonging to nine Oryza species. Concurrent phylogenomic analysis using cp and nR of several specimens of the same Oryza AA genome species provides insight into the evolution and domestication of cultivated rice, clarifying three ambiguous but important issues in the evolution of wild Oryza species. First, cp-based trees clearly classify each lineage but can be biased by inter-subspecies cross-hybridization events during speciation. Second, O. glumaepatula, a South American wild rice, includes two cytoplasm types, one of which is derived from a recent interspecies hybridization with O. longistaminata. Third, the Australian O. rufipogon-type rice is a perennial form of O. meridionalis. PMID:26506948
Sun, Mingjun; Jing, Zhigang; Di, Dongdong; Yan, Hao; Zhang, Zhicheng; Xu, Quangang; Zhang, Xiyue; Wang, Xun; Ni, Bo; Sun, Xiangxiang; Yan, Chengxu; Yang, Zhen; Tian, Lili; Li, Jinping; Fan, Weixing
2017-01-01
Brucellosis is a worldwide zoonotic disease caused by Brucella spp. In China, brucellosis is recognized as a reemerging disease caused mainly by the species Brucella melitensis. To better understand the currently endemic B. melitensis strains in China, three Brucella genotyping methods were applied to 110 B. melitensis strains obtained in the past several years. By MLVA genotyping, five MLVA-8 genotypes were identified, among which genotype 42 (1-5-3-13-2-2-3-2) was recognized as the predominant genotype, while genotype 63 (1-5-3-13-2-3-3-2) and a novel genotype, 1-5-3-13-2-4-3-2, were the second most frequently observed. MLVA-16 discerned a total of 57 MLVA-16 genotypes among these Brucella strains, with 41 genotypes detected for the first time and the other 16 genotypes previously reported. By BruMLSA21 typing, six sequence types (STs) were identified; among them, ST8 is the most frequently seen in China, while the other five STs were detected for the first time and designated ST137, ST138, ST139, ST140, and ST141 by the international multilocus sequence typing database. Whole-genome sequence (WGS)-single-nucleotide polymorphism (SNP)-based typing and phylogenetic analysis resolved Chinese B. melitensis strains into five clusters, reflecting the existence of multiple lineages among these strains. Phylogenetically, the Chinese lineages are more closely related to strains collected from East Mediterranean and Middle East countries, such as Turkey, Kuwait, and Iraq. In the next few years, MLVA typing will certainly remain an important epidemiological tool for Brucella infection analysis, as it displays a high discriminatory ability and achieves results largely in agreement with WGS-SNP-based typing. However, WGS-SNP-based typing was found to be the most powerful and reliable method for discerning Brucella strains and will be widely used in the future.
Direct detection of a BRAF mutation in total RNA from melanoma cells using cantilever arrays
NASA Astrophysics Data System (ADS)
Huber, F.; Lang, H. P.; Backmann, N.; Rimoldi, D.; Gerber, Ch.
2013-02-01
Malignant melanoma, the deadliest form of skin cancer, is characterized by a predominant mutation in the BRAF gene. Drugs that target tumours carrying this mutation have recently entered the clinic. Accordingly, patients are routinely screened for mutations in this gene to determine whether they can benefit from this type of treatment. The current gold standard for mutation screening uses real-time polymerase chain reaction and sequencing methods. Here we show that an assay based on microcantilever arrays can detect the mutation nanomechanically without amplification in total RNA samples isolated from melanoma cells. The assay is based on a BRAF-specific oligonucleotide probe. We detected mutant BRAF at a concentration of 500 pM in a 50-fold excess of the wild-type sequence. The method was able to distinguish melanoma cells carrying the mutation from wild-type cells using as little as 20 ng µl⁻¹ of RNA material, without prior PCR amplification and use of labels.
Garrido-Martín, Diego; Pazos, Florencio
2018-02-27
The exponential accumulation of new sequences in public databases is expected to improve the performance of all the approaches for predicting protein structural and functional features. Nevertheless, this was never assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely. In this work we evaluate how the growth and change with time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do that by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows quantifying the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant. These results are informative for the methods' developers and final users, and may have implications in the design of new sequencing initiatives.
Differential correlation for sequencing data.
Siska, Charlotte; Kechris, Katerina
2017-01-19
Several methods have been developed to identify differential correlation (DC) between pairs of molecular features from -omics studies. Most DC methods have only been tested with microarrays and other platforms producing continuous and Gaussian-like data. Sequencing data is in the form of counts, often modeled with a negative binomial distribution making it difficult to apply standard correlation metrics. We have developed an R package for identifying DC called Discordant which uses mixture models for correlations between features and the Expectation Maximization (EM) algorithm for fitting parameters of the mixture model. Several correlation metrics for sequencing data are provided and tested using simulations. Other extensions in the Discordant package include additional modeling for different types of differential correlation, and faster implementation, using a subsampling routine to reduce run-time and address the assumption of independence between molecular feature pairs. With simulations and breast cancer miRNA-Seq and RNA-Seq data, we find that Spearman's correlation has the best performance among the tested correlation methods for identifying differential correlation. Application of Spearman's correlation in the Discordant method demonstrated the most power in ROC curves and sensitivity/specificity plots, and improved ability to identify experimentally validated breast cancer miRNA. We also considered including additional types of differential correlation, which showed a slight reduction in power due to the additional parameters that need to be estimated, but more versatility in applications. Finally, subsampling within the EM algorithm considerably decreased run-time with negligible effect on performance. A new method and R package called Discordant is presented for identifying differential correlation with sequencing data. 
Based on comparisons with different correlation metrics, this study suggests Spearman's correlation is appropriate for sequencing data, but other correlation metrics are available to the user depending on the application and data type. The Discordant method can also be extended to investigate additional DC types and subsampling with the EM algorithm is now available for reduced run-time. These extensions to the R package make Discordant more robust and versatile for multiple -omics studies.
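The basic quantity behind differential correlation can be illustrated by computing Spearman's correlation for one feature pair separately in two sample groups and comparing the two values. This is only a sketch of that idea, assuming tie-averaged ranks; it is not the Discordant package's mixture model or EM fitting, and all function names and count values below are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical read counts for one miRNA/mRNA pair in two sample groups.
mirna_counts = [5, 10, 20, 40]
mrna_group_a = [1, 3, 8, 30]    # rises with the miRNA
mrna_group_b = [30, 8, 3, 1]    # falls with the miRNA
rho_a = spearman(mirna_counts, mrna_group_a)
rho_b = spearman(mirna_counts, mrna_group_b)
correlation_shift = rho_a - rho_b
```

Because only ranks enter the calculation, the skewed scale of count data does not affect the result, which is one plausible reason a rank-based metric suits sequencing counts.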
Doitsidou, Maria; Jarriault, Sophie; Poole, Richard J.
2016-01-01
The use of next-generation sequencing (NGS) has revolutionized the way phenotypic traits are assigned to genes. In this review, we describe NGS-based methods for mapping a mutation and identifying its molecular identity, with an emphasis on applications in Caenorhabditis elegans. In addition to an overview of the general principles and concepts, we discuss the main methods, provide practical and conceptual pointers, and guide the reader in the types of bioinformatics analyses that are required. Owing to the speed and the plummeting costs of NGS-based methods, mapping and cloning a mutation of interest has become straightforward, quick, and relatively easy. Removing this bottleneck previously associated with forward genetic screens has significantly advanced the use of genetics to probe fundamental biological processes in an unbiased manner. PMID:27729495
Zhu, X Q; Gasser, R B
1998-06-01
In this study, we assessed single-strand conformation polymorphism (SSCP)-based approaches for their capacity to fingerprint sequence variation in ribosomal DNA (rDNA) of ascaridoid nematodes of veterinary and/or human health significance. The second internal transcribed spacer region (ITS-2) of rDNA was utilised as the target region because it is known to provide species-specific markers for this group of parasites. ITS-2 was amplified by PCR from genomic DNA derived from individual parasites and subjected to analysis. Direct SSCP analysis of amplicons from seven taxa (Toxocara vitulorum, Toxocara cati, Toxocara canis, Toxascaris leonina, Baylisascaris procyonis, Ascaris suum and Parascaris equorum) showed that the single-strand (ss) ITS-2 patterns produced allowed their unequivocal identification to species. While no variation in SSCP patterns was detected in the ITS-2 within four species for which multiple samples were available, the method allowed the direct display of four distinct sequence types of ITS-2 among individual worms of T. cati. Comparison of SSCP/sequencing with the methods of dideoxy fingerprinting (ddF) and restriction endonuclease fingerprinting (REF) revealed that also ddF allowed the definition of the four sequence types, whereas REF displayed three of four. The findings indicate the usefulness of the SSCP-based approaches for the identification of ascaridoid nematodes to species, the direct display of sequence variation in rDNA and the detection of population variation. The ability to fingerprint microheterogeneity in ITS-2 rDNA using such approaches also has implications for studying fundamental aspects relating to mutational change in rDNA.
Zopf, Agnes; Raim, Roman; Danzer, Martin; Niklas, Norbert; Spilka, Rita; Pröll, Johannes; Gabriel, Christian; Nechansky, Andreas; Roucka, Markus
2015-03-01
The detection of KRAS mutations in codons 12 and 13 is critical for anti-EGFR therapy strategies; however, only those methodologies with high sensitivity, specificity, and accuracy as well as the best cost and turnaround balance are suitable for routine daily testing. Here we compared the performance of compact sequencing using the novel hybcell technology with 454 next-generation sequencing (454-NGS), Sanger sequencing, and pyrosequencing, using an evaluation panel of 35 specimens. A total of 32 mutations and 10 wild-type cases were reported using 454-NGS as the reference method. Specificity ranged from 100% for Sanger sequencing to 80% for pyrosequencing. Sanger sequencing and hybcell-based compact sequencing achieved a sensitivity of 96%, whereas pyrosequencing had a sensitivity of 88%. Accuracy was 97% for Sanger sequencing, 85% for pyrosequencing, and 94% for hybcell-based compact sequencing. Quantitative results were obtained for 454-NGS and hybcell-based compact sequencing data, resulting in a significant correlation (r = 0.914). Whereas pyrosequencing and Sanger sequencing were not able to detect multiple mutated cell clones within one tumor specimen, 454-NGS and the hybcell-based compact sequencing detected multiple mutations in two specimens. Our comparison shows that the hybcell-based compact sequencing is a valuable alternative to state-of-the-art methodologies used for detection of clinically relevant point mutations.
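Sensitivity, specificity and accuracy figures of the kind reported above follow from the counts of true and false calls against the reference method. The sketch below uses the standard definitions; the function name and the counts are hypothetical and were chosen only for illustration, not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion counts.

    tp/fn: reference-positive cases called mutant / wild type.
    tn/fp: reference-negative cases called wild type / mutant.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts for one assay compared with a reference method.
sens, spec, acc = diagnostic_metrics(tp=24, fp=1, tn=9, fn=1)
```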
Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model.
Jääskinen, Väinö; Parkkinen, Ville; Cheng, Lu; Corander, Jukka
2014-02-01
In many biological applications it is necessary to cluster DNA sequences into groups that represent underlying organismal units, such as named species or genera. In metagenomics this grouping needs typically to be achieved on the basis of relatively short sequences which contain different types of errors, making the use of a statistical modeling approach desirable. Here we introduce a novel method for this purpose by developing a stochastic partition model that clusters Markov chains of a given order. The model is based on a Dirichlet process prior and we use conjugate priors for the Markov chain parameters which enables an analytical expression for comparing the marginal likelihoods of any two partitions. To find a good candidate for the posterior mode in the partition space, we use a hybrid computational approach which combines the EM-algorithm with a greedy search. This is demonstrated to be faster and yield highly accurate results compared to earlier suggested clustering methods for the metagenomics application. Our model is fairly generic and could also be used for clustering of other types of sequence data for which Markov chains provide a reasonable way to compress information, as illustrated by experiments on shotgun sequence type data from an Escherichia coli strain.
A small test of a sequence-based typing method: definition of the B*1520 allele.
Domena, J D; Little, A M; Arnett, K L; Adams, E J; Marsh, S G; Parham, P
1994-10-01
Santamaria et al. (Human Immunology 1993 37: 39-50) describe a method of sequence-based typing (SBT) for HLA-A, B and C alleles said to give "unambiguous typing of any sample, heterozygous or homozygous, without requiring additional typing information". From SBT analysis, which involves determination of partial sequences of mixed alleles, these investigators reported that cell lines KT17 (HLA-B35,62) and OLGA (HLA-B62) from the reference panel of the 10th International Histocompatibility Workshop express novel variants of HLA-B15 (B1501-MN6) and HLA-B35 (B3501-MN7) respectively. To study further the novel alleles, we cloned and sequenced full-length HLA-B cDNA clones isolated from the KT17 and OLGA cell lines. We find that KT17 expresses B*3501, as assigned by SBT, and B*1501, the common allele encoding the B62 antigen. We were unable to confirm that KT17 expresses the novel B1501-MN6 variant identified by SBT. For OLGA our analysis confirms the partial sequences obtained by SBT. Thus OLGA expresses B*1501 and a novel HLA-B allele. The complete sequence of the latter shows it is a hybrid having exons 1 and 2 in common with B*1501 and other B15 subtypes and exons 3-7 in common with B*3501 and related molecules including B*5301 and B*5801. The novel allele has been designated B*1520 because of its sequence similarity with the B15 group; furthermore, serological analysis shows that the B*1520 product does not express epitopes in common with either B35, B53 or B58. The B*1520 heavy chain has a similar isoelectric point to A*3101; B*1520 was undetected by previous applications of isoelectric focusing because B*1520 and A31 are both expressed by OLGA. In conclusion, HLA-B typing of two cell lines by cDNA cloning and sequencing gives concordant results with SBT for three of the four alleles. 
The cause of the discrepancy for the fourth allele is unknown, however, this finding indicates that the novel HLA-A, B and C sequences emerging from SBT studies need independent verification.
Development of Pineapple Microsatellite Markers and Germplasm Genetic Diversity Analysis
Tong, Helin; Chen, You; Wang, Jingyi; Chen, Yeyuan; Sun, Guangming; He, Junhu; Wu, Yaoting
2013-01-01
Two methods were used to develop pineapple microsatellite markers. Genomic library-based SSR development: using selectively amplified microsatellite assay, 86 sequences were generated from pineapple genomic library. 91 (96.8%) of the 94 Simple Sequence Repeat (SSR) loci were dinucleotide repeats (39 AC/GT repeats and 52 GA/TC repeats, accounting for 42.9% and 57.1%, resp.), and the other three were mononucleotide repeats. Thirty-six pairs of SSR primers were designed; 24 of them generated clear bands of expected sizes, and 13 of them showed polymorphism. EST-based SSR development: 5659 pineapple EST sequences obtained from NCBI were analyzed; among 1397 nonredundant EST sequences, 843 were found containing 1110 SSR loci (217 of them contained more than one SSR locus). Frequency of SSRs in pineapple EST sequences is 1SSR/3.73 kb, and 44 types were found. Mononucleotide, dinucleotide, and trinucleotide repeats dominate, accounting for 95.6% in total. AG/CT and AGC/GCT were the dominant type of dinucleotide and trinucleotide repeats, accounting for 83.5% and 24.1%, respectively. Thirty pairs of primers were designed for each of randomly selected 30 sequences; 26 of them generated clear and reproducible bands, and 22 of them showed polymorphism. Eighteen pairs of primers obtained by the one or the other of the two methods above that showed polymorphism were selected to carry out germplasm genetic diversity analysis for 48 breeds of pineapple; similarity coefficients of these breeds were between 0.59 and 1.00, and they can be divided into four groups accordingly. Amplification products of five SSR markers were extracted and sequenced, corresponding repeat loci were found and locus mutations are mainly in copy number of repeats and base mutations in the flanking region. PMID:24024187
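Locating SSR loci in EST or genomic sequences, as described above, amounts to scanning for short motifs repeated in tandem. A minimal sketch with a regular expression follows; the abstract does not name the search tool or thresholds used, so the function name, the minimum repeat number and the example sequence are all assumptions for illustration.

```python
import re


def find_ssrs(sequence, motif_len=2, min_repeats=6):
    """Find perfect tandem repeats of motifs of length motif_len.

    Returns a list of (start_position, motif, repeat_count) tuples.
    """
    # One motif captured, then the same motif at least min_repeats - 1 more times.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if motif_len > 1 and len(set(motif)) == 1:
            continue  # skip homopolymer runs such as "AA" repeated
        repeats = len(match.group(0)) // motif_len
        hits.append((match.start(), motif, repeats))
    return hits


# Hypothetical sequence holding a (GA)8 and an (AC)6 dinucleotide repeat.
example_seq = "TTGC" + "GA" * 8 + "CCTA" + "AC" * 6 + "GG"
example_hits = find_ssrs(example_seq)
```

Dividing the total length scanned by the number of hits gives a density figure comparable in form to the "1 SSR per 3.73 kb" reported above.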
Milius, Robert P; Heuer, Michael; Valiga, Daniel; Doroschak, Kathryn J; Kennedy, Caleb J; Bolon, Yung-Tsi; Schneider, Joel; Pollack, Jane; Kim, Hwa Ran; Cereb, Nezih; Hollenbach, Jill A; Mack, Steven J; Maiers, Martin
2015-12-01
We present an electronic format for exchanging data for HLA and KIR genotyping with extensions for next-generation sequencing (NGS). This format addresses NGS data exchange by refining the Histoimmunogenetics Markup Language (HML) to conform to the proposed Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines (miring.immunogenomics.org). Our refinements of HML include two major additions. First, NGS is supported by new XML structures to capture additional NGS data and metadata required to produce a genotyping result, including analysis-dependent (dynamic) and method-dependent (static) components. A full genotype, consensus sequence, and the surrounding metadata are included directly, while the raw sequence reads and platform documentation are externally referenced. Second, genotype ambiguity is fully represented by integrating Genotype List Strings, which use a hierarchical set of delimiters to represent allele and genotype ambiguity in a complete and accurate fashion. HML also continues to enable the transmission of legacy methods (e.g. site-specific oligonucleotide, sequence-specific priming, and Sequence Based Typing (SBT)), adding features such as allowing multiple group-specific sequencing primers, and fully leveraging techniques that combine multiple methods to obtain a single result, such as SBT integrated with NGS. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
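The hierarchical delimiters of a Genotype List String can be illustrated by splitting a string one delimiter level at a time. The sketch below assumes the commonly described precedence order `^` (loci), `|` (genotype ambiguity), `+` (gene copies), `~` (phased haplotype) and `/` (allele ambiguity); this order is not stated in the abstract, and the function name and example alleles are hypothetical.

```python
# Delimiters from highest to lowest precedence (an assumption, see lead-in).
GL_DELIMITERS = ["^", "|", "+", "~", "/"]


def parse_gl_string(text, level=0):
    """Recursively split a GL String into nested lists.

    Each nesting level corresponds to one delimiter in GL_DELIMITERS;
    the innermost lists hold allele names.
    """
    if level == len(GL_DELIMITERS):
        return text
    return [parse_gl_string(part, level + 1)
            for part in text.split(GL_DELIMITERS[level])]


# Hypothetical genotype: one copy is ambiguous between two alleles.
example = "HLA-A*01:01/HLA-A*01:02+HLA-A*24:02"
parsed = parse_gl_string(example)
gene_copies = parsed[0][0]           # the two gene copies joined by "+"
first_copy_alleles = gene_copies[0][0]
second_copy_alleles = gene_copies[1][0]
```

Keeping every level in the nested result, even when a delimiter is absent, makes the structure uniform, so downstream code can index it without first checking which delimiters occurred.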
Scholz, Christian F P; Jensen, Anders
2017-01-01
The protocol describes a computational method to develop a Single Locus Sequence Typing (SLST) scheme for typing bacterial species. The resulting scheme can be used to type bacterial isolates as well as bacterial species directly from complex communities using next-generation sequencing technologies.
Piran, Arezoo; Shahcheraghi, Fereshteh; Solgi, Hamid; Rohani, Mahdi; Badmasti, Farzad
2017-10-01
Multi-drug resistant (MDR) Acinetobacter baumannii, an important nosocomial pathogen, has emerged as a global health concern in recent years. In this study, we applied three easier, faster, and cost-effective methods, namely PCR-based open reading frame (ORF) typing, sequence typing of bla OXA-51-like genes and RAPD-PCR, to the rapid typing of A. baumannii strains. Taken together, the results of ORF typing, PCR-sequencing of bla OXA-51-like genes and MLST revealed a high prevalence (62%, 35/57) of ST2, a successful international clone, among clinical isolates of MDR A. baumannii; these isolates had ORF pattern B and the bla OXA-66 gene. Only 7% (4/57) of MDR isolates belonged to ST1, with ORF pattern A and the bla OXA-69 gene. Interestingly, we detected the singleton ST513 (32%, 18/57), which encoded bla OXA-90 and showed ORF pattern H, as previously isolated in the Middle East. Moreover, our data showed that RAPD-PCR can detect divergent strains within the STs: Cl-1, Cl-2, Cl-3, Cl-4, Cl-10, Cl-11, Cl-12, Cl-13 and Cl-14 belonged to ST2, whereas Cl-6, Cl-7, Cl-8 and Cl-9 belonged to ST513 and only Cl-5 belonged to ST1. The combination of these methods appears to have more discriminatory power than any single method and could be applied effectively to rapid detection of the clonal complex (CC) of A. baumannii strains without performing MLST or PFGE. Copyright © 2017 Elsevier B.V. All rights reserved.
Next generation sequencing (NGS): a golden tool in forensic toolkit.
Aly, S M; Sabri, D M
DNA analysis is a cornerstone of contemporary forensic science. DNA sequencing technologies are powerful tools that enriched the molecular sciences in the past through Sanger sequencing and continue to advance them through next-generation sequencing (NGS). NGS has excellent potential to expand the molecular applications in forensic science by overcoming the pitfalls of conventional sequencing. The main advantage of NGS over the conventional method is that it analyses a large number of genetic markers simultaneously and yields high-resolution genetic data. These advantages will help in solving several challenges, such as mixture analysis and dealing with minute degraded samples. With these new technologies, many markers could be examined to obtain important biological data such as age, geographical origin, tissue type, externally visible traits and identification of monozygotic twins. They could also provide data on microbes, insects, plants and soil, which are of great medico-legal importance. Despite the dozens of forensic studies involving NGS, requirements remain to be met before this technology can be used routinely in forensic cases. Thus, there is a great need for more studies that address the robustness of these techniques. This work therefore highlights forensic applications in the era of massively parallel sequencing.
Human papillomavirus detection and typing using a nested-PCR-RFLP assay.
Coser, Janaina; Boeira, Thaís da Rocha; Fonseca, André Salvador Kazantzi; Ikuta, Nilo; Lunge, Vagner Ricardo
2011-01-01
It is clinically important to detect and type human papillomavirus (HPV) in a sensitive and specific manner. Development of a nested-polymerase chain reaction-restriction fragment length polymorphism (nested-PCR-RFLP) assay to detect and type HPV based on the analysis of L1 gene. Analysis of published DNA sequence of mucosal HPV types to select sequences of new primers. Design of an original nested-PCR assay using the new primers pair selected and classical MY09/11 primers. HPV detection and typing in cervical samples using the nested-PCR-RFLP assay. The nested-PCR-RFLP assay detected and typed HPV in cervical samples. Of the total of 128 clinical samples submitted to simple PCR and nested-PCR for detection of HPV, 37 (28.9%) were positive for the virus by both methods and 25 samples were positive only by nested-PCR (67.5% increase in detection rate compared with single PCR). All HPV positive samples were effectively typed by RFLP assay. The method of nested-PCR proved to be an effective diagnostic tool for HPV detection and typing.
Laser mass spectrometry for DNA sequencing, disease diagnosis, and fingerprinting
NASA Astrophysics Data System (ADS)
Chen, C. H. Winston; Taranenko, N. I.; Zhu, Y. F.; Chung, C. N.; Allman, S. L.
1997-05-01
Since laser mass spectrometry has the potential for achieving very fast DNA analysis, we recently applied it to DNA sequencing, DNA typing for fingerprinting, and DNA screening for disease diagnosis. Two different approaches for sequencing DNA have been successfully demonstrated. One is to sequence DNA with DNA ladders produced from Sanger's enzymatic method. The other is to do direct sequencing without DNA ladders. The need for quick DNA typing for identification purposes is critical for forensic application. Our preliminary results indicate that laser mass spectrometry can possibly be used for rapid DNA fingerprinting applications at a much lower cost than gel electrophoresis. Population screening for certain genetic diseases can be a very efficient step toward reducing medical costs through prevention. Since laser mass spectrometry can provide very fast DNA analysis, we applied laser mass spectrometry to disease diagnosis. Clinical samples with both base deletion and point mutation have been tested with complete success.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Samudrala, Ram; Heffron, Fred; McDermott, Jason E.
2009-04-24
The type III secretion system is an essential component for virulence in many Gram-negative bacteria. Though components of the secretion system apparatus are conserved, its substrates, effector proteins, are not. We have used a machine learning approach to identify new secreted effectors. The method integrates evolutionary measures, such as the pattern of homologs in a range of other organisms, and sequence-based features, such as G+C content, amino acid composition and the N-terminal 30 residues of the protein sequence. The method was trained on known effectors from Salmonella typhimurium and validated on a corresponding set of effectors from Pseudomonas syringae, after eliminating effectors with detectable sequence similarity. The method was able to identify all of the known effectors in P. syringae with a specificity of 84% and sensitivity of 82%. The reciprocal validation, training on P. syringae and validating on S. typhimurium, gave similar results with a specificity of 86% when the sensitivity level was 87%. These results show that type III effectors in disparate organisms share common features. We found that maximal performance is attained by including an N-terminal sequence of only 30 residues, which agrees with previous studies indicating that this region contains the secretion signal. We then used the method to define the most important residues in this putative secretion signal. Finally, we present novel predictions of secreted effectors in S. typhimurium, some of which have been experimentally validated, and apply the method to predict secreted effectors in the genetically intractable human pathogen Chlamydia trachomatis. This approach is a novel and effective way to identify secreted effectors in a broad range of pathogenic bacteria for further experimental characterization and provides insight into the nature of the type III secretion signal.
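Two of the sequence-based features named above, G+C content and the amino acid composition of the N-terminal 30 residues, are simple to compute. The sketch below shows one plausible way to derive them as classifier inputs; the function names and example sequences are hypothetical, and the authors' actual feature encoding and learning method are not described in the abstract.

```python
from collections import Counter


def gc_content(dna):
    """Fraction of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def n_terminal_composition(protein, window=30):
    """Relative frequency of each amino acid in the first `window` residues."""
    segment = protein[:window].upper()
    counts = Counter(segment)
    return {residue: n / len(segment) for residue, n in counts.items()}


# Hypothetical gene and protein fragments used only to exercise the functions.
example_gc = gc_content("ATGGCGTTAACC")
example_composition = n_terminal_composition("MSSNNTQLPSMSSNNTQLPSMSSNNTQLPSKKKK")
```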
Harvala, Heli; Jasir, Aftab; Penttinen, Pasi; Pastore Celentano, Lucia; Greco, Donato; Broberg, Eeva
2017-01-01
Enteroviruses (EVs) cause severe outbreaks of respiratory and neurological disease as illustrated by EV-D68 and EV-A71 outbreaks, respectively. We have mapped European laboratory capacity for identification and characterisation of non-polio EVs to improve preparedness to respond to (re)-emerging EVs linked to severe disease. An online questionnaire on non-polio EV surveillance and laboratory detection was submitted to all 30 European Union (EU)/European Economic Area (EEA) countries. Twenty-nine countries responded; 26 conducted laboratory-based non-polio EV surveillance, and 24 included neurological infections in their surveillance. Eleven countries have established specific surveillance for EV-D68 via sentinel influenza surveillance (n = 7), typing EV-positive respiratory samples (n = 10) and/or acute flaccid paralysis surveillance (n = 5). Of 26 countries performing non-polio EV characterisation/typing, 10 further characterised culture-positive EV isolates, whereas the remainder typed PCR-positive but culture-negative samples. Although 19 countries have introduced sequence-based EV typing, seven still rely entirely on virus isolation. Based on 2015 data, six countries typed over 300 specimens mostly by sequencing, whereas 11 countries characterised under 50 EV-positive samples. EV surveillance activity varied between EU/EEA countries, and did not always specifically target patients with neurological and/or respiratory infections. Introduction of sequence-based typing methods is needed throughout the EU/EEA to enhance laboratory capacity for the detection of EVs. PMID:29162204
Reads2Type: a web application for rapid microbial taxonomy identification.
Saputra, Dhany; Rasmussen, Simon; Larsen, Mette V; Haddad, Nizar; Sperotto, Maria Maddalena; Aarestrup, Frank M; Lund, Ole; Sicheritz-Pontén, Thomas
2015-11-25
Identification of bacteria may be based on sequencing and molecular analysis of a specific locus such as 16S rRNA, or a set of loci such as in multilocus sequence typing. In the near future, healthcare institutions and routine diagnostic microbiology laboratories may need to sequence the entire genome of microbial isolates. Therefore we have developed Reads2Type, a web-based tool for taxonomy identification based on whole bacterial genome sequence data. Raw sequencing data provided by the user are mapped against a set of marker probes that are derived from currently available complete bacterial genomes. Using a dataset of 1003 whole genome sequenced bacteria from various sequencing platforms, Reads2Type was able to identify the species with 99.5% accuracy within minutes. In comparison with other tools, Reads2Type offers the advantage of not needing to transfer sequencing files, as the entire computational analysis is done on the computer of the person using the web application. This also prevents data privacy issues from arising. The Reads2Type tool is available at http://www.cbs.dtu.dk/~dhany/reads2type.html.
Yin, Yuxin; Lan, James H; Nguyen, David; Valenzuela, Nicole; Takemura, Ping; Bolon, Yung-Tsi; Springer, Brianna; Saito, Katsuyuki; Zheng, Ying; Hague, Tim; Pasztor, Agnes; Horvath, Gyorgy; Rigo, Krisztina; Reed, Elaine F; Zhang, Qiuheng
2016-01-01
Unambiguous HLA typing is important in hematopoietic stem cell transplantation (HSCT), HLA disease association studies, and solid organ transplantation. However, current molecular typing methods only interrogate the antigen recognition site (ARS) of HLA genes, resulting in many cis-trans ambiguities that require additional typing methods to resolve. Here we report high-resolution HLA typing of 10,063 National Marrow Donor Program (NMDP) registry donors using a long-range PCR and next generation sequencing (NGS) approach on buccal swab DNA. Multiplex long-range PCR primers amplified the full length of HLA class I genes (A, B, C) from promoter to 3' UTR. Class II genes (DRB1, DQB1) were amplified from exon 2 through part of exon 4. PCR amplicons were pooled and sheared using Covaris fragmentation. Library preparation was performed using the Illumina TruSeq Nano kit on the Beckman FX automated platform. Each sample was tagged with a unique barcode, followed by 2×250 bp paired-end sequencing on the Illumina MiSeq. HLA typing was assigned using Omixon Twin software that combines two independent computational algorithms to ensure high confidence in allele calling. Consensus sequence and typing results were reported in Histoimmunogenetics Markup Language (HML) format. All homozygous alleles were confirmed by Luminex SSO typing and exon novelties were confirmed by Sanger sequencing. Using this automated workflow, over 10,063 NMDP registry donors were successfully typed at high resolution by NGS. Despite known challenges of nucleic acid degradation and low DNA concentration commonly associated with buccal-based specimens, 97.8% of samples were successfully amplified using long-range PCR. Among these, 98.2% were successfully reported by NGS, with an accuracy rate of 99.84% in an independent blind Quality Control audit performed by the NMDP. In this study, NGS-HLA typing identified 23 null alleles (0.023%), 92 rare alleles (0.091%) and 42 exon novelties (0.042%).
Long-range, unambiguous HLA genotyping is achievable on clinical buccal swab-extracted DNA. Importantly, full-length gene sequencing and the ability to curate full sequence data will permit future interrogation of the impact of introns, expanded exons, and other gene regulatory sequences on clinical outcomes in transplantation.
A RESTful application programming interface for the PubMLST molecular typing and genome databases
Bray, James E.; Maiden, Martin C. J.
2017-01-01
Molecular typing is used to differentiate microorganisms at the subspecies or strain level for epidemiological investigations, infection control, public health and environmental sampling. DNA sequence-based typing methods require authoritative databases that link sequence variants to nomenclature in order to facilitate communication and comparison of identified types in national or global settings. The PubMLST website (https://pubmlst.org/) fulfils this role for over a hundred microorganisms for which it hosts curated molecular sequence typing data, providing sequence and allelic profile definitions for multi-locus sequence typing (MLST) and single-gene typing approaches. In recent years, these have expanded to cover the whole genome with schemes such as core genome MLST (cgMLST) and whole genome MLST (wgMLST) which catalogue the allelic diversity found in hundreds to thousands of genes. These approaches provide a common nomenclature for high-resolution strain characterization and comparison. Molecular typing information is linked to isolate provenance, phenotype, and increasingly genome assemblies, providing a resource for outbreak investigation and research into population structure, gene association, global epidemiology and vaccine coverage. A Representational State Transfer (REST) Application Programming Interface (API) has been developed for the PubMLST website to make these large quantities of structured molecular typing and whole genome sequence data available for programmatic access by any third party application. The API is an integral component of the Bacterial Isolate Genome Sequence Database (BIGSdb) platform that is used to host PubMLST resources, and exposes all public data within the site. In addition to data browsing, searching and download, the API supports authentication and submission of new data to curator queues. Database URL: http://rest.pubmlst.org/ PMID:29220452
Xiao, Xianjin; Wu, Tongbo; Xu, Lei; Chen, Wei
2017-01-01
Genetic mutations are important biomarkers for cancer diagnostics and surveillance. Preferably, the methods for mutation detection should be straightforward, highly specific and sensitive to low-level mutations within various sequence contexts, fast and applicable at room temperature. Though some of the currently available methods have shown very encouraging results, their discrimination efficiency is still very low. Herein, we demonstrate a branch-migration based fluorescent probe (BM probe) which is able to identify the presence of known or unknown single-base variations at abundances down to 0.3%-1% within 5 min, even in highly GC-rich sequence regions. The discrimination factors between the perfect-match target and single-base mismatched target are determined to be 89–311 by measurement of their respective branch-migration products via polymerase elongation reactions. The BM probe not only enabled sensitive detection of two types of EGFR-associated point mutations located in GC-rich regions, but also successfully identified the BRAF V600E mutation in the serum from a thyroid cancer patient which could not be detected by the conventional sequencing method. The new method would be an ideal choice for high-throughput in vitro diagnostics and precise clinical treatment. PMID:28201758
Brancaccio, Rosario N; Robitaille, Alexis; Dutta, Sankhadeep; Cuenin, Cyrille; Santare, Daiga; Skenders, Girts; Leja, Marcis; Fischer, Nicole; Giuliano, Anna R; Rollison, Dana E; Grundhoff, Adam; Tommasino, Massimo; Gheit, Tarik
2018-05-07
With the advent of new molecular tools, the discovery of new papillomaviruses (PVs) has accelerated during the past decade, enabling the expansion of knowledge about the viral populations that inhabit the human body. Human PVs (HPVs) are etiologically linked to benign or malignant lesions of the skin and mucosa. The detection of HPV types can vary widely, depending mainly on the methodology and the quality of the biological sample. Next-generation sequencing is one of the most powerful tools, enabling the discovery of novel viruses in a wide range of biological material. Here, we report a novel protocol for the detection of known and unknown HPV types in human skin and oral gargle samples using improved PCR protocols combined with next-generation sequencing. We identified 105 putative new PV types in addition to 296 known types, thus providing important information about the viral distribution in the oral cavity and skin. Copyright © 2018. Published by Elsevier Inc.
Antonov, Valery A; Tkachenko, Galina A; Altukhova, Viktoriya V; Savchenko, Sergey S; Zinchenko, Olga V; Viktorov, Dmitry V; Zamaraev, Valery S; Ilyukhin, Vladimir I; Alekseev, Vladimir V
2008-12-01
Burkholderia mallei and B. pseudomallei are highly pathogenic microorganisms for both humans and animals. Moreover, they are regarded as potential agents of bioterrorism. Thus, rapid and unequivocal detection and identification of these dangerous pathogens is critical. In the present study, we describe the use of an optimized protocol for the early diagnosis of experimental glanders and melioidosis and for the rapid differentiation and typing of Burkholderia strains. This experience with PCR-based identification methods indicates that single PCR targets (23S and 16S rRNA genes, 16S-23S intergenic region, fliC and type III secretion gene cluster) should be used with caution for identification of B. mallei and B. pseudomallei, and need to be used alongside molecular methods such as gene sequencing. Several molecular typing procedures have been used to identify genetically related B. pseudomallei and B. mallei isolates, including ribotyping, pulsed-field gel electrophoresis and multilocus sequence typing. However, these methods are time consuming and technically challenging for many laboratories. RAPD, variable amplicon typing scheme, Rep-PCR, BOX-PCR and multiple-locus variable-number tandem repeat analysis have been recommended by us for the rapid differentiation of B. mallei and B. pseudomallei strains.
Palma, Federica; Pasquali, Frédérique; Lucchi, Alex; Cesare, Alessandra De; Manfreda, Gerardo
2017-08-16
Listeria monocytogenes is a food-borne pathogen able to survive and grow in different environments including food processing plants where it can persist for months or years. In the present study the discriminatory power of Whole Genome Sequencing (WGS)-based analysis (cgMLST) was compared to that of molecular typing methods on 34 L. monocytogenes isolates collected over one year in the same rabbit meat processing plant and belonging to three genotypes (ST14, ST121, ST224). Each genotype included isolates indistinguishable by standard molecular typing methods. The virulence potential of all isolates was assessed by Multi Virulence-Locus Sequence Typing (MVLST) and the investigation of a representative database of virulence determinant genes. The whole genome of each isolate was sequenced on a MiSeq platform. The cgMLST, MVLST, and in silico identification of virulence genes were performed using publicly available tools. Draft genomes included a number of contigs ranging from 13 to 28 and N50 ranging from 456298 to 580604. The coverage ranged from 41 to 187X. The cgMLST showed a significantly superior discriminatory power only in comparison to ribotyping; nevertheless, it allowed the detection of two singletons belonging to ST14 that were not observed by other molecular methods. All ST14 isolates belonged to VT107, whose 7-locus concatenated sequence differs by only 4 nucleotides from VT1 (Epidemic clone III). Analysis of virulence genes showed the presence of a full-length inlA version in all ST14 isolates and of a mutated version including a premature stop codon (PMSC) associated with attenuated virulence in all ST121 isolates.
Yu, Zhongtang; Yu, Marie; Morrison, Mark
2006-04-01
Serial analysis of ribosomal sequence tags (SARST) is a recently developed technology that can generate large 16S rRNA gene (rrs) sequence data sets from microbiomes, but there are numerous enzymatic and purification steps required to construct the ribosomal sequence tag (RST) clone libraries. We report here an improved SARST method, which still targets the V1 hypervariable region of rrs genes, but reduces the number of enzymes, oligonucleotides, reagents, and technical steps needed to produce the RST clone libraries. The new method, hereafter referred to as SARST-V1, was used to examine the eubacterial diversity present in community DNA recovered from the microbiome resident in the ovine rumen. The 190 sequenced clones contained 1055 RSTs and no less than 236 unique phylotypes (based on ≥95% sequence identity) that were assigned to eight different eubacterial phyla. Rarefaction and monomolecular curve analyses predicted that the complete RST clone library contains 99% of the 353 unique phylotypes predicted to exist in this microbiome. When compared with ribosomal intergenic spacer analysis (RISA) of the same community DNA sample, as well as a compilation of nine previously published conventional rrs clone libraries prepared from the same type of samples, the RST clone library provided a more comprehensive characterization of the eubacterial diversity present in rumen microbiomes. As such, SARST-V1 should be a useful tool applicable to comprehensive examination of diversity and composition in microbiomes and offers an affordable, sequence-based method for diversity analysis.
HLA-A, -B, -DRB1 allele and haplotype frequencies of 920 cord blood units from Central Chile.
Schäfer, Christian; Sauter, Jürgen; Riethmüller, Tobias; Kashi, Zahra Mehdizadeh; Schmidt, Alexander H; Barriga, Francisco J
2016-08-01
We present human leukocyte antigen (HLA) haplotype and allele/antigenic group frequencies derived from a data set of 920 umbilical cord blood units collected in Central Chile. HLA-A and -B genotypes were typed using sequence specific oligonucleotide probe methods while HLA-DRB1 genotypes were obtained from sequencing-based typing. The most frequent haplotype is A*29~B*44~DRB1*07:01 with an estimated frequency of 2.1%. Copyright © 2016 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
Charting improvements in US registry HLA typing ambiguity using a typing resolution score.
Paunić, Vanja; Gragert, Loren; Schneider, Joel; Müller, Carlheinz; Maiers, Martin
2016-07-01
Unrelated stem cell registries have been collecting HLA typing of volunteer bone marrow donors for over 25 years. Donor selection for hematopoietic stem cell transplantation is based primarily on matching the alleles of donors and patients at five polymorphic HLA loci. As HLA typing technologies have continually advanced since the beginnings of stem cell transplantation, registries have accrued typings of varied HLA typing ambiguity. We present a new typing resolution score (TRS), based on the likelihood of self-match, that allows the systematic comparison of HLA typings across different methods, data sets and populations. We apply the TRS to chart improvement in HLA typing within the Be The Match Registry of the United States from the initiation of DNA-based HLA typing to the current state of high-resolution typing using next-generation sequencing technologies. In addition, we present a publicly available online tool for evaluation of any given HLA typing. This TRS objectively evaluates HLA typing methods and can help define standards for acceptable recruitment HLA typing. Copyright © 2016 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
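The abstract does not give the TRS formula. One plausible reading of "likelihood of self-match" is the probability that two independent draws from the genotype distribution implied by a typing coincide, which is the sum of squared genotype probabilities. The Python sketch below assumes that reading; the function name and the genotype labels used to call it are invented for illustration and should not be taken as the published definition.

```python
def typing_resolution_score(genotype_weights: dict) -> float:
    """Sum of squared normalised genotype probabilities (assumed reading of TRS).

    genotype_weights maps each genotype consistent with an ambiguous typing
    to a non-negative weight; weights are normalised to probabilities here.
    """
    total = sum(genotype_weights.values())
    if total <= 0:
        raise ValueError("at least one genotype must have positive weight")
    return sum((w / total) ** 2 for w in genotype_weights.values())
```

Under this reading an unambiguous typing scores 1.0, and the score falls as the probability mass is spread over more candidate genotypes.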
Dan, Tong; Liu, Wenjun; Sun, Zhihong; Lv, Qiang; Xu, Haiyan; Song, Yuqin; Zhang, Heping
2014-06-09
Economically, Leuconostoc lactis is one of the most important species in the genus Leuconostoc. It plays an important role in the food industry including the production of dextrans and bacteriocins. Currently, traditional molecular typing approaches for characterisation of this species at the isolate level are either unavailable or are not sufficiently reliable for practical use. Multilocus sequence typing (MLST) is a robust and reliable method for characterising bacterial and fungal species at the molecular level. In this study, a novel MLST protocol was developed for 50 L. lactis isolates from Mongolia and China. Sequences from eight targeted genes (groEL, carB, recA, pheS, murC, pyrG, rpoB and uvrC) were obtained. Sequence analysis indicated 20 different sequence types (STs), with 13 of them being represented by a single isolate. Phylogenetic analysis based on the sequences of eight MLST loci indicated that the isolates belonged to two major groups, A (34 isolates) and B (16 isolates). Linkage disequilibrium analyses indicated that recombination occurred at a low frequency in L. lactis, indicating a clonal population structure. Split-decomposition analysis indicated that intraspecies recombination played a role in generating genotypic diversity amongst isolates. Our results indicated that MLST is a valuable tool for typing L. lactis isolates that can be used for further monitoring of evolutionary changes and population genetics.
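In MLST, a sequence type (ST) is defined by the combination of allele numbers across the scheme's loci, so isolates sharing an identical allelic profile share an ST. The Python sketch below illustrates that bookkeeping for the eight loci named in the abstract. The ST numbering (order of first appearance) and any allele numbers passed in are hypothetical; a curated scheme assigns ST numbers from a central database.

```python
from collections import Counter

# The eight loci listed in the abstract.
LOCI = ("groEL", "carB", "recA", "pheS", "murC", "pyrG", "rpoB", "uvrC")


def assign_sequence_types(profiles: dict) -> dict:
    """Map each isolate to an ST; identical allelic profiles share an ST.

    profiles maps isolate name -> {locus: allele number}. ST numbers are
    handed out in order of first appearance (an illustrative convention).
    """
    st_of_profile = {}
    st_of_isolate = {}
    for isolate, alleles in profiles.items():
        key = tuple(alleles[locus] for locus in LOCI)
        if key not in st_of_profile:
            st_of_profile[key] = len(st_of_profile) + 1
        st_of_isolate[isolate] = st_of_profile[key]
    return st_of_isolate


def singleton_sts(st_of_isolate: dict) -> list:
    """STs represented by exactly one isolate."""
    counts = Counter(st_of_isolate.values())
    return sorted(st for st, n in counts.items() if n == 1)
```

Counting singleton STs in this way corresponds to the abstract's statement that 13 of the 20 STs were each represented by a single isolate.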
Taboada, Eduardo; Grant, Christopher C. R.; Blakeston, Connie; Pollari, Frank; Marshall, Barbara; Rahn, Kris; MacKinnon, Joanne; Daignault, Danielle; Pillai, Dylan; Ng, Lai-King
2012-01-01
Campylobacter spp. may be responsible for unreported outbreaks of food-borne disease. The detection of these outbreaks is made more difficult by the fact that appropriate methods for detecting clusters of Campylobacter have not been well defined. We have compared the characteristics of five molecular typing methods on Campylobacter jejuni and C. coli isolates obtained from human and nonhuman sources during sentinel site surveillance during a 3-year period. Comparative genomic fingerprinting (CGF) appears to be one of the optimal methods for the detection of clusters of cases, and it could be supplemented by the sequencing of the flaA gene short variable region (flaA SVR sequence typing), with or without subsequent multilocus sequence typing (MLST). Different methods may be optimal for uncovering different aspects of source attribution. Finally, the use of several different molecular typing or analysis methods for comparing individuals within a population reveals much more about that population than a single method. Similarly, comparing several different typing methods reveals a great deal about differences in how the methods group individuals within the population. PMID:22162562
Advances in Molecular Serotyping and Subtyping of Escherichia coli.
Fratamico, Pina M; DebRoy, Chitrita; Liu, Yanhong; Needleman, David S; Baranzoni, Gian Marco; Feng, Peter
2016-01-01
Escherichia coli plays an important role as a member of the gut microbiota; however, pathogenic strains also exist, including various diarrheagenic E. coli pathotypes and extraintestinal pathogenic E. coli that cause illness outside of the GI-tract. E. coli have traditionally been serotyped using antisera against the ca. 186 O-antigens and 53 H-flagellar antigens. Phenotypic methods, including bacteriophage typing and O- and H- serotyping for differentiating and characterizing E. coli have been used for many years; however, these methods are generally time consuming and not always accurate. Advances in next generation sequencing technologies have made it possible to develop genetic-based subtyping and molecular serotyping methods for E. coli, which are more discriminatory compared to phenotypic typing methods. Furthermore, whole genome sequencing (WGS) of E. coli is replacing established subtyping methods such as pulsed-field gel electrophoresis, providing a major advancement in the ability to investigate food-borne disease outbreaks and for trace-back to sources. A variety of sequence analysis tools and bioinformatic pipelines are being developed to analyze the vast amount of data generated by WGS and to obtain specific information such as O- and H-group determination and the presence of virulence genes and other genetic markers.
Song, Jiangning; Li, Fuyi; Takemoto, Kazuhiro; Haffari, Gholamreza; Akutsu, Tatsuya; Chou, Kuo-Chen; Webb, Geoffrey I
2018-04-14
Determining the catalytic residues in an enzyme is critical to our understanding the relationship between protein sequence, structure, function, and enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced, and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods used for identifying catalytic residues are resource- and labor-intensive, computational approaches have considerable value and are highly desirable for their ability to complement experimental studies in identifying catalytic residues and helping to bridge the sequence-structure-function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted from multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with an area under the receiver operating characteristic curve and area under the precision-recall curve ranging from 0.896 to 0.973 and from 0.294 to 0.523, respectively. We demonstrated that this method was able to capture useful signals arising from different levels, leveraging such differential but useful types of features and allowing us to significantly improve the performance of catalytic residue prediction. We believe that this new method can be utilized as a valuable tool for both understanding the complex sequence-structure-function relationships of proteins and facilitating the characterization of novel enzymes lacking functional annotations. 
Copyright © 2018 Elsevier Ltd. All rights reserved.
Coulthart, Michael B; Posada, David; Crandall, Keith A; Dekaban, Gregory A
2006-03-01
Recently, the putative finding of ancient human T cell leukemia virus type 1 (HTLV-1) long terminal repeat (LTR) DNA sequences in association with a 1500-year-old Chilean mummy has stirred vigorous debate. The debate is based partly on the inherent uncertainties associated with phylogenetic reconstruction when only short sequences of closely related genotypes are available. However, a full analysis of what phylogenetic information is present in the mummy data has not previously been published, leaving open the question of what precisely is the range of admissible interpretation. To fulfill this need, we re-analyzed the mummy data in a new way. We first performed phylogenetic analysis of 188 published LTR DNA sequences from extant strains belonging to the HTLV-1 Cosmopolitan clade, using the method of statistical parsimony which is designed both to optimize phylogenetic resolution among sequences with little evolutionary divergence, and to permit precise mapping of individual sequence mutations onto branches of a divergence network. We then deduced possible phylogenetic positions for the two main categories of published Chilean mummy sequences, based on their published 157-nucleotide LTR sequences. The possible phylogenetic placements for one of the mummy sequence categories are consistent with a modern origin. However, one of these placements for the other mummy sequence category falls very close to the root of the Cosmopolitan clade, consistent with an ancient origin for both this mummy sequence and the Cosmopolitan clade.
KpnBI is the prototype of a new family (IE) of bacterial type I restriction-modification system
Chin, V.; Valinluck, V.; Magaki, S.; Ryu, J.
2004-01-01
KpnBI is a restriction-modification (R-M) system recognized in the GM236 strain of Klebsiella pneumoniae. Here, the KpnBI modification genes were cloned into a plasmid using a modification expression screening method. The modification genes that consist of both hsdM (2631 bp) and hsdS (1344 bp) genes were identified on an 8.2 kb EcoRI chromosomal fragment. These two genes overlap by one base and share the same promoter located upstream of the hsdM gene. Using recently developed plasmid R-M tests and a computer program RM Search, the DNA recognition sequence for the KpnBI enzymes was identified as a new 8 nt sequence containing one degenerate base with a 6 nt spacer, CAAANNNNNNRTCA. From Dam methylation and HindIII sensitivity tests, the methylation loci were predicted to be the italicized third adenine in the 5′ specific region and the adenine opposite the italicized thymine in the 3′ specific region. Combined with previous sequence data for hsdR, we concluded that the KpnBI system is a typical type I R-M system. The deduced amino acid sequences of the three subunits of the KpnBI system show only limited homologies (25 to 33% identity) at best, to the four previously categorized type I families (IA, IB, IC, and ID). Furthermore, their identity scores to other uncharacterized putative genome type I sequences were 53% at maximum. Therefore, we propose that KpnBI is the prototype of a new ‘type IE’ family. PMID:15475385
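The recognition sequence CAAANNNNNNRTCA uses IUPAC degenerate codes (N for any base, R for A or G). A minimal Python sketch of how such a motif can be located on either strand of a DNA sequence follows. It is an illustration only and is not the RM Search program mentioned in the abstract; the function names are assumptions.

```python
import re

# Subset of IUPAC nucleotide codes needed for this motif, as regex fragments.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYN", "TGCAYRN")


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif)


def reverse_complement(seq: str) -> str:
    """Reverse complement, preserving the degenerate codes R, Y and N."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(dna: str, motif: str = "CAAANNNNNNRTCA") -> list:
    """Return sorted (0-based start, strand) pairs for every motif match."""
    dna = dna.upper()
    hits = []
    for strand, m in (("+", motif), ("-", reverse_complement(motif))):
        # A lookahead makes overlapping sites visible to finditer.
        pattern = re.compile("(?=(" + motif_to_regex(m) + "))")
        hits.extend((match.start(), strand) for match in pattern.finditer(dna))
    return sorted(hits)
```

Because the motif is asymmetric, a site on the bottom strand appears on the top strand as TGAYNNNNNNTTTG, which is why both patterns are searched.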
Xue, Jian; Wu, Riga; Pan, Yajiao; Wang, Shunxia; Qu, Baowang; Qin, Ying; Shi, Yuequn; Zhang, Chuchu; Li, Ran; Zhang, Liyan; Zhou, Cheng; Sun, Hongyu
2018-04-02
Massively parallel sequencing (MPS) technologies, also termed next-generation sequencing (NGS), are becoming increasingly popular in the study of short tandem repeats (STR). However, current library preparation methods are usually based on ligation or two-round PCR, which require more steps, making them time-consuming (about 2 days), laborious and expensive. In this study, a 16-plex STR typing system was designed with a fusion primer strategy based on the Ion Torrent S5 XL platform, which could effectively resolve the above challenges for forensic DNA database-type samples (bloodstains, saliva stains, etc.). The efficiency of this system was tested in 253 Han Chinese participants. The libraries were prepared without DNA isolation and adapter ligation, and the whole process only required approximately 5 h. The proportion of thoroughly genotyped samples in which all the 16 loci were successfully genotyped was 86% (220/256). Of the samples, 99.7% showed 100% concordance between NGS-based STR typing and capillary electrophoresis (CE)-based STR typing. The inconsistency might have been caused by off-ladder alleles and mutations in primer binding sites. Overall, this panel enabled the large-scale genotyping of the DNA samples with controlled quality and quantity because it is a simple, operation-friendly process flow that saves labor, time and costs. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Goyette-Desjardins, Guillaume; Auger, Jean-Philippe; Xu, Jianguo; Segura, Mariela; Gottschalk, Marcelo
2014-01-01
Streptococcus suis is an important pathogen causing economic problems in the pig industry. Moreover, it is a zoonotic agent causing severe infections to people in close contact with infected pigs or pork-derived products. Although considered sporadic in the past, human S. suis infections have been reported during the last 45 years, with two large outbreaks recorded in China. In fact, the number of reported human cases has significantly increased in recent years. In this review, we present the worldwide distribution of serotypes and sequence types (STs), as determined by multilocus sequence typing, for pigs (between 2002 and 2013) and humans (between 1968 and 2013). The methods employed for S. suis identification and typing, the current epidemiological knowledge regarding serotypes and STs and the zoonotic potential of S. suis are discussed. Increased awareness of S. suis in both human and veterinary diagnostic laboratories and further establishment of typing methods will contribute to our knowledge of this pathogen, especially in regions where complete and/or recent data is lacking. More research is required to understand differences in virulence that occur among S. suis strains and if these differences can be associated with specific serotypes or STs. PMID:26038745
Malhotra, Karan; Noor, M Omair; Krull, Ulrich J
2018-05-29
Diagnostic technology that makes use of paper platforms in conjunction with the ubiquitous availability of digital cameras in cellular telephones and personal assistive devices offers opportunities for development of bioassays that are cost effective and widely distributed. Assays that operate effectively in aqueous solution require further development for implementation in paper substrates, overcoming issues associated with surface interactions on a matrix that offers a large surface-to-volume ratio and constraints on convective mixing. This report presents and compares two related methods for determination of oligonucleotides that serve as indicators of cystic fibrosis, differentiating between the normal wild-type sequence, and a mutant-type sequence that has a 3-base replacement. The transduction strategy operates by selective hybridization of oligonucleotide probes that are conjugated to fluorescent quantum dots, where hybridization of target sequences causes a molecular fluorophore to approach the quantum dot and become emissive through fluorescence resonance energy transfer. Detection can rely on hybridization of a target that is labelled with Cy3 fluorophore, or in the presence of an unlabelled target when a sandwich assay format is implemented with a labelled reporter oligonucleotide. Selectivity to determine the presence of mismatched sequences involves appropriate selection of nucleotide sequences to set melt temperatures, in conjunction with control of stringency conditions using formamide as a chaotrope. It was determined that both direct and sandwich assays on paper substrates are able to distinguish between wild-type and mutant-type samples.
Dactyl Alphabet Gesture Recognition in a Video Sequence Using Microsoft Kinect
NASA Astrophysics Data System (ADS)
Artyukhin, S. G.; Mestetskiy, L. M.
2015-05-01
This paper presents an efficient framework for static gesture recognition based on data obtained from web cameras and the Kinect depth sensor (RGB-D data). Each gesture is given by a pair of images: a color image and a depth map. The database stores gestures by their feature descriptions, generated per frame for each gesture of the alphabet. The recognition algorithm takes as input a video sequence (a sequence of frames) for labeling, and either matches each frame with a gesture from the database or decides that there is no suitable gesture in the database. First, each frame of the video sequence is classified separately, without interframe information. Then, a sequence of successfully labeled frames with the same gesture is grouped into a single static gesture. We propose a method that combines segmentation of the frame by depth map and by RGB image. The primary segmentation is based on the depth map; it gives information about position and yields a rough border of the hand. Then, based on the color image, the border is refined and the shape of the hand is analyzed. The continuous skeleton method is used to generate features. We propose a method of skeleton terminal branches, which makes it possible to determine the position of the fingers and wrist. The classification features for a gesture are a description of the position of the fingers relative to the wrist. Experiments with the developed algorithm were carried out on the example of American Sign Language. An American Sign Language gesture has several components, including the shape of the hand, its orientation in space and the type of movement. The accuracy of the proposed method is evaluated on a collected set of gestures consisting of 2700 frames.
Wang, Yi-Chun; Wang, Jing-Doo; Chen, Chin-Han; Chen, Yi-Wen; Li, Chuan
2015-03-01
We developed a novel BLAST-Based Relative Distance (BBRD) method by Pearson's correlation coefficient to avoid the problems of tedious multiple sequence alignment and complicated outgroup selection. We showed its application on reconstructing reliable phylogeny for nucleotide and protein sequences as exemplified by the fmr-1 gene and dihydrolipoamide dehydrogenase, respectively. We then used BBRD to resolve 124 protein arginine methyltransferases (PRMTs) that are homologues of nine mammalian PRMTs. The tree placed the uncharacterized PRMT9 with PRMT7 in the same clade, outside of all the Type I PRMTs including PRMT1 and its vertebrate paralogue PRMT8, PRMT3, PRMT6, PRMT2 and PRMT4. The PRMT7/9 branch then connects with the type II PRMT5. Some non-vertebrates contain different PRMTs without high sequence homology with the mammalian PRMTs. For example, in the case of Drosophila arginine methyltransferase (DART) and Trypanosoma brucei methyltransferases (TbPRMTs) in the analyses, the BBRD program grouped them with specific clades and thus suggested their evolutionary relationships. The BBRD method thus provided a great tool to construct a reliable tree for members of protein families through evolution. Copyright © 2015 Elsevier Inc. All rights reserved.
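The abstract above does not give the BBRD formula, so the following is only a minimal sketch of one plausible reading: each sequence is represented by its vector of BLAST scores against a common set of sequences, and the distance between two sequences is taken as 1 minus the Pearson correlation of their score vectors. The function names, the `1 - r` form and the example scores are assumptions made for illustration, not the authors' implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant score profile: correlation undefined")
    return cov / math.sqrt(var_x * var_y)


def relative_distance_matrix(scores):
    """Map {name: score vector} to a nested dict of 1 - r distances (assumed form)."""
    names = sorted(scores)
    return {
        a: {b: 0.0 if a == b else 1.0 - pearson(scores[a], scores[b]) for b in names}
        for a in names
    }


# Hypothetical all-against-all BLAST bit scores (columns: seqA, seqB, seqC).
scores = {
    "seqA": [200.0, 180.0, 40.0],
    "seqB": [180.0, 200.0, 45.0],
    "seqC": [40.0, 45.0, 200.0],
}
dist = relative_distance_matrix(scores)
```

With these made-up scores, seqA and seqB have similar score profiles and end up much closer to each other than either is to seqC. A matrix of this kind could then be passed to any standard distance-based tree-building method.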
Woo, Hye In; Joo, Eun Yeon; Lee, Kyung Wha
2012-01-01
Background: Narcolepsy is a neurologic disorder characterized by excessive daytime sleepiness, symptoms of abnormal rapid eye movement (REM) sleep, and a strong association with HLA-DRB1*1501, -DQA1*0102, and -DQB1*0602. Here, we investigated the clinico-physical characteristics of Korean patients with narcolepsy, their HLA types, and the clinical utility of high-resolution PCR with sequence-specific primers (PCR-SSP) as a simple typing method for identifying DRB1*15/16, DQA1, and DQB1 alleles. Methods: The study population consisted of 67 consecutively enrolled patients having unexplained daytime sleepiness and diagnosed narcolepsy based on clinical and neurological findings. Clinical data and the results of the multiple sleep latency test and polysomnography were reviewed, and HLA typing was performed using both high-resolution PCR-SSP and sequence-based typing (SBT). Results: The 44 narcolepsy patients with cataplexy displayed significantly higher frequencies of DRB1*1501 (Pc = 0.003), DQA1*0102 (Pc = 0.001), and DQB1*0602 (Pc = 0.014) than the patients without cataplexy. Among patients carrying DRB1*1501-DQB1*0602 or DQA1*0102, the frequencies of a mean REM sleep latency of less than 20 min in nocturnal polysomnography and clinical findings, including sleep paralysis and hypnagogic hallucination, were significantly higher. SBT and PCR-SSP showed 100% concordance for high-resolution typing of DRB1*15/16 alleles and DQA1 and DQB1 loci. Conclusions: The clinical characteristics and somnographic findings of narcolepsy patients were associated with specific HLA alleles, including DRB1*1501, DQA1*0102, and DQB1*0602. Application of high-resolution PCR-SSP, a reliable and simple method, for both allele- and locus-specific HLA typing of DRB1*15/16, DQA1, and DQB1 would be useful for characterizing clinical status among subjects with narcolepsy. PMID:22259780
Tipu, Hamid Nawaz; Bashir, Muhammad Mukarram; Noman, Muhammad
2016-10-01
Serology and DNA techniques are employed for Human Leukocyte Antigen (HLA) typing in different transplant centers. Results may not always correlate well and may need retyping with a different technique. All the patients (with aplastic anemia, thalassemia, and immunodeficiency) and their donors requiring HLA typing for bone marrow transplant were enrolled in the study. Serological HLA typing was done by complement-dependent lymphocytotoxicity, while DNA-based typing was done with sequence-specific primers (SSP). Serology identified 167 HLA A and 165 HLA B antigens, while SSP in the same samples identified 181 HLA A and 184 HLA B alleles. A11 and B51 were the commonest antigens/alleles by both methods. There were a total of 21 misreads and 32 dropouts on serology for both HLA A and B loci, with HLA A32, B52 and B61 being the most ambiguous antigens. Inherent limitations of serological techniques warrant careful interpretation or use of DNA-based methods for resolution of ambiguous typing.
Amicarelli, Giulia; Adlerstein, Daniel; Shehi, Erlet; Wang, Fengfei; Makrigiorgos, G Mike
2006-10-01
Genotyping methods that reveal single-nucleotide differences are useful for a wide range of applications. We used digestion of 3-way DNA junctions in a novel technology, OneCutEventAmplificatioN (OCEAN) that allows sequence-specific signal generation and amplification. We combined OCEAN with peptide-nucleic-acid (PNA)-based variant enrichment to detect and simultaneously genotype v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS) codon 12 sequence variants in human tissue specimens. We analyzed KRAS codon 12 sequence variants in 106 lung cancer surgical specimens. We conducted a PNA-PCR reaction that suppresses wild-type KRAS amplification and genotyped the product with a set of OCEAN reactions carried out in fluorescence microplate format. The isothermal OCEAN assay enabled a 3-way DNA junction to form between the specific target nucleic acid, a fluorescently labeled "amplifier", and an "anchor". The amplifier-anchor contact contains the recognition site for a restriction enzyme. Digestion produces a cleaved amplifier and generation of a fluorescent signal. The cleaved amplifier dissociates from the 3-way DNA junction, allowing a new amplifier to bind and propagate the reaction. The system detected and genotyped KRAS sequence variants down to approximately 0.3% variant-to-wild-type alleles. PNA-PCR/OCEAN had a concordance rate with PNA-PCR/sequencing of 93% to 98%, depending on the exact implementation. Concordance rate with restriction endonuclease-mediated selective-PCR/sequencing was 89%. OCEAN is a practical and low-cost novel technology for sequence-specific signal generation. Reliable analysis of KRAS sequence alterations in human specimens circumvents the requirement for sequencing. Application is expected in genotyping KRAS codon 12 sequence variants in surgical specimens or in bodily fluids, as well as single-base variations and sequence alterations in other genes.
Yang, A S; Hitz, B; Honig, B
1996-06-21
The stability of beta-turns is calculated as a function of sequence and turn type with a Monte Carlo sampling technique. The conformational energy of four internal hydrogen-bonded turn types, I, I', II and II', is obtained by evaluating their gas phase energy with the CHARMM force field and accounting for solvation effects with the Finite Difference Poisson-Boltzmann (FDPB) method. All four turn types are found to be less stable than the coil state, independent of the sequence in the turn. The free-energy penalties associated with turn formation vary between 1.6 kcal/mol and 7.7 kcal/mol, depending on the sequence and turn type. Differences in turn stability arise mainly from intraresidue interactions within the two central residues of the turn. For each combination of the two central residues, except for -Gly-Gly-, the most stable beta-turn type is always found to occur most commonly in native proteins. The fact that a model based on local interactions accounts for the observed preference of specific sequences suggests that long-range tertiary interactions tend to play a secondary role in determining turn conformation. In contrast, for beta-hairpins, long-range interactions appear to dominate. Specifically, due to the right-handed twist of beta-strands, type I' turns for -Gly-Gly- are found to occur with high frequency, even when local energetics would dictate otherwise. The fact that any combination of two residues is found able to adopt a relatively low-energy turn structure explains why the amino acid sequence in turns is highly variable. The calculated free-energy cost of turn formation, when combined with related numbers obtained for alpha-helices and beta-sheets, suggests a model for the initiation of protein folding based on metastable fragments of secondary structure.
Repair of DNA damage caused by cytosine deamination in mitochondrial DNA of forensic case samples.
Gorden, Erin M; Sturk-Andreaggi, Kimberly; Marshall, Charla
2018-05-01
DNA sequence damage from cytosine deamination is well documented in degraded samples, such as those from ancient and forensic contexts. This study examined the effect of a DNA repair treatment on mitochondrial DNA (mtDNA) from aged and degraded skeletal samples. DNA extracts from 21 non-probative, degraded skeletal samples (aged 50-70 years) were utilized for the analysis. A portion of each sample extract was subjected to DNA repair using a commercial repair kit, the New England BioLabs' NEBNext FFPE DNA Repair Kit (Ipswich, MA). MtDNA was enriched using PCR and targeted capture in a side-by-side experiment of untreated and repaired DNA. Sequencing was performed using both traditional (Sanger-type; STS) and next-generation sequencing (NGS) methods. Although cytosine deamination was evident in the mtDNA sequence data, the observed level of damaged bases varied by sequencing method as well as by enrichment type. The STS PCR amplicon data did not show evidence of cytosine deamination that could be distinguished from background signal in either the untreated or repaired sample set. However, the same PCR amplicons showed 850 C → T/G → A substitutions consistent with cytosine deamination with variant frequencies (VFs) of up to 25% when sequenced using NGS methods. The occurrence of base misincorporation due to cytosine deamination was reduced by 98% (to 10) in the NGS amplicon data after repair. The NGS capture data indicated low levels (1-2%) of cytosine deamination in mtDNA fragments that was effectively mitigated by DNA repair. The observed difference in the level of cytosine deamination between the PCR and capture enrichment methods can be attributed to the greater propensity for stochastic effects from the PCR enrichment technique employed (e.g., low template input, increased PCR cycles). Altogether these results indicate that DNA repair may be required when sequencing PCR-amplified DNA from degraded forensic case samples with NGS methods.
Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Sasaki, Yohei; Fushimi, Hirotoshi; Cao, Hui; Cai, Shao-Qing; Komatsu, Katsuko
2002-12-01
The botanical origins of Chinese and Japanese Curcuma drugs were determined to be Curcuma longa, C. phaeocaulis, the Japanese population of C. zedoaria, C. kwangsiensis, C. wenyujin, and C. aromatica based on a comparison of their 18S rRNA gene and trnK gene sequences with those of six Curcuma species reported previously. Moreover, to develop a more convenient identification method, amplification-refractory mutation system (ARMS) analysis of both gene regions was performed on plants. The ARMS method for the 18S rRNA gene was established using two types of forward primers designed based on the nucleotide difference at position 234. When DNAs of four Curcuma species were used as templates, PCR amplification with either of the two primers only generated a fragment of 912 base pairs (bp). However, when DNAs of the purple-cloud type of C. kwangsiensis and C. wenyujin were used, PCR amplifications with both primers unexpectedly generated the fragment, suggesting that these two were heterozygotes. The ARMS method for the trnK gene was also established using a mixture of four types of specific reverse primers designed on the basis of base substitutions and indels among six species, and common reverse and forward primers. C. phaeocaulis or the Chinese population of C. zedoaria, the Japanese population of C. zedoaria or the purple-cloud type of C. kwangsiensis, the pubescent type of C. kwangsiensis or C. wenyujin, and C. aromatica were found to show specific fragments of 730, 185, 527 or 528, and 641 or 642 bp, respectively. All species including C. longa also showed a common fragment of 897-904 bp. Using both ARMS methods, together with information on producing areas, the identification of Curcuma plants was achieved. Moreover, the ARMS method for the trnK gene was also useful for authentication of Curcuma drugs.
NASA Technical Reports Server (NTRS)
Kretsinger, R. H.; Nakayama, S.
1993-01-01
In the previous three reports in this series we demonstrated that the EF-hand family of proteins evolved by a complex pattern of gene duplication, transposition, and splicing. The dendrograms based on exon sequences are nearly identical to those based on protein sequences for troponin C, the essential light chain myosin, the regulatory light chain, and calpain. This validates both the computational methods and the dendrograms for these subfamilies. The proposal of congruence for calmodulin, troponin C, essential light chain, and regulatory light chain was confirmed. There are, however, significant differences in the calmodulin dendrograms computed from DNA and from protein sequences. In this study we find that introns are distributed throughout the EF-hand domain and the interdomain regions. Further, dendrograms based on intron type and distribution bear little resemblance to those based on protein or on DNA sequences. We conclude that introns are inserted, and probably deleted, with relatively high frequency. Further, in the EF-hand family exons do not correspond to structural domains and exon shuffling played little if any role in the evolution of this widely distributed homolog family. Calmodulin has had a turbulent evolution. Its dendrograms based on protein sequence, exon sequence, 3'-tail sequence, intron sequences, and intron positions all show significant differences.
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq
Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru
2015-01-01
Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only <1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
Composition for nucleic acid sequencing
Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY
2008-08-26
The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
'Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny
NASA Astrophysics Data System (ADS)
Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila
2010-10-01
Bioinformatics, being a multidisciplinary field, involves applications of various methods from allied areas of science for data mining using computational approaches. Clustering and molecular phylogeny are key areas in bioinformatics that help in the study of classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance-based and character-based methods. However, most of these methods depend on pre-alignment of sequences and become computationally intensive as the size of the data increases, and hence demand alternative, efficient approaches. 'Inter-arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling, but its potential in molecular data analysis has not been fully explored. The present study reports the application of IATD in bioinformatics for clustering and molecular phylogeny. The proposed method provides IATDs of nucleotides in genomic sequences. A distance function based on statistical parameters of IATDs is proposed, and the distance matrix thus obtained is used for clustering and molecular phylogeny. The method is applied to a dataset of 3' non-coding region (NCR) sequences of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The phylogram thus obtained revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to cluster in two sub-clades corresponding to pre- and post-Dengue hemorrhagic fever emergence groups. These results are consistent with those reported earlier, which were obtained using pre-aligned sequence data as input. These findings encourage applications of the IATD-based method in molecular phylogenetic analysis in particular and data mining in general.
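To make the inter-arrival idea concrete, here is a minimal alignment-free sketch. It treats the inter-arrival time of a nucleotide as the gap between its successive positions in a sequence. The abstract does not say which statistical parameters or which distance function were used, so the choice of mean and population standard deviation per nucleotide, the Euclidean distance, and all function names are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def inter_arrival_times(seq, base):
    """Gaps between successive occurrences of `base` in `seq`."""
    positions = [i for i, b in enumerate(seq) if b == base]
    return [q - p for p, q in zip(positions, positions[1:])]


def iatd_profile(seq, alphabet="ACGT"):
    """Assumed summary: (mean, std) of inter-arrival times for each nucleotide."""
    profile = []
    for base in alphabet:
        gaps = inter_arrival_times(seq, base)
        if gaps:
            profile.extend([mean(gaps), pstdev(gaps)])
        else:
            # Fewer than two occurrences: no gap to measure.
            profile.extend([0.0, 0.0])
    return profile


def iatd_distance(seq_a, seq_b):
    """Euclidean distance between two IATD profiles (assumed distance function)."""
    pa, pb = iatd_profile(seq_a), iatd_profile(seq_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


gaps_a = inter_arrival_times("ACAAGA", "A")   # positions 0, 2, 3, 5
d_same = iatd_distance("ACGTACGTAC", "ACGTACGTAC")
d_diff = iatd_distance("ACGTACGTAC", "AAAACCCCGG")
```

Because each sequence is reduced to a fixed-length profile, no pre-alignment is needed, and pairwise distances computed this way can fill a distance matrix for clustering.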
Re-evaluating microglia expression profiles using RiboTag and cell isolation strategies.
Haimon, Zhana; Volaski, Alon; Orthgiess, Johannes; Boura-Halfon, Sigalit; Varol, Diana; Shemer, Anat; Yona, Simon; Zuckerman, Binyamin; David, Eyal; Chappell-Maor, Louise; Bechmann, Ingo; Gericke, Martin; Ulitsky, Igor; Jung, Steffen
2018-06-01
Transcriptome profiling is widely used to infer functional states of specific cell types, as well as their responses to stimuli, to define contributions to physiology and pathophysiology. Focusing on microglia, the brain's macrophages, we report here a side-by-side comparison of classical cell-sorting-based transcriptome sequencing and the 'RiboTag' method, which avoids cell retrieval from tissue context and yields translatome sequencing information. Conventional whole-cell microglial transcriptomes were found to be significantly tainted by artifacts introduced by tissue dissociation, cargo contamination and transcripts sequestered from ribosomes. Conversely, our data highlight the added value of RiboTag profiling for assessing the lineage accuracy of Cre recombinase expression in transgenic mice. Collectively, this study indicates method-based biases, reveals observer effects and establishes RiboTag-based translatome profiling as a valuable complement to standard sorting-based profiling strategies.
Gogoi, Purnima; Borah, Probodh; Hussain, Iftikar; Das, Leena; Hazarika, Girin; Tamuly, Shantanu; Barkalita, Luit Moni
2018-05-01
A total of 12 Salmonella isolates belonging to different serovars, viz., Salmonella enterica serovar Enteritidis (n = 4), Salmonella enterica serovar Weltevreden (n = 4), Salmonella enterica serovar Newport (n = 1), Salmonella enterica serovar Litchfield (n = 1), and untypeable strains (n = 2), were isolated from 332 diarrheic fecal samples collected from animals, birds, and humans. Of the two molecular typing methods applied, viz., repetitive element sequence-based PCR (REP-PCR) and pulsed-field gel electrophoresis (PFGE), PFGE could clearly differentiate the strains belonging to different serovars as well as differentiate between strains of the same serovar with respect to their source of isolation, whereas REP-PCR could not differentiate between strains of the same serovar. Thus, it can be suggested that PFGE is more useful and appropriate for molecular typing of Salmonella isolates during epidemiological investigations than REP-PCR. Copyright © 2018 American Society for Microbiology.
High resolution identity testing of inactivated poliovirus vaccines.
Mee, Edward T; Minor, Philip D; Martin, Javier
2015-07-09
Definitive identification of poliovirus strains in vaccines is essential for quality control, particularly where multiple wild-type and Sabin strains are produced in the same facility. Sequence-based identification provides the ultimate in identity testing and would offer several advantages over serological methods. We employed random RT-PCR and high throughput sequencing to recover full-length genome sequences from monovalent and trivalent poliovirus vaccine products at various stages of the manufacturing process. All expected strains were detected in previously characterised products and the method permitted identification of strains comprising as little as 0.1% of sequence reads. Highly similar Mahoney and Sabin 1 strains were readily discriminated on the basis of specific variant positions. Analysis of a product known to contain incorrect strains demonstrated that the method correctly identified the contaminants. Random RT-PCR and shotgun sequencing provided high resolution identification of vaccine components. In addition to the recovery of full-length genome sequences, the method could also be easily adapted to the characterisation of minor variant frequencies and distinction of closely related products on the basis of distinguishing consensus and low frequency polymorphisms. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Identifying functional cancer-specific miRNA-mRNA interactions in testicular germ cell tumor.
Sedaghat, Nafiseh; Fathy, Mahmood; Modarressi, Mohammad Hossein; Shojaie, Ali
2016-09-07
Testicular cancer is the most common cancer in men aged between 15 and 35, and more than 90% of testicular neoplasms originate in germ cells. Recent research has shown the impact of microRNAs (miRNAs) in different types of cancer, including testicular germ cell tumor (TGCT). MicroRNAs are small non-coding RNAs which affect the development and progression of cancer cells by binding to mRNAs and regulating their expression. The identification of functional miRNA-mRNA interactions in cancers, i.e. those that alter the expression of genes in cancer cells, can help delineate post-regulatory mechanisms and may lead to new treatments to control the progression of cancer. A number of sequence-based methods have been developed to predict miRNA-mRNA interactions based on the complementarity of sequences. While necessary, sequence complementarity is, however, not sufficient for the presence of functional interactions. Alternative methods have thus been developed to refine the sequence-based interactions using concurrent expression profiles of miRNAs and mRNAs. This study aims to find functional cancer-specific miRNA-mRNA interactions in TGCT. To this end, the sequence-based predicted interactions are first refined using an ensemble learning method, based on two well-known methods of learning miRNA-mRNA interactions, namely, TaLasso and GenMiR++. Additional functional analyses were then used to identify a subset of interactions most likely to be functional and specific to TGCT. The final list of 13 miRNA-mRNA interactions can be potential targets for identifying TGCT-specific interactions and future laboratory experiments to develop new therapies. Copyright © 2016 Elsevier Ltd. All rights reserved.
The Use of Weighted Graphs for Large-Scale Genome Analysis
Zhou, Fang; Toivonen, Hannu; King, Ross D.
2014-01-01
There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution. PMID:24619061
Microsatellite-Based Fingerprinting of Western Blackberries from Plants, IQF Berries and Puree
USDA-ARS's Scientific Manuscript database
The blackberry industry needs a reliable method to ensure trueness-to-type of blackberry products. Microsatellite markers or simple sequence repeats (SSRs) are ideal for cultivar fingerprinting, paternity testing and identity certification. Fingerprinting is valuable for variety identification, qual...
Fritsch, Leonie; Fischer, Rainer; Wambach, Christoph; Dudek, Max; Schillberg, Stefan; Schröper, Florian
2015-08-01
Simple and reliable, high-throughput techniques to detect the zygosity of transgenic events in plants are valuable for biotechnology and plant breeding companies seeking robust genotyping data for the assessment of new lines and the monitoring of breeding programs. We show that next-generation sequencing (NGS) applied to short PCR products spanning the transgene integration site provides accurate zygosity data that are more robust and reliable than those generated by PCR-based methods. The NGS reads covered the 5' border of the transgenic events (incorporating part of the transgene and the flanking genomic DNA), or the genomic sequences flanking the unfilled transgene integration site at the wild-type locus. We compared the NGS method to competitive real-time PCR with transgene-specific and wild-type-specific primer/probe pairs, one pair matching the 5' genomic flanking sequence and 5' part of the transgene and the other matching the unfilled transgene integration site. Although both NGS and real-time PCR provided useful zygosity data, the NGS technique was favorable because it needed fewer optimization steps. It also provided statistically more-reliable evidence for the presence of each allele because each product was often covered by more than 100 reads. The NGS method is also more suitable for the genotyping of large panels of plants because up to 80 million reads can be produced in one sequencing run. Our novel method is therefore ideal for the rapid and accurate genotyping of large numbers of samples.
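The read-count logic behind such a zygosity call can be sketched as follows. This is only an illustration under stated assumptions: reads are presumed already sorted into those spanning the transgene border and those spanning the unfilled wild-type integration site, and the minimum read depth and the heterozygous fraction band are hypothetical thresholds chosen for the example, not values reported in the abstract.

```python
def call_zygosity(transgene_reads, wildtype_reads, min_reads=100, het_band=(0.2, 0.8)):
    """Classify a plant from counts of border-spanning and wild-type-site reads.

    min_reads and het_band are hypothetical thresholds for illustration only.
    """
    total = transgene_reads + wildtype_reads
    if total < min_reads:
        # Too little evidence to support a call for either allele.
        return "undetermined"
    fraction = transgene_reads / total
    low, high = het_band
    if fraction < low:
        return "wild-type"
    if fraction > high:
        return "homozygous"
    return "heterozygous"


# Hypothetical read counts for four plants.
calls = [
    call_zygosity(480, 520),
    call_zygosity(1000, 3),
    call_zygosity(2, 900),
    call_zygosity(10, 20),
]
```

Leaving a band around the expected 0%, 50% and 100% fractions rather than requiring exact values gives some tolerance for unequal amplification of the two products and for a small number of stray reads.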
USDA-ARS's Scientific Manuscript database
Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylat...
Watanabe, Yoshiyuki; Yamamoto, Hiroyuki; Oikawa, Ritsuko; Toyota, Minoru; Yamamoto, Masakazu; Kokudo, Norihiro; Tanaka, Shinji; Arii, Shigeki; Yotsuyanagi, Hiroshi; Koike, Kazuhiko; Itoh, Fumio
2015-01-01
Integration of DNA viruses into the human genome plays an important role in various types of tumors, including hepatitis B virus (HBV)–related hepatocellular carcinoma. However, the molecular details and clinical impact of HBV integration on either human or HBV epigenomes are unknown. Here, we show that methylation of the integrated HBV DNA is related to the methylation status of the flanking human genome. We developed a next-generation sequencing-based method for structural methylation analysis of integrated viral genomes (denoted G-NaVI). This method is a novel approach that enables enrichment of viral fragments for sequencing using unique baits based on the sequence of the HBV genome. We detected integrated HBV sequences in the genome of the PLC/PRF/5 cell line and found variable levels of methylation within the integrated HBV genomes. Allele-specific methylation analysis revealed that the HBV genome often became significantly methylated when integrated into highly methylated host sites. After integration into unmethylated human genome regions such as promoters, however, the HBV DNA remains unmethylated and may eventually play an important role in tumorigenesis. The observed dynamic changes in DNA methylation of the host and viral genomes may functionally affect the biological behavior of HBV. These findings may impact public health given that millions of people worldwide are carriers of HBV. We also believe our assay will be a powerful tool to increase our understanding of the various types of DNA virus-associated tumorigenesis. PMID:25653310
Pryce, Todd M; Palladino, Silvano; Price, Diane M; Gardam, Dianne J; Campbell, Peter B; Christiansen, Keryn J; Murray, Ronan J
2006-04-01
We report a direct polymerase chain reaction/sequence (d-PCRS)-based method for the rapid identification of clinically significant fungi from 5 different types of commercial broth enrichment media inoculated with clinical specimens. Media including BacT/ALERT FA (BioMérieux, Marcy l'Etoile, France) (n = 87), BACTEC Plus Aerobic/F (Becton Dickinson, Microbiology Systems, Sparks, MD) (n = 16), BACTEC Peds Plus/F (Becton Dickinson) (n = 15), BACTEC Lytic/10 Anaerobic/F (Becton Dickinson) (n = 11) bottles, and BBL MGIT (Becton Dickinson) (n = 11) were inoculated with specimens from 138 patients. A universal DNA extraction method was used combining a novel pretreatment step to remove PCR inhibitors with a column-based DNA extraction kit. Target sequences in the noncoding internal transcribed spacer regions of the rRNA gene were amplified by PCR and sequenced using a rapid (24 h) automated capillary electrophoresis system. Using sequence alignment software, fungi were identified by sequence similarity with sequences derived from isolates identified by upper-level reference laboratories or isolates defined as ex-type strains. We identified Candida albicans (n = 14), Candida parapsilosis (n = 8), Candida glabrata (n = 7), Candida krusei (n = 2), Scedosporium prolificans (n = 4), and 1 each of Candida orthopsilosis, Candida dubliniensis, Candida kefyr, Candida tropicalis, Candida guilliermondii, Saccharomyces cerevisiae, Cryptococcus neoformans, Aspergillus fumigatus, Histoplasma capsulatum, and Malassezia pachydermatis by d-PCRS analysis. All d-PCRS identifications from positive broths were in agreement with the final species identification of the isolates grown from subculture. Earlier identification of fungi using d-PCRS may facilitate prompt and more appropriate antifungal therapy.
Recent Advances in Conotoxin Classification by Using Machine Learning Methods.
Dao, Fu-Ying; Yang, Hui; Su, Zhen-Dong; Yang, Wuritu; Wu, Yun; Hui, Ding; Chen, Wei; Tang, Hua; Lin, Hao
2017-06-25
Conotoxins are small, disulfide-rich peptides that target ion channels and neuronal receptors. They have been demonstrated to be potent pharmaceuticals in the treatment of a range of diseases, such as Alzheimer's disease, Parkinson's disease, and epilepsy. In addition, conotoxins are ideal molecular templates for the development of new drug lead compounds and play important roles in neurobiological research. Thus, the accurate identification of conotoxin types will provide key clues for biological research and clinical medicine. Generally, conotoxin types are confirmed when their sequence, structure, and function are experimentally validated. However, acquiring structural and functional information through biochemical experiments is time-consuming and costly. Therefore, it is important to develop computational tools for efficiently and effectively recognizing conotoxin types based on sequence information. In this work, we review the current progress in computational identification of conotoxins in the following aspects: (i) construction of benchmark datasets; (ii) strategies for extracting sequence features; (iii) feature selection techniques; (iv) machine learning methods for classifying conotoxins; (v) the results obtained by these methods and the published tools; and (vi) future perspectives on conotoxin classification. The paper provides a basis for in-depth study of conotoxins and drug therapy research.
Kolekar, Pandurang; Hake, Nilesh; Kale, Mohan; Kulkarni-Kale, Urmila
2014-03-01
West Nile virus (WNV), genus Flavivirus, family Flaviviridae, is a major cause of viral encephalitis with a broad host range and global spread. The virus has undergone a series of evolutionary changes, with the emergence of various genotypic lineages that are known to differ in the type and severity of the diseases caused. Currently, genotyping is carried out using molecular phylogeny of complete coding sequences, and a genotype is assigned based on proximity to reference genotypes in the tree topology. Efficient epidemiological surveillance of WNVs demands the development of objective criteria for typing. An alignment-free approach based on the return time distribution (RTD) of k-mers has been validated for genotyping of WNVs. The RTDs of complete genome sequences at k=7 were found to be optimal for classification of the known lineages of WNVs as well as for genotyping. The approach provides a time- and computation-efficient alternative for genome-based annotation of WNV lineages. The development of a WNV Typer server based on RTD is described (http://bioinfo.net.in/wnv/homepage.html). Both the method and the server have 100% sensitivity and specificity. Copyright © 2014 The Authors. Published by Elsevier B.V. All rights reserved.
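As a rough illustration of the return time idea described above, the following Python sketch records, for each k-mer, the gaps between successive occurrences and summarizes them by mean and standard deviation. The function name `return_time_features`, the toy sequence, and the exact definition of a return time (difference between start positions) are assumptions made for this example and are not taken from the WNV Typer implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def return_time_features(seq, k):
    """Map each repeated k-mer to (mean, population SD) of its return times.

    Assumption: a return time is the difference between the start
    positions of two successive occurrences of the same k-mer.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    features = {}
    for kmer, pos in positions.items():
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        if gaps:  # k-mers seen only once have no return time
            features[kmer] = (mean(gaps), pstdev(gaps))
    return features


# Toy sequence (hypothetical); the abstract reports k=7 on complete genomes.
toy = "ACGACGTACG"
feats = return_time_features(toy, k=3)
```

In a typing workflow, feature vectors of this kind would presumably be compared between a query genome and reference genotypes with some distance measure; that step is not sketched here because the abstract does not specify it.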
Kadlec, Kristina; Schwarz, Stefan; Goering, Richard V; Weese, J Scott
2015-12-01
Methicillin-resistant Staphylococcus pseudintermedius (MRSP) has emerged in a remarkable manner as an important problem in dogs and cats. However, limited molecular epidemiological information is available. The aims of this study were to apply direct repeat unit (dru) typing in a large collection of well-characterized MRSP isolates and to use dru typing to analyze a collection of previously uncharacterized MRSP isolates. Two collections of MRSP isolates from dogs and cats were included in this study. The first collection comprised 115 well-characterized MRSP isolates from North America and Europe. The data for these isolates included multilocus sequence typing (MLST) and staphylococcal protein A gene (spa) typing results as well as SmaI macrorestriction patterns after pulsed-field gel electrophoresis (PFGE). The second collection was a convenience sample of 360 isolates from North America. The dru region was amplified by PCR, sequenced, and analyzed. For the first collection, the discriminatory indices of the typing methods were calculated. All isolates were successfully dru typed. The discriminatory power for dru typing (D = 0.423) was comparable to that of spa typing (D = 0.445) and of MLST (D = 0.417) in the first collection. Occasionally, dru typing was able to further discriminate between isolates that shared the same spa type. Among all 475 isolates, 26 different dru types were identified, with 2 predominant types (dt9a and dt11a) among 349 (73.4%) isolates. The results of this study underline that dru typing is a useful tool for MRSP typing, being an objective, standardized, sequence-based method that is relatively cost-efficient and easy to perform. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
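The abstract does not state how the discriminatory indices were computed. A common choice for comparing typing methods is Simpson's index of diversity in the Hunter-Gaston form, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), and the sketch below assumes that formula. The function name and the toy type labels are hypothetical.

```python
from collections import Counter


def discriminatory_index(types):
    """Simpson's index of diversity (Hunter-Gaston form).

    `types` holds one type label per isolate. The result is the
    probability that two isolates drawn without replacement carry
    different types.
    """
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(types)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - same_type_pairs / (n * (n - 1))


# Toy data (hypothetical labels): three isolates of type A, two of B, one of C.
toy_types = ["A", "A", "A", "B", "B", "C"]
d_toy = discriminatory_index(toy_types)
```

A value near 0 means nearly all isolates share one type; a value near 1 means nearly every isolate has its own type.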
Typing Clostridium difficile strains based on tandem repeat sequences
2009-01-01
Background Genotyping of epidemic Clostridium difficile strains is necessary to track their emergence and spread. Portability of genotyping data is desirable to facilitate inter-laboratory comparisons and epidemiological studies. Results This report presents results from a systematic screen for variation in repetitive DNA in the genome of C. difficile. We describe two tandem repeat loci, designated 'TR6' and 'TR10', which display extensive sequence variation that may be useful for sequence-based strain typing. Based on an investigation of 154 C. difficile isolates comprising 75 ribotypes, tandem repeat sequencing demonstrated excellent concordance with widely used PCR ribotyping and equal discriminatory power. Moreover, tandem repeat sequences enabled the reconstruction of the isolates' largely clonal population structure and evolutionary history. Conclusion We conclude that sequence analysis of the two repetitive loci introduced here may be highly useful for routine typing of C. difficile. Tandem repeat sequence typing resolves phylogenetic diversity to a level equivalent to PCR ribotypes. DNA sequences may be stored in databases accessible over the internet, obviating the need for the exchange of reference strains. PMID:19133124
2014-01-01
Background Economically, Leuconostoc lactis is one of the most important species in the genus Leuconostoc. It plays an important role in the food industry including the production of dextrans and bacteriocins. Currently, traditional molecular typing approaches for characterisation of this species at the isolate level are either unavailable or are not sufficiently reliable for practical use. Multilocus sequence typing (MLST) is a robust and reliable method for characterising bacterial and fungal species at the molecular level. In this study, a novel MLST protocol was developed for 50 L. lactis isolates from Mongolia and China. Results Sequences from eight targeted genes (groEL, carB, recA, pheS, murC, pyrG, rpoB and uvrC) were obtained. Sequence analysis indicated 20 different sequence types (STs), with 13 of them being represented by a single isolate. Phylogenetic analysis based on the sequences of eight MLST loci indicated that the isolates belonged to two major groups, A (34 isolates) and B (16 isolates). Linkage disequilibrium analyses indicated that recombination occurred at a low frequency in L. lactis, indicating a clonal population structure. Split-decomposition analysis indicated that intraspecies recombination played a role in generating genotypic diversity amongst isolates. Conclusions Our results indicated that MLST is a valuable tool for typing L. lactis isolates that can be used for further monitoring of evolutionary changes and population genetics. PMID:24912963
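To make the link between allele profiles and sequence types concrete, here is a minimal Python sketch that assigns an ST number to each distinct combination of alleles across the eight loci named above. The allele numbers, isolate names, and the rule of numbering STs in order of first appearance are assumptions for illustration only; they do not reproduce the study's data or numbering.

```python
LOCI = ("groEL", "carB", "recA", "pheS", "murC", "pyrG", "rpoB", "uvrC")


def assign_sequence_types(isolates):
    """Give every distinct allele profile its own sequence type number.

    `isolates` maps an isolate name to a dict of locus -> allele number.
    STs are numbered from 1 in order of first appearance (an assumption).
    """
    st_of_profile = {}
    st_of_isolate = {}
    for name, alleles in isolates.items():
        profile = tuple(alleles[locus] for locus in LOCI)
        # A new profile receives the next unused ST number.
        st = st_of_profile.setdefault(profile, len(st_of_profile) + 1)
        st_of_isolate[name] = st
    return st_of_isolate


# Hypothetical isolates: iso1 and iso3 share a profile, iso2 differs at uvrC.
base = {locus: 1 for locus in LOCI}
toy_isolates = {
    "iso1": dict(base),
    "iso2": {**base, "uvrC": 2},
    "iso3": dict(base),
}
toy_sts = assign_sequence_types(toy_isolates)
```

A single allele difference at any locus is enough to produce a new ST, which is why 50 isolates can resolve into many STs represented by one isolate each.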
Development of a genotyping microarray for Usher syndrome
Cremers, Frans P M; Kimberling, William J; Külm, Maigi; de Brouwer, Arjan P; van Wijk, Erwin; te Brinke, Heleen; Cremers, Cor W R J; Hoefsloot, Lies H; Banfi, Sandro; Simonelli, Francesca; Fleischhauer, Johannes C; Berger, Wolfgang; Kelley, Phil M; Haralambous, Elene; Bitner‐Glindzicz, Maria; Webster, Andrew R; Saihan, Zubin; De Baere, Elfride; Leroy, Bart P; Silvestri, Giuliana; McKay, Gareth J; Koenekoop, Robert K; Millan, Jose M; Rosenberg, Thomas; Joensuu, Tarja; Sankila, Eeva‐Marja; Weil, Dominique; Weston, Mike D; Wissinger, Bernd; Kremer, Hannie
2007-01-01
Background Usher syndrome, a combination of retinitis pigmentosa (RP) and sensorineural hearing loss with or without vestibular dysfunction, displays a high degree of clinical and genetic heterogeneity. Three clinical subtypes can be distinguished, based on the age of onset and severity of the hearing impairment, and the presence or absence of vestibular abnormalities. Thus far, eight genes have been implicated in the syndrome, together comprising 347 protein‐coding exons. Methods: To improve DNA diagnostics for patients with Usher syndrome, we developed a genotyping microarray based on the arrayed primer extension (APEX) method. Allele‐specific oligonucleotides corresponding to all 298 Usher syndrome‐associated sequence variants known to date, 76 of which are novel, were arrayed. Results Approximately half of these variants were validated using original patient DNAs, which yielded an accuracy of >98%. The efficiency of the Usher genotyping microarray was tested using DNAs from 370 unrelated European and American patients with Usher syndrome. Sequence variants were identified in 64/140 (46%) patients with Usher syndrome type I, 45/189 (24%) patients with Usher syndrome type II, 6/21 (29%) patients with Usher syndrome type III and 6/20 (30%) patients with atypical Usher syndrome. The chip also identified two novel sequence variants, c.400C>T (p.R134X) in PCDH15 and c.1606T>C (p.C536S) in USH2A. Conclusion The Usher genotyping microarray is a versatile and affordable screening tool for Usher syndrome. Its efficiency will improve with the addition of novel sequence variants with minimal extra costs, making it a very useful first‐pass screening tool. PMID:16963483
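The detection rates quoted above can be checked with a few lines of Python. The counts are taken directly from the abstract; the variable names are arbitrary, and the pooled rate is a derived figure that the abstract itself does not report.

```python
# (patients with a variant identified, patients tested) per clinical group
results = {
    "type I": (64, 140),
    "type II": (45, 189),
    "type III": (6, 21),
    "atypical": (6, 20),
}

# Per-group detection rate, rounded to whole percent as in the abstract.
rates = {group: round(100 * hits / total) for group, (hits, total) in results.items()}

total_hits = sum(hits for hits, _ in results.values())
total_patients = sum(total for _, total in results.values())
overall_rate = 100 * total_hits / total_patients  # pooled rate, derived here
```

The group sizes sum to the 370 patients stated in the abstract, and the rounded per-group rates match the reported 46%, 24%, 29% and 30%.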
Sørensen, Maria Rathmann; Ilsøe, Mette; Strube, Mikael Lenz; Bishop, Richard; Erbs, Gitte; Hartmann, Sofie Bruun; Jungersen, Gregers
2017-01-01
The need for typing of the swine leukocyte antigen (SLA) is increasing with the expanded use of pigs as models for human diseases and organ-transplantation experiments, their use in infection studies, and for the design of veterinary vaccines. Knowledge of SLA sequences is furthermore a prerequisite for the prediction of epitope binding in pigs. The low number of known SLA class I alleles and the limited knowledge of their prevalence in different pig breeds emphasize the need for efficient SLA typing methods. This study utilizes an SLA class I-typing method based on next-generation sequencing of barcoded PCR amplicons. The amplicons were generated with universal primers and predicted to resolve 68-88% of all known SLA class I alleles, depending on amplicon size. We analyzed the SLA profiles of 72 pigs from four different pig populations: Göttingen minipigs and Belgian, Kenyan, and Danish fattening pigs. We identified 67 alleles, nine previously described haplotypes, and 15 novel haplotypes. The highest variation in SLA class I profiles was observed in the Danish pigs and the lowest in the Göttingen minipig population, which also had the highest percentage of homozygous individuals. A total of 12 novel SLA class I alleles were identified, highlighting the fact that numerous unknown SLA class I alleles remain to be discovered. Overall, we present new information about known and novel alleles and haplotypes and their prevalence in the tested pig populations.
Schoch, Conrad L; Robbertse, Barbara; Robert, Vincent; Vu, Duong; Cardinali, Gianluigi; Irinyi, Laszlo; Meyer, Wieland; Nilsson, R Henrik; Hughes, Karen; Miller, Andrew N; Kirk, Paul M; Abarenkov, Kessy; Aime, M Catherine; Ariyawansa, Hiran A; Bidartondo, Martin; Boekhout, Teun; Buyck, Bart; Cai, Qing; Chen, Jie; Crespo, Ana; Crous, Pedro W; Damm, Ulrike; De Beer, Z Wilhelm; Dentinger, Bryn T M; Divakar, Pradeep K; Dueñas, Margarita; Feau, Nicolas; Fliegerova, Katerina; García, Miguel A; Ge, Zai-Wei; Griffith, Gareth W; Groenewald, Johannes Z; Groenewald, Marizeth; Grube, Martin; Gryzenhout, Marieka; Gueidan, Cécile; Guo, Liangdong; Hambleton, Sarah; Hamelin, Richard; Hansen, Karen; Hofstetter, Valérie; Hong, Seung-Beom; Houbraken, Jos; Hyde, Kevin D; Inderbitzin, Patrik; Johnston, Peter R; Karunarathna, Samantha C; Kõljalg, Urmas; Kovács, Gábor M; Kraichak, Ekaphan; Krizsan, Krisztina; Kurtzman, Cletus P; Larsson, Karl-Henrik; Leavitt, Steven; Letcher, Peter M; Liimatainen, Kare; Liu, Jian-Kui; Lodge, D Jean; Luangsa-ard, Janet Jennifer; Lumbsch, H Thorsten; Maharachchikumbura, Sajeewa S N; Manamgoda, Dimuthu; Martín, María P; Minnis, Andrew M; Moncalvo, Jean-Marc; Mulè, Giuseppina; Nakasone, Karen K; Niskanen, Tuula; Olariaga, Ibai; Papp, Tamás; Petkovits, Tamás; Pino-Bodas, Raquel; Powell, Martha J; Raja, Huzefa A; Redecker, Dirk; Sarmiento-Ramirez, J M; Seifert, Keith A; Shrestha, Bhushan; Stenroos, Soili; Stielow, Benjamin; Suh, Sung-Oui; Tanaka, Kazuaki; Tedersoo, Leho; Telleria, M Teresa; Udayanga, Dhanushka; Untereiner, Wendy A; Diéguez Uribeondo, Javier; Subbarao, Krishna V; Vágvölgyi, Csaba; Visagie, Cobus; Voigt, Kerstin; Walker, Donald M; Weir, Bevan S; Weiß, Michael; Wijayawardene, Nalin N; Wingfield, Michael J; Xu, J P; Yang, Zhu L; Zhang, Ning; Zhuang, Wen-Ying; Federhen, Scott
2014-01-01
DNA phylogenetic comparisons have shown that morphology-based species recognition often underestimates fungal diversity. Therefore, the need for accurate DNA sequence data, tied to both correct taxonomic names and clearly annotated specimen data, has never been greater. Furthermore, the growing number of molecular ecology and microbiome projects using high-throughput sequencing require fast and effective methods for en masse species assignments. In this article, we focus on selecting and re-annotating a set of marker reference sequences that represent each currently accepted order of Fungi. The particular focus is on sequences from the internal transcribed spacer region in the nuclear ribosomal cistron, derived from type specimens and/or ex-type cultures. Re-annotated and verified sequences were deposited in a curated public database at the National Center for Biotechnology Information (NCBI), namely the RefSeq Targeted Loci (RTL) database, and will be visible during routine sequence similarity searches with NR_prefixed accession numbers. A set of standards and protocols is proposed to improve the data quality of new sequences, and we suggest how type and other reference sequences can be used to improve identification of Fungi. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA177353. Published by Oxford University Press 2013. This work is written by US Government employees and is in the public domain in the US.
dCITE: Measuring Necessary Cladistic Information Can Help You Reduce Polytomy Artefacts in Trees.
Wise, Michael J
2016-01-01
Biologists regularly create phylogenetic trees to better understand the evolutionary origins of their species of interest, and often use genomes as their data source. However, as more and more incomplete genomes are published, in many cases it may not be possible to compute genome-based phylogenetic trees due to large gaps in the assembled sequences. In addition, comparison of complete genomes may not even be desirable due to the presence of horizontally acquired and homologous genes. A decision must therefore be made about which gene, or gene combinations, should be used to compute a tree. Deflated Cladistic Information based on Total Entropy (dCITE) is proposed as an easily computed metric for measuring the cladistic information in multiple sequence alignments representing a range of taxa, without the need to first compute the corresponding trees. dCITE scores can be used to rank candidate genes or decide whether input sequences provide insufficient cladistic information, making artefactual polytomies more likely. The dCITE method can be applied to protein, nucleotide or encoded phenotypic data, so can be used to select which data-type is most appropriate, given the choice. In a series of experiments the dCITE method was compared with related measures. Then, as a practical demonstration, the ideas developed in the paper were applied to a dataset representing species from the order Campylobacterales; trees based on sequence combinations, selected on the basis of their dCITE scores, were compared with a tree constructed to mimic Multi-Locus Sequence Typing (MLST) combinations of fragments. We see that the greater the dCITE score the more likely it is that the computed phylogenetic tree will be free of artefactual polytomies. Secondly, cladistic information saturates, beyond which little additional cladistic information can be obtained by adding additional sequences. Finally, sequences with high cladistic information produce more consistent trees for the same taxa.
PMID:27898695
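The abstract does not give the dCITE formula, so the following Python sketch is not dCITE. It only illustrates the underlying idea of measuring information in an alignment without building a tree, using plain per-column Shannon entropy. The function name and the toy alignment are assumptions.

```python
import math
from collections import Counter


def column_entropies(alignment):
    """Shannon entropy (in bits) of each column of an aligned sequence set.

    Illustrative only: this is ordinary column entropy, not the deflated
    cladistic information score proposed in the paper.
    """
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    entropies = []
    for column in zip(*alignment):
        n = len(column)
        counts = Counter(column)
        entropies.append(-sum(c / n * math.log2(c / n) for c in counts.values()))
    return entropies


# Hypothetical four-taxon alignment.
toy_alignment = ["ACGT", "ACGA", "ACTA", "AGTA"]
toy_entropies = column_entropies(toy_alignment)
```

An invariant column scores 0 bits and carries no grouping signal, while a column that splits the taxa evenly into two groups scores 1 bit.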
High-sensitivity HLA typing by Saturated Tiling Capture Sequencing (STC-Seq).
Jiao, Yang; Li, Ran; Wu, Chao; Ding, Yibin; Liu, Yanning; Jia, Danmei; Wang, Lifeng; Xu, Xiang; Zhu, Jing; Zheng, Min; Jia, Junling
2018-01-15
Highly polymorphic human leukocyte antigen (HLA) genes are responsible for fine-tuning the adaptive immune system. High-resolution HLA typing is important for the treatment of autoimmune and infectious diseases. Additionally, it is routinely performed for identifying matched donors in transplantation medicine. Although many HLA typing approaches have been developed, the complexity, low efficiency, and high cost of current HLA-typing assays limit their application in population-based high-throughput HLA typing for donors, which is required for creating large-scale databases for transplantation and precision medicine. Here, we present a cost-efficient Saturated Tiling Capture Sequencing (STC-Seq) approach to capturing 14 HLA class I and II genes. The highly efficient capture (an approximately 23,000-fold enrichment) of these genes allows for simplified allele calling. Tests on five genes (HLA-A/B/C/DRB1/DQB1) from 31 human samples and 351 datasets using STC-Seq showed results that were 98% consistent with the known genotypes at two-field resolution (field1 and field2). Additionally, STC can capture genomic DNA fragments longer than 3 kb from HLA loci, making the library compatible with third-generation sequencing. STC-Seq is a highly accurate and cost-efficient method for HLA typing that can be used to facilitate the establishment of population-based HLA databases for precision and transplantation medicine.
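A consistency figure at field1/field2 resolution can be computed by truncating allele names to their first two colon-separated fields before comparing. The Python sketch below assumes standard HLA nomenclature of the form `GENE*field1:field2[:...]` and a simple position-wise comparison. The helper names and the example calls are hypothetical, and the handling of unordered allele pairs per locus is left out.

```python
def truncate_fields(allele, n_fields=2):
    """Split 'A*02:01:01' into ('A', ('02', '01')) at the requested resolution."""
    gene, _, rest = allele.partition("*")
    return gene, tuple(rest.split(":")[:n_fields])


def concordance(called, known, n_fields=2):
    """Fraction of calls that match the known allele at `n_fields` resolution."""
    if len(called) != len(known) or not known:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(
        truncate_fields(c, n_fields) == truncate_fields(k, n_fields)
        for c, k in zip(called, known)
    )
    return matches / len(known)


# Hypothetical calls versus known genotypes (one allele per entry).
called = ["A*02:01:01", "B*07:02", "C*07:01"]
known = ["A*02:01", "B*07:02", "C*07:02"]
two_field = concordance(called, known, n_fields=2)
one_field = concordance(called, known, n_fields=1)
```

In this toy case all three calls agree at field1, but the C locus call differs at field2, so the two-field concordance is lower than the one-field concordance.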
Metzgar, David; Myers, Christopher A.; Russell, Kevin L.; Faix, Dennis; Blair, Patrick J.; Brown, Jason; Vo, Scott; Swayne, David E.; Thomas, Colleen; Stenger, David A.; Lin, Baochuan; Malanoski, Anthony P.; Wang, Zheng; Blaney, Kate M.; Long, Nina C.; Schnur, Joel M.; Saad, Magdi D.; Borsuk, Lisa A.; Lichanska, Agnieszka M.; Lorence, Matthew C.; Weslowski, Brian; Schafer, Klaus O.; Tibbetts, Clark
2010-01-01
For more than four decades the cause of most type A influenza virus infections of humans has been attributed to only two viral subtypes, A/H1N1 or A/H3N2. In contrast, avian and other vertebrate species are a reservoir of type A influenza virus genome diversity, hosting strains representing at least 120 of 144 combinations of 16 viral hemagglutinin and 9 viral neuraminidase subtypes. Viral genome segment reassortments and mutations emerging within this reservoir may spawn new influenza virus strains as imminent epidemic or pandemic threats to human health and poultry production. Traditional methods to detect and differentiate influenza virus subtypes are either time-consuming and labor-intensive (culture-based) or remarkably insensitive (antibody-based). Molecular diagnostic assays based upon reverse transcriptase-polymerase chain reaction (RT-PCR) have short assay cycle time, and high analytical sensitivity and specificity. However, none of these diagnostic tests determine viral gene nucleotide sequences to distinguish strains and variants of a detected pathogen from one specimen to the next. Decision-quality, strain- and variant-specific pathogen gene sequence information may be critical for public health, infection control, surveillance, epidemiology, or medical/veterinary treatment planning. The Resequencing Pathogen Microarray (RPM-Flu) is a robust, highly multiplexed and target gene sequencing-based alternative to both traditional culture- or biomarker-based diagnostic tests. RPM-Flu is a single, simultaneous differential diagnostic assay for all subtype combinations of type A influenza viruses and for 30 other viral and bacterial pathogens that may cause influenza-like illness. These other pathogen targets of RPM-Flu may co-infect and compound the morbidity and/or mortality of patients with influenza. 
The informative specificity of a single RPM-Flu test represents specimen-specific viral gene sequences as determinants of virus type, A/HN subtype, virulence, host-range, and resistance to antiviral agents. PMID:20140251
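The figure of 144 subtype combinations follows directly from pairing 16 hemagglutinin subtypes with 9 neuraminidase subtypes. A short Python sketch of that enumeration, with variable names chosen for this example:

```python
from itertools import product

# 16 hemagglutinin (H1..H16) and 9 neuraminidase (N1..N9) subtypes,
# as stated in the abstract.
hemagglutinins = [f"H{i}" for i in range(1, 17)]
neuraminidases = [f"N{i}" for i in range(1, 10)]

# Every HxNy combination of type A influenza virus.
subtypes = [h + n for h, n in product(hemagglutinins, neuraminidases)]
```

The list contains 16 x 9 = 144 entries, including the two subtypes, H1N1 and H3N2, that the abstract names as the main causes of human infection.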
Bletz, Stefan; Janezic, Sandra; Harmsen, Dag; Rupnik, Maja; Mellmann, Alexander
2018-06-01
Clostridium difficile, recently renamed Clostridioides difficile, is the most common cause of antibiotic-associated nosocomial gastrointestinal infections worldwide. To differentiate endogenous infections and transmission events, highly discriminatory subtyping is necessary. Today, methods based on whole-genome sequencing data are increasingly used to subtype bacterial pathogens; however, a standardized methodology and typing nomenclature are frequently missing. Here we report a core genome multilocus sequence typing (cgMLST) approach developed for C. difficile. Initially, we determined the breadth of the C. difficile population based on all available MLST sequence types with Bayesian inference (BAPS). The resulting BAPS partitions were used in combination with C. difficile clade information to select representative isolates that were subsequently used to define cgMLST target genes. Finally, we evaluated the novel cgMLST scheme with genomes from 3,025 isolates. BAPS grouping (n = 6 groups) together with the clade information led to a total of 11 representative isolates that were included for cgMLST definition and resulted in 2,270 cgMLST genes that were present in all isolates. Overall, 2,184 to 2,268 cgMLST targets were detected in the genome sequences of 70 outbreak-associated and reference strains, and on average 99.3% of cgMLST targets (1,116 to 2,270 targets) were present in 2,954 genomes downloaded from the NCBI database, underlining the representativeness of the cgMLST scheme. Moreover, reanalyses of different cluster scenarios with cgMLST were concordant with published single nucleotide variant analyses. In conclusion, the novel cgMLST is representative of the whole C. difficile population, is highly discriminatory in outbreak situations, and provides a unique nomenclature facilitating interlaboratory exchange. Copyright © 2018 American Society for Microbiology.
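Two quantities recur in this kind of evaluation: the fraction of scheme targets found in a genome, and the number of allele differences between two isolates. The Python sketch below shows one simple way to compute both, assuming each genome is represented as a dict of target -> allele number with `None` for a missing target, and that missing targets are ignored in pairwise comparisons. These conventions, the function names, and the toy data are assumptions and do not describe the software used in the study.

```python
def fraction_present(profile, scheme_targets):
    """Share of cgMLST scheme targets that were called in this genome."""
    found = sum(profile.get(target) is not None for target in scheme_targets)
    return found / len(scheme_targets)


def allelic_distance(profile_a, profile_b):
    """Number of targets with different alleles, skipping missing targets."""
    distance = 0
    for target, allele_a in profile_a.items():
        allele_b = profile_b.get(target)
        if allele_a is None or allele_b is None:
            continue  # target missing in at least one genome
        if allele_a != allele_b:
            distance += 1
    return distance


# Hypothetical five-target scheme and two isolates.
scheme = ["t1", "t2", "t3", "t4", "t5"]
isolate_a = {"t1": 1, "t2": 1, "t3": 2, "t4": None, "t5": 7}
isolate_b = {"t1": 1, "t2": 3, "t3": 2, "t4": 4, "t5": 8}

present_a = fraction_present(isolate_a, scheme)
dist_ab = allelic_distance(isolate_a, isolate_b)
```

Small allelic distances between isolates are what suggest a shared transmission cluster, although any threshold would have to come from the scheme's own validation rather than from this sketch.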
Kim, Byoung-Jun; Kim, Ga-Na; Kim, Bo-Ram; Shim, Tae-Sun; Kook, Yoon-Hoh; Kim, Bum-Joon
2017-01-01
Recent multi locus sequence typing (MLST) and genome based studies indicate that lateral gene transfer (LGT) events in the rpoB gene are prevalent between Mycobacterium abscessus complex strains. To check the prevalence of the M. massiliense strains subject to rpoB LGT (Rec-mas), we applied rpoB typing (711 bp) to 106 Korean strains of M. massiliense infection that had already been identified by hsp65 sequence analysis (603 bp). The analysis indicated 6 smooth strains in M. massiliense Type I (10.0%, 6/60) genotypes but no strains in M. massiliense Type II genotypes (0%, 0/46), showing a discrepancy between the 2 typing methods. Further MLST analysis based on the partial sequencing of seven housekeeping genes, argH, cya, glpK, gnd, murC, pta and purH, as well as erm(41) PCR proved that these 6 Rec-mas strains consisted of two distinct genotypes belonging to M. massiliense and not M. abscessus. The complete rpoB sequencing analysis showed that these 6 Rec-mas strains have an identical hybrid rpoB gene, of which a 478 bp partial rpoB fragment may be laterally transferred from M. abscessus. Notably, five of the 6 Rec-mas strains showed complete identical sequences in a total of nine genes, including the seven MLST genes, hsp65, and rpoB, suggesting their clonal propagation in South Korea. In conclusion, we identified 6 M. massiliense smooth strains of 2 phylogenetically distinct genotypes with a specific hybrid rpoB gene laterally transferred from M. abscessus from Korean patients. Their clinical relevance and bacteriological traits remain to be elucidated.
PMID:28604829
Wang, R F; Cao, W W; Cerniglia, C E
1996-01-01
In order to develop a PCR method to detect Fusobacterium prausnitzii in human feces and to clarify the phylogenetic position of this species, its 16S rRNA gene sequence was determined. The sequence described in this paper is different from the 16S rRNA gene sequence previously deposited in GenBank. A PCR assay based on the new sequence is specific for F. prausnitzii, and the results of this assay confirmed that F. prausnitzii is the most common species in human feces. However, a PCR assay based on the original GenBank sequence was negative when it was performed with two strains of F. prausnitzii obtained from the American Type Culture Collection. A phylogenetic tree based on the new 16S rRNA gene sequence was constructed. On this tree F. prausnitzii was not a member of the Fusobacterium group but was closer to some Eubacterium spp. and located between Clostridium "clusters III and IV" (M.D. Collins, P.A. Lawson, A. Willems, J.J. Cordoba, J. Fernandez-Garayzabal, P. Garcia, J. Cai, H. Hippe, and J.A.E. Farrow, Int. J. Syst. Bacteriol. 44:812-826, 1994).
Tracking the Invasion of Small Numbers of Cells in Paper-Based Assays with Quantitative PCR.
Truong, Andrew S; Lochbaum, Christian A; Boyce, Matthew W; Lockett, Matthew R
2015-11-17
Paper-based scaffolds are an attractive material for culturing mammalian cells in a three-dimensional environment. A number of previously published studies use these scaffolds to generate models of aortic valves, cardiac ischemia and reperfusion, and solid tumors. These models have largely relied on fluorescence imaging and microscopy to quantify cells in the scaffolds. We present here a polymerase chain reaction (PCR)-based method, capable of quantifying multiple cell types in a single culture with the aid of DNA barcodes: unique sequences of DNA introduced to the genome of individual cells or cell types through lentiviral transduction. PCR-based methods are highly specific and are amenable to high-throughput and multiplexed analyses. To validate this method, we engineered two different breast cancer lines to constitutively express either a green or red fluorescent protein. These cell lines allowed us to directly compare the ability of fluorescence imaging (of the fluorescent proteins) and qPCR (of the unique DNA sequences of the fluorescent proteins) to quantify known numbers of cells in the paper-based scaffolds. We also used both methods to quantify the distribution of these breast cell lines in homotypic and heterotypic invasion assays. In the paper-based invasion assays, a single sheet of paper containing cells suspended in a hydrogel was sandwiched between sheets of paper containing only hydrogel. The stack was incubated, and the cells invaded the adjacent layers. The individual sheets of the invasion assay were then destacked and the number of cells in each layer quantified. Our results show both methods can accurately detect cell populations of greater than 500 cells. The qPCR method can repeatedly and accurately detect as few as 50 cells, allowing small populations of highly invasive cells to be detected and differentiated from other cell types.
Multiplex detection of respiratory pathogens
McBride, Mary [Brentwood, CA]; Slezak, Thomas [Livermore, CA]; Birch, James M [Albany, CA]
2012-07-31
Described are kits and methods useful for detection of respiratory pathogens (influenza A (including subtyping capability for H1, H3, H5 and H7 subtypes), influenza B, parainfluenza (type 2), respiratory syncytial virus, and adenovirus) in a sample. Genomic sequence information from the respiratory pathogens was analyzed to identify signature sequences, e.g., polynucleotide sequences useful for confirming the presence or absence of a pathogen in a sample. Primer and probe sets were designed and optimized for use in a PCR-based, multiplexed Luminex assay to successfully identify the presence or absence of pathogens in a sample.
Hamamoto, Kouta; Ueda, Shuhei; Yamamoto, Yoshimasa
2015-01-01
Genotyping and characterization of bacterial isolates are essential steps in the identification and control of antibiotic-resistant bacterial infections. Recently, one novel genotyping method using three genomic guided Escherichia coli markers (GIG-EM), dinG, tonB, and dipeptide permease (DPP), was reported. Because GIG-EM has not been fully evaluated using clinical isolates, we assessed this typing method with 72 E. coli collection of reference (ECOR) environmental E. coli reference strains and 63 E. coli isolates of various genetic backgrounds. In this study, we designated 768 bp of dinG, 745 bp of tonB, and 655 bp of DPP target sequences for use in the typing method. Concatenations of the processed marker sequences were used to draw GIG-EM phylogenetic trees. E. coli isolates with identical sequence types as identified by the conventional multilocus sequence typing (MLST) method were localized to the same branch of the GIG-EM phylogenetic tree. Sixteen clinical E. coli isolates were utilized as test isolates without prior characterization by conventional MLST and phylogenetic grouping before GIG-EM typing. Of these, 14 clinical isolates were assigned to a branch including only isolates of a pandemic clone, E. coli B2-ST131-O25b, and these results were confirmed by conventional typing methods. Our results suggested that the GIG-EM typing method and its application to phylogenetic trees might be useful tools for the molecular characterization and determination of the genetic relationships among E. coli isolates. PMID:25809972
Hamamoto, Kouta; Ueda, Shuhei; Yamamoto, Yoshimasa; Hirai, Itaru
2015-06-01
Genotyping and characterization of bacterial isolates are essential steps in the identification and control of antibiotic-resistant bacterial infections. Recently, one novel genotyping method using three genomic guided Escherichia coli markers (GIG-EM), dinG, tonB, and dipeptide permease (DPP), was reported. Because GIG-EM has not been fully evaluated using clinical isolates, we assessed this typing method with 72 E. coli collection of reference (ECOR) environmental E. coli reference strains and 63 E. coli isolates of various genetic backgrounds. In this study, we designated 768 bp of dinG, 745 bp of tonB, and 655 bp of DPP target sequences for use in the typing method. Concatenations of the processed marker sequences were used to draw GIG-EM phylogenetic trees. E. coli isolates with identical sequence types as identified by the conventional multilocus sequence typing (MLST) method were localized to the same branch of the GIG-EM phylogenetic tree. Sixteen clinical E. coli isolates were utilized as test isolates without prior characterization by conventional MLST and phylogenetic grouping before GIG-EM typing. Of these, 14 clinical isolates were assigned to a branch including only isolates of a pandemic clone, E. coli B2-ST131-O25b, and these results were confirmed by conventional typing methods. Our results suggested that the GIG-EM typing method and its application to phylogenetic trees might be useful tools for the molecular characterization and determination of the genetic relationships among E. coli isolates. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Chaitankar, Vijender; Karakülah, Gökhan; Ratnapriya, Rinki; Giuste, Felipe O.; Brooks, Matthew J.; Swaroop, Anand
2016-01-01
The advent of high throughput next generation sequencing (NGS) has accelerated the pace of discovery of disease-associated genetic variants and genomewide profiling of expressed sequences and epigenetic marks, thereby permitting systems-based analyses of ocular development and disease. Rapid evolution of NGS and associated methodologies presents significant challenges in acquisition, management, and analysis of large data sets and for extracting biologically or clinically relevant information. Here we illustrate the basic design of commonly used NGS-based methods, specifically whole exome sequencing, transcriptome, and epigenome profiling, and provide recommendations for data analyses. We briefly discuss systems biology approaches for integrating multiple data sets to elucidate gene regulatory or disease networks. While we provide examples from the retina, the NGS guidelines reviewed here are applicable to other tissues/cell types as well. PMID:27297499
Intact long-type dupA as a marker for gastroduodenal diseases in Okinawan subpopulation, Japan
Takahashi, Ayaka; Shiota, Seiji; Matsunari, Osamu; Watada, Masahide; Suzuki, Rumiko; Nakachi, Saori; Kinjo, Nagisa; Kinjo, Fukunori; Yamaoka, Yoshio
2012-01-01
Background Helicobacter pylori dupA can be divided into two types according to the presence or absence of the mutation. In addition, full sequence data revealed that dupA has two types with different lengths, depending on the presence of approximately 600 bp in the putative 5' region (presence, long-type; absence, short-type), which has not been taken into account in previous studies. Methods A total of 319 strains isolated from Okinawa, the southern islands of Japan, were included. The status of dupA and cagA was determined by polymerase chain reaction. The presence of mutations in long-type dupA was determined by DNA sequencing. Results The prevalence of long-type dupA was 26.3% (84/319). Sequence analysis showed that only 6 cases (7.1%) among the 84 long-type dupA strains studied had point mutations leading to a stop codon. Interestingly, intact long-type dupA without frameshift mutation, but not short-type dupA, was significantly more associated with gastric ulcer and gastric cancer than with gastritis (P = 0.001 and P = 0.019, respectively). After adjustment for age, gender and cagA, the presence of intact long-type dupA was significantly associated with gastric ulcer and gastric cancer compared with gastritis (odds ratio [OR] = 3.35, 95% confidence interval [CI] = 1.55–7.24 and OR = 4.14, 95% CI = 1.23–13.94, respectively). Conclusions Intact long-type dupA is a real virulence marker for severe outcomes in Okinawa, Japan. The previous information gained from PCR-based methods without taking long-type dupA into account must be interpreted with caution. PMID:23067336
ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains
Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz
2016-01-01
With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734
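The intersection matrix described in the abstract above can be illustrated with a minimal sketch. It assumes spike times in arbitrary units and a raw overlap count per entry; the published method may normalize the overlap differently, and the names `bin_spikes` and `intersection_matrix` are hypothetical, not taken from the ASSET software.

```python
def bin_spikes(spike_trains, bin_size, n_bins):
    """Return one set per time bin holding the IDs of neurons that spiked in it.

    spike_trains: dict mapping neuron ID -> list of spike times (assumed format).
    """
    bins = [set() for _ in range(n_bins)]
    for neuron_id, times in spike_trains.items():
        for t in times:
            idx = int(t // bin_size)
            if 0 <= idx < n_bins:
                bins[idx].add(neuron_id)
    return bins


def intersection_matrix(bins):
    """Entry (i, j) counts the neurons active in both bin i and bin j."""
    n = len(bins)
    return [[len(bins[i] & bins[j]) for j in range(n)] for i in range(n)]


# Toy data: neurons 0,1 fire together, then neurons 2,3; the pattern repeats 5 bins later.
spike_trains = {
    0: [0.2, 5.1],
    1: [0.4, 5.3],
    2: [1.3, 6.4],
    3: [1.6, 6.2],
}
bins = bin_spikes(spike_trains, bin_size=1.0, n_bins=8)
matrix = intersection_matrix(bins)
print(matrix[0][5], matrix[1][6])  # the repeated sequence shows up as an off-diagonal run
```

In this toy case the repeated sequence appears as high values at (0, 5) and (1, 6), i.e. a short diagonal structure away from the main diagonal, which is the kind of feature the automated steps are meant to detect.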
Gladka, Monika M; Molenaar, Bas; de Ruiter, Hesther; van der Elst, Stefan; Tsui, Hoyee; Versteeg, Danielle; Lacraz, Grègory P A; Huibers, Manon M H; van Oudenaarden, Alexander; van Rooij, Eva
2018-01-31
Background -Genome-wide transcriptome analysis has greatly advanced our understanding of the regulatory networks underlying basic cardiac biology and mechanisms driving disease. However, so far, the resolution of studying gene expression patterns in the adult heart has been limited to the level of extracts from whole tissues. The use of tissue homogenates inherently causes the loss of any information on cellular origin or cell type-specific changes in gene expression. Recent developments in RNA amplification strategies provide a unique opportunity to use small amounts of input RNA for genome-wide sequencing of single cells. Methods -Here, we present a method to obtain high quality RNA from digested cardiac tissue from adult mice for automated single-cell sequencing of both the healthy and diseased heart. Results -After optimization, we were able to perform single-cell sequencing on adult cardiac tissue under both homeostatic conditions and after ischemic injury. Clustering analysis based on differential gene expression unveiled known and novel markers of all main cardiac cell types. Based on differential gene expression we were also able to identify multiple subpopulations within a certain cell type. Furthermore, applying single-cell sequencing on both the healthy and the injured heart indicated the presence of disease-specific cell subpopulations. As such, we identified cytoskeleton associated protein 4 ( Ckap4 ) as a novel marker for activated fibroblasts that positively correlates with known myofibroblast markers in both mouse and human cardiac tissue. Ckap4 inhibition in activated fibroblasts treated with TGFβ triggered a greater increase in the expression of genes related to activated fibroblasts compared to control, suggesting a role of Ckap4 in modulating fibroblast activation in the injured heart. 
Conclusions -Single-cell sequencing on both the healthy and diseased adult heart allows us to study transcriptomic differences between cardiac cells, as well as cell type-specific changes in gene expression during cardiac disease. This new approach provides a wealth of novel insights into molecular changes that underlie the cellular processes relevant for cardiac biology and pathophysiology. Applying this technology could lead to the discovery of new therapeutic targets relevant for heart disease.
Advances in molecular serotyping and subtyping of Escherichia coli
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fratamico, Pina M.; DebRoy, Chitrita; Liu, Yanhong
Escherichia coli plays an important role as a member of the gut microbiota; however, pathogenic strains also exist, including various diarrheagenic E. coli pathotypes and extraintestinal pathogenic E. coli that cause illness outside of the GI-tract. E. coli have traditionally been serotyped using antisera against the ca. 186 O-antigens and 53 H-flagellar antigens. Phenotypic methods, including bacteriophage typing and O- and H- serotyping for differentiating and characterizing E. coli have been used for many years; however, these methods are generally time consuming and not always accurate. Advances in next generation sequencing technologies have made it possible to develop genetic-based subtyping and molecular serotyping methods for E. coli, which are more discriminatory compared to phenotypic typing methods. Furthermore, whole genome sequencing (WGS) of E. coli is replacing established subtyping methods such as pulsed-field gel electrophoresis, providing a major advancement in the ability to investigate food-borne disease outbreaks and for trace-back to sources. Furthermore, a variety of sequence analysis tools and bioinformatic pipelines are being developed to analyze the vast amount of data generated by WGS and to obtain specific information such as O- and H-group determination and the presence of virulence genes and other genetic markers.
Advances in molecular serotyping and subtyping of Escherichia coli
Fratamico, Pina M.; DebRoy, Chitrita; Liu, Yanhong; ...
2016-05-03
Escherichia coli plays an important role as a member of the gut microbiota; however, pathogenic strains also exist, including various diarrheagenic E. coli pathotypes and extraintestinal pathogenic E. coli that cause illness outside of the GI-tract. E. coli have traditionally been serotyped using antisera against the ca. 186 O-antigens and 53 H-flagellar antigens. Phenotypic methods, including bacteriophage typing and O- and H- serotyping for differentiating and characterizing E. coli have been used for many years; however, these methods are generally time consuming and not always accurate. Advances in next generation sequencing technologies have made it possible to develop genetic-based subtyping and molecular serotyping methods for E. coli, which are more discriminatory compared to phenotypic typing methods. Furthermore, whole genome sequencing (WGS) of E. coli is replacing established subtyping methods such as pulsed-field gel electrophoresis, providing a major advancement in the ability to investigate food-borne disease outbreaks and for trace-back to sources. Furthermore, a variety of sequence analysis tools and bioinformatic pipelines are being developed to analyze the vast amount of data generated by WGS and to obtain specific information such as O- and H-group determination and the presence of virulence genes and other genetic markers.
Labeled nucleotide phosphate (NP) probes
Korlach, Jonas [Ithaca, NY]; Webb, Watt W [Ithaca, NY]; Levene, Michael [Ithaca, NY]; Turner, Stephen [Ithaca, NY]; Craighead, Harold G [Ithaca, NY]; Foquet, Mathieu [Ithaca, NY]
2009-02-03
The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
Predicting the host of influenza viruses based on the word vector.
Xu, Beibei; Tan, Zhiying; Li, Kenli; Jiang, Taijiao; Peng, Yousong
2017-01-01
Newly emerging influenza viruses continue to threaten public health. A rapid determination of the host range of newly discovered influenza viruses would assist in early assessment of their risk. Here, we attempted to predict the host of influenza viruses using the Support Vector Machine (SVM) classifier based on the word vector, a new representation and feature extraction method for biological sequences. The results show that the length of the word within the word vector, the sequence type (DNA or protein) and the species from which the sequences were derived for generating the word vector all influence the performance of models in predicting the host of influenza viruses. In nearly all cases, the models built on the surface proteins hemagglutinin (HA) and neuraminidase (NA) (or their genes) produced better results than those built on internal influenza proteins (or their genes). The best performance was achieved when the model was built on the HA gene based on word vectors (words three letters long) generated from DNA sequences of the influenza virus. This resulted in accuracies of 99.7% for avian, 96.9% for human and 90.6% for swine influenza viruses. Compared to the method of sequence homology best-hit searches using the Basic Local Alignment Search Tool (BLAST), the word vector-based models still need further improvements in predicting the host of influenza A viruses.
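As a rough illustration of the first step implied above, a sequence can be split into fixed-length "words" before any word-vector model is trained. This is only a sketch: the abstract does not say whether the words overlap, so both options are shown, and the function name `to_words` is hypothetical. Training the embedding and the SVM is not shown.

```python
def to_words(sequence, k=3, overlapping=True):
    """Split a DNA or protein sequence into words of length k.

    overlapping=True slides by one letter; False takes consecutive,
    non-overlapping words. Which one the study used is an assumption.
    """
    sequence = sequence.upper()
    step = 1 if overlapping else k
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, step)]


# Short made-up fragment, not a real HA gene sequence.
fragment = "atggca"
print(to_words(fragment))                     # sliding three-letter words
print(to_words(fragment, overlapping=False))  # consecutive three-letter words
```

Each resulting list of words would play the role of a "sentence" for the embedding model, with the host label attached at the sequence level.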
de Gier, Camilla; Kirkham, Lea-Ann S.
2015-01-01
Nonhemolytic variants of Haemophilus haemolyticus are difficult to differentiate from Haemophilus influenzae despite a wide difference in pathogenic potential. A previous investigation characterized a challenging set of 60 clinical strains using multiple PCRs for marker genes and described strains that could not be unequivocally identified as either species. We have analyzed the same set of strains by multilocus sequence analysis (MLSA) and near-full-length 16S rRNA gene sequencing. MLSA unambiguously allocated all study strains to either of the two species, while identification by 16S rRNA sequence was inconclusive for three strains. Notably, the two methods yielded conflicting identifications for two strains. Most of the “fuzzy species” strains were identified as H. influenzae that had undergone complete deletion of the fucose operon. Such strains, which are untypeable by the H. influenzae multilocus sequence type (MLST) scheme, have sporadically been reported and predominantly belong to a single branch of H. influenzae MLSA phylogenetic group II. We also found evidence of interspecies recombination between H. influenzae and H. haemolyticus within the 16S rRNA genes. Establishing an accurate method for rapid and inexpensive identification of H. influenzae is important for disease surveillance and treatment. PMID:26378279
Hykin, Sarah M.; Bi, Ke; McGuire, Jimmy A.
2015-01-01
For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens—particularly for use in phylogenetic analyses—has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. 
The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for genetic analysis. PMID:26505622
Snelling, A M; Gerner-Smidt, P; Hawkey, P M; Heritage, J; Parnell, P; Porter, C; Bodenham, A R; Inglis, T
1996-01-01
Acinetobacter spp. are being reported with increasing frequency as causes of nosocomial infection. In order to identify reservoirs of infection as quickly as possible, a rapid typing method that can differentiate epidemic strains from environmental and nonepidemic strains is needed. In 1993, a cluster of Acinetobacter baumannii isolates from five patients in the adult intensive therapy unit of our tertiary-care teaching hospital led us to develop and optimize a rapid repetitive extragenic palindromic sequence-based PCR (REP-PCR) typing protocol for members of the Acinetobacter calcoaceticus-A. baumannii complex that uses boiled colonies and consensus primers aimed at repetitive extragenic palindromic sequences. Four of the five patient isolates gave the same REP-PCR typing pattern as isolates of A. baumannii obtained from the temperature probe of a Bennett humidifier; the fifth isolate had a unique profile. Disinfection of the probe with 70% ethanol, as recommended by the manufacturer, proved ineffective, as A. baumannii with the same REP-PCR pattern was isolated from it 10 days after cleaning, necessitating a change in our decontamination procedure. Results obtained with REP-PCR were subsequently confirmed by ribotyping. To evaluate the discriminatory power (D) of REP-PCR for typing members of the A. calcoaceticus-A. baumannii complex, compared with that of ribotyping, we have applied both methods to a collection of 85 strains that included representatives of six DNA groups within the complex. Ribotyping using EcoRI digests yielded 53 patterns (D = 0.98), whereas 68 different REP-PCR patterns were observed (D = 0.99). By computer-assisted analysis of gel images, 74 patterns were observed with REP-PCR (D = 1.0). Overall, REP-PCR typing proved to be slightly more discriminatory than ribotyping. Our results indicate that REP-PCR typing using boiled colonies is a simple, rapid, and effective means of typing members of the A. calcoaceticus-A. baumannii complex. 
PMID:8727902
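The abstract reports discriminatory power (D) without giving the formula. A common choice for typing methods is the Hunter-Gaston form of Simpson's index of diversity, D = 1 - [1 / (N(N-1))] Σ n_j(n_j - 1), where N is the number of strains and n_j the number of strains with the j-th pattern. The sketch below assumes that formula; the pattern counts are made up, since the abstract does not report how strains were distributed across patterns.

```python
def discriminatory_index(pattern_counts):
    """Hunter-Gaston discriminatory index (assumed to be the D in the abstract).

    pattern_counts: number of strains observed for each distinct typing pattern.
    """
    n = sum(pattern_counts)
    if n < 2:
        raise ValueError("need at least two strains")
    same_type_pairs = sum(c * (c - 1) for c in pattern_counts)
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical collection: 10 strains falling into 4 patterns.
print(round(discriminatory_index([4, 3, 2, 1]), 3))

# If every strain had its own pattern, D would reach its maximum of 1.0.
print(discriminatory_index([1] * 85))
```

D can be read as the probability that two strains drawn at random from the collection are assigned different types, which is why values close to 1 indicate a highly discriminatory method.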
Scheirlinck, Ilse; Van der Meulen, Roel; Van Schoor, Ann; Vancanneyt, Marc; De Vuyst, Luc; Vandamme, Peter; Huys, Geert
2007-01-01
A culture-based approach was used to investigate the diversity of lactic acid bacteria (LAB) in Belgian traditional sourdoughs and to assess the influence of flour type, bakery environment, geographical origin, and technological characteristics on the taxonomic composition of these LAB communities. For this purpose, a total of 714 LAB from 21 sourdoughs sampled at 11 artisan bakeries throughout Belgium were subjected to a polyphasic identification approach. The microbial composition of the traditional sourdoughs was characterized by bacteriological culture in combination with genotypic identification methods, including repetitive element sequence-based PCR fingerprinting and phenylalanyl-tRNA synthase (pheS) gene sequence analysis. LAB from Belgian sourdoughs belonged to the genera Lactobacillus, Pediococcus, Leuconostoc, Weissella, and Enterococcus, with the heterofermentative species Lactobacillus paralimentarius, Lactobacillus sanfranciscensis, Lactobacillus plantarum, and Lactobacillus pontis as the most frequently isolated taxa. Statistical analysis of the identification data indicated that the microbial composition of the sourdoughs is mainly affected by the bakery environment rather than the flour type (wheat, rye, spelt, or a mixture of these) used. In conclusion, the polyphasic approach, based on rapid genotypic screening and high-resolution, sequence-dependent identification, proved to be a powerful tool for studying the LAB diversity in traditional fermented foods such as sourdough. PMID:17675431
Scheirlinck, Ilse; Van der Meulen, Roel; Van Schoor, Ann; Vancanneyt, Marc; De Vuyst, Luc; Vandamme, Peter; Huys, Geert
2007-10-01
A culture-based approach was used to investigate the diversity of lactic acid bacteria (LAB) in Belgian traditional sourdoughs and to assess the influence of flour type, bakery environment, geographical origin, and technological characteristics on the taxonomic composition of these LAB communities. For this purpose, a total of 714 LAB from 21 sourdoughs sampled at 11 artisan bakeries throughout Belgium were subjected to a polyphasic identification approach. The microbial composition of the traditional sourdoughs was characterized by bacteriological culture in combination with genotypic identification methods, including repetitive element sequence-based PCR fingerprinting and phenylalanyl-tRNA synthase (pheS) gene sequence analysis. LAB from Belgian sourdoughs belonged to the genera Lactobacillus, Pediococcus, Leuconostoc, Weissella, and Enterococcus, with the heterofermentative species Lactobacillus paralimentarius, Lactobacillus sanfranciscensis, Lactobacillus plantarum, and Lactobacillus pontis as the most frequently isolated taxa. Statistical analysis of the identification data indicated that the microbial composition of the sourdoughs is mainly affected by the bakery environment rather than the flour type (wheat, rye, spelt, or a mixture of these) used. In conclusion, the polyphasic approach, based on rapid genotypic screening and high-resolution, sequence-dependent identification, proved to be a powerful tool for studying the LAB diversity in traditional fermented foods such as sourdough.
Genomic signal analysis of pathogen variability
NASA Astrophysics Data System (ADS)
Cristea, Paul Dan
2006-02-01
The paper presents results from the study of pathogen variability using genomic signals. Converting symbolic nucleotide sequences into digital signals makes it possible to apply signal processing methods to the analysis of genomic data. The method is particularly well suited to characterizing small genomic sequences, such as those found in viruses and bacteria, and is a promising tool for tracking the variability of pathogens, especially in the context of developing drug resistance. The paper is based on data downloaded from GenBank [32] and comprises results on the variability of the eight segments of the influenza type A, subtype H5N1, virus genome and of the Hemagglutinin (HA) gene for the H1, H2, H3, H4, H5 and H16 types. Data from human and avian virus isolates are used.
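The conversion of a symbolic nucleotide sequence into a digital signal can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not state which mapping is used, so the complex values assigned to A, C, G and T below, and the function names, are assumptions made for the example.

```python
import cmath

# Assumed complex mapping of the four nucleotides (one point per quadrant).
# The abstract does not give the mapping, so these values are illustrative.
NUCLEOTIDE_TO_COMPLEX = {
    "A": complex(1, 1),
    "C": complex(-1, -1),
    "G": complex(-1, 1),
    "T": complex(1, -1),
}


def to_genomic_signal(sequence):
    """Map each nucleotide symbol to a complex sample."""
    return [NUCLEOTIDE_TO_COMPLEX[base] for base in sequence.upper()]


def cumulated_phase(signal):
    """Running sum of the phase (in radians) of each complex sample."""
    total = 0.0
    running = []
    for sample in signal:
        total += cmath.phase(sample)
        running.append(total)
    return running


if __name__ == "__main__":
    signal = to_genomic_signal("ACGT")
    print(cumulated_phase(signal))
```

Once the sequence is a list of complex samples, standard signal processing steps (phase analysis, filtering, spectral estimation) can be applied to it directly, which is the point the abstract makes.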
Mirhendi, H; Ghiasian, A; Vismer, Hf; Asgary, Mr; Jalalizand, N; Arendrup, Mc; Makimura, K
2010-01-01
Fusarium species are capable of causing a wide range of crop plant infections as well as uncommon human infections. Many species of the genus produce mycotoxins, which are responsible for acute or chronic diseases in animals and humans. Identification of Fusaria to the species level is necessary for biological, epidemiological, pathological, and toxicological purposes. In this study, we undertook a computer-based analysis of ITS1-5.8SrDNA-ITS2 in 192 GenBank sequences from 36 Fusarium species to obtain data for establishing a molecular method for species-specific identification. Sequence data and 610 restriction enzymes were analyzed to choose RFLP profiles, and we subsequently designed and validated a PCR-restriction enzyme system for identification and typing of species. DNA extracted from 32 reference strains of 16 species was amplified using the ITS1 and ITS4 universal primers, followed by sequencing and restriction enzyme digestion of the PCR products. The three restriction enzymes TasI, ItaI and CfoI provided the best discriminatory power. Using the ITS1 and ITS4 primers, a product of approximately 550 bp was observed for all Fusarium strains, as expected from the sequence analyses. After RFLP of the PCR products, some species were definitively identified by the method, and some strains of the same species showed different patterns. Our profile therefore has potential not only for identification of species but also for genotyping of strains. On the other hand, some Fusarium species were 100% identical in their ITS-5.8SrDNA-ITS2 sequences, so these species cannot be differentiated using this target alone. The ITS-PCR-RFLP method might be useful for preliminary differentiation and typing of the most common Fusarium species.
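The computer-based RFLP step described above can be illustrated with a small in silico digest. This is only a sketch under stated assumptions: the function name is hypothetical, the toy sequence is invented, and the recognition site `GCGC` with a cut after the third base is used as an assumed example of a CfoI-like enzyme. Only exact site matches are handled, so enzymes whose sites contain ambiguous bases would need extra logic.

```python
def digest(sequence, site, cut_offset):
    """Return the fragment lengths (longest first) obtained by cutting a
    linear sequence at every exact occurrence of `site`.

    `cut_offset` is the number of bases from the start of the site to the cut.
    """
    sequence = sequence.upper()
    site = site.upper()

    # Collect every cut position, allowing overlapping site occurrences.
    cut_positions = []
    start = 0
    while True:
        hit = sequence.find(site, start)
        if hit == -1:
            break
        cut_positions.append(hit + cut_offset)
        start = hit + 1

    # Turn cut positions into fragment lengths, skipping empty fragments.
    fragments = []
    previous = 0
    for position in cut_positions + [len(sequence)]:
        if position > previous:
            fragments.append(position - previous)
            previous = position
    return sorted(fragments, reverse=True)


if __name__ == "__main__":
    toy_amplicon = "AAGCGCTTTTGCGCAA"  # invented sequence, not real ITS data
    print(digest(toy_amplicon, "GCGC", 3))
```

Two amplicons that give the same sorted fragment list for every enzyme in the panel are indistinguishable by this approach, which mirrors the abstract's remark that species with identical ITS sequences cannot be separated using this target alone.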
Automated use of mutagenesis data in structure prediction.
Nanda, Vikas; DeGrado, William F
2005-05-15
In the absence of experimental structural determination, numerous methods are available to indirectly predict or probe the structure of a target molecule. Genetic modification of a protein sequence is a powerful tool for identifying key residues involved in binding reactions or protein stability. Mutagenesis data is usually incorporated into the modeling process either through manual inspection of model compatibility with empirical data, or through the generation of geometric constraints linking sensitive residues to a binding interface. We present an approach derived from statistical studies of lattice models for introducing mutation information directly into the fitness score. The approach takes into account the phenotype of mutation (neutral or disruptive) and calculates the energy for a given structure over an ensemble of sequences. The structure prediction procedure searches for the optimal conformation where neutral sequences either have no impact or improve stability and disruptive sequences reduce stability relative to wild type. We examine three types of sequence ensembles: information from saturation mutagenesis, scanning mutagenesis, and homologous proteins. Incorporating multiple sequences into a statistical ensemble serves to energetically separate the native state and misfolded structures. As a result, the prediction of structure with a poor force field is sufficiently enhanced by mutational information to improve accuracy. Furthermore, by separating misfolded conformations from the target score, the ensemble energy serves to speed up conformational search algorithms such as Monte Carlo-based methods. Copyright 2005 Wiley-Liss, Inc.
Costa-Alcalde, José Javier; Barbeito-Castiñeiras, Gema; González-Alba, José María; Aguilera, Antonio; Galán, Juan Carlos; Pérez-Del-Molino, María Luisa
2018-06-02
The American Thoracic Society and the Infectious Diseases Society of America recommend that clinically significant non-tuberculous mycobacteria (NTM) should be identified to the species level in order to determine their clinical significance. The aim of this study was to evaluate identification of rapidly growing NTM (RGM) isolated from clinical samples by using MALDI-TOF MS and a commercial molecular system. The results were compared with identification using a reference method. We included 46 clinical isolates of RGM and identified them using the commercial molecular system GenoType® CM/AS (Hain Lifescience, Germany), MALDI-TOF MS (Bruker) and, as reference method, partial rpoβ gene sequencing followed by BLAST and phylogenetic analysis with the 1093 sequences available in GenBank. The degree of agreement between GenoType® and MALDI-TOF MS and the reference method, partial rpoβ sequencing, was 27/43 (62.8%) and 38/43 cases (88.3%) respectively. For all the samples correctly classified by GenoType®, we obtained the same result with MALDI-TOF MS (27/27). However, MALDI-TOF MS also correctly identified 68.75% (11/16) of the samples that GenoType® had misclassified (p=0.005). MALDI-TOF MS classified significantly better than GenoType®. When a MALDI-TOF MS score >1.85 was achieved, MALDI-TOF MS and partial rpoβ gene sequencing were equivalent. GenoType® was not able to distinguish between species belonging to the M. fortuitum complex. MALDI-TOF MS methodology is simple, rapid and associated with lower consumable costs than GenoType®. The partial rpoβ sequencing methods with BLAST and phylogenetic analysis were not able to identify some RGM unequivocally. Therefore, sequencing of additional regions would be indicated in these cases. Copyright © 2018 Elsevier España, S.L.U. and Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
Stojowska, Karolina; Krawczyk, Beata
2014-01-01
We have designed a new ddLMS PCR (double digestion Ligation Mediated Suppression PCR) method based on restriction site polymorphism upstream from the specific target sequence for the simultaneous identification and differentiation of bacterial strains. The ddLMS PCR combines a simple PCR used for species or genus identification and the LM PCR strategy for strain differentiation. The bacterial identification is confirmed in the form of the PCR product(s), while the length of the PCR product makes it possible to differentiate between bacterial strains. If there is a single copy of the target sequence within genomic DNA, one specific PCR product is created (simplex ddLMS PCR), whereas for multiple copies of the gene fingerprinting patterns can be obtained (multiplex ddLMS PCR). The described ddLMS PCR method is designed for rapid and specific strain differentiation in medical and microbiological studies. In comparison to other LM PCR methods it has substantial advantages: it enables species-specific DNA typing without the need for pure bacterial culture selection, is not sensitive to contamination with other cells or genomic DNA, and gives unambiguous "band-based" results, which are easy to interpret. The utility of ddLMS PCR was shown for the Acinetobacter calcoaceticus-baumannii (Acb) complex, a group of genetically closely related and phenotypically similar species that are also important nosocomial pathogens, for which there are currently no recommended methods for screening, typing and identification. In this article two models are proposed: 3' recA-ddLMS PCR-MaeII/RsaI for Acb complex interspecific typing and 5' rrn-ddLMS PCR-HindIII/ApaI for Acinetobacter baumannii intraspecific typing. ddLMS PCR allows not only for DNA typing but also for confirmation of species in one reaction. Practical guidelines for designing a diagnostic test based on ddLMS PCR for genotyping different species of bacteria are also provided.
DNA extraction for streamlined metagenomics of diverse environmental samples.
Marotz, Clarisse; Amir, Amnon; Humphrey, Greg; Gaffney, James; Gogul, Grant; Knight, Rob
2017-06-01
A major bottleneck for metagenomic sequencing is rapid and efficient DNA extraction. Here, we compare the extraction efficiencies of three magnetic bead-based platforms (KingFisher, epMotion, and Tecan) to a standardized column-based extraction platform across a variety of sample types, including feces, oral, skin, soil, and water. Replicate sample plates were extracted and prepared for 16S rRNA gene amplicon sequencing in parallel to assess extraction bias and DNA quality. The data demonstrate that any effect of extraction method on sequencing results was small compared with the variability across samples; however, the KingFisher platform produced the largest number of high-quality reads in the shortest amount of time. Based on these results, we have identified an extraction pipeline that dramatically reduces sample processing time without sacrificing bacterial taxonomic or abundance information.
Kwon, Hyuck Hoon; Suh, Dae Hun
2016-11-01
Recent studies have steadily reported the existence of diverse strains of Propionibacterium acnes, and these studies have helped to elucidate its contradictory roles as a normal commensal and as a pathogen. In this review, the authors aimed to provide an update on the current understanding of P. acnes strain diversity and acne, analyzing the potential implications for clinical applications. Before the era of genomic research, P. acnes strains were distinguished by serological agglutination tests, cell wall sugar analysis, or fermentation traits. Since the complete genome sequence of P. acnes was first deciphered, genetic studies based on sequence data have expanded with the introduction of more refined and precise DNA-based typing methods, including multilocus sequence typing and metagenomics. These sophisticated techniques have revealed that P. acnes consists of phylogenetically distinct cluster groups with various pathogenic traits, including elicitation of inflammation, protein secretome profile, and unique distribution patterns in various skin loci. Subsequent large-scale studies of patients' acne samples have revealed that specific sequence types are included within the phylogenetic divisions and have further suggested that particular P. acnes strains play an etiologic role in acne while others are associated with health, providing a firm platform for evidence-based research into the exact role of this organism in acne. We strongly believe that future research will provide fruitful results, not only in clarifying the apparent controversy over the roles of P. acnes but also in developing therapeutic drugs that target only the pathogenic strains. © 2016 The International Society of Dermatology.
Molecular dynamics study of some non-hydrogen-bonding base pair DNA strands
NASA Astrophysics Data System (ADS)
Tiwari, Rakesh K.; Ojha, Rajendra P.; Tiwari, Gargi; Pandey, Vishnudatt; Mall, Vijaysree
2018-05-01
To elucidate the structural activity of hydrophobically modified DNA, the DMMO2-D5SICS base pair was introduced as a constituent in different sets of 12-mer and 14-mer DNA sequences for molecular dynamics (MD) simulation in explicit water solvent. The AMBER 14 force field was employed for each duplex during the 200 ns production simulation in an orthogonal water box, using the Particle-Mesh-Ewald (PME) method under infinite periodic boundary conditions (PBC), to determine the conformational parameters of the complex. The force-field parameters of the modified base pair were calculated with the Gaussian code using Hartree-Fock ab initio methodology. The RMSD results reveal that the conformation of the duplex is sequence dependent and that the binding energy of the complex depends on the position of the modified base pair in the nucleic acid strand. We found that, in comparison to electrostatic energy, non-bonding energy made a significant contribution to stabilising this type of duplex. The distortion produced within the strands by this type of base pair was local and destabilised the duplex near the substitution site. Moreover, the binding energy of the duplex depends on the position of the hydrophobic base-pair substitution and on the DNA sequence, which strongly supports the corresponding experimental study.
Chen, Y. C.; Eisner, J. D.; Kattar, M. M.; Rassoulian-Barrett, S. L.; LaFe, K.; Yarfitz, S. L.; Limaye, A. P.; Cookson, B. T.
2000-01-01
Identification of medically relevant yeasts can be time-consuming and inaccurate with current methods. We evaluated PCR-based detection of sequence polymorphisms in the internal transcribed spacer 2 (ITS2) region of the rRNA genes as a means of fungal identification. Clinical isolates (401), reference strains (6), and type strains (27), representing 34 species of yeasts were examined. The length of PCR-amplified ITS2 region DNA was determined with single-base precision in less than 30 min by using automated capillary electrophoresis. Unique, species-specific PCR products ranging from 237 to 429 bp were obtained from 92% of the clinical isolates. The remaining 8%, divided into groups with ITS2 regions which differed by ≤2 bp in mean length, all contained species-specific DNA sequences easily distinguishable by restriction enzyme analysis. These data, and the specificity of length polymorphisms for identifying yeasts, were confirmed by DNA sequence analysis of the ITS2 region from 93 isolates. Phenotypic and ITS2-based identification was concordant for 427 of 434 yeast isolates examined using sequence identity of ≥99%. Seven clinical isolates contained ITS2 sequences that did not agree with their phenotypic identification, and ITS2-based phylogenetic analyses indicate the possibility of new or clinically unusual species in the Rhodotorula and Candida genera. This work establishes an initial database, validated with over 400 clinical isolates, of ITS2 length and sequence polymorphisms for 34 species of yeasts. We conclude that size and restriction analysis of PCR-amplified ITS2 region DNA is a rapid and reliable method to identify clinically significant yeasts, including potentially new or emerging pathogenic species. PMID:10834993
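The length-based identification logic described above can be sketched as a lookup with a tolerance. This is a hypothetical illustration only: the species names and amplicon lengths are placeholders rather than values from the study's database, and the 2 bp tolerance is an assumption borrowed from the abstract's remark that some groups differed by ≤2 bp in mean length.

```python
# Placeholder ITS2 amplicon lengths in bp; these are NOT values from the study.
ITS2_LENGTHS_BP = {
    "species_A": 237.0,
    "species_B": 310.0,
    "species_C": 311.5,
    "species_D": 429.0,
}


def candidates_by_length(observed_bp, tolerance_bp=2.0):
    """Return all species whose reference length is within the tolerance."""
    return sorted(
        name
        for name, length in ITS2_LENGTHS_BP.items()
        if abs(length - observed_bp) <= tolerance_bp
    )


def identify(observed_bp):
    """Name the species if the length is unique, otherwise flag the result."""
    hits = candidates_by_length(observed_bp)
    if not hits:
        return "no match"
    if len(hits) == 1:
        return hits[0]
    # Lengths too close to separate: a second method is needed.
    return "ambiguous: " + ", ".join(hits) + " (resolve by restriction analysis)"


if __name__ == "__main__":
    for size in (237.4, 310.8, 500.0):
        print(size, "->", identify(size))
```

The ambiguous branch corresponds to the 8% of isolates in the abstract whose ITS2 lengths were too similar to separate and which were resolved by restriction enzyme analysis instead.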
The Cervical Microbiome over 7 Years and a Comparison of Methodologies for Its Characterization
Smith, Benjamin C.; McAndrew, Thomas; Chen, Zigui; Harari, Ariana; Barris, David M.; Viswanathan, Shankar; Rodriguez, Ana Cecilia; Castle, Phillip; Herrero, Rolando; Schiffman, Mark; Burk, Robert D.
2012-01-01
Background The rapidly expanding field of microbiome studies offers investigators a large choice of methods for each step in the process of determining the microorganisms in a sample. The human cervicovaginal microbiome affects female reproductive health, susceptibility to and natural history of many sexually transmitted infections, including human papillomavirus (HPV). At present, long-term behavior of the cervical microbiome in early sexual life is poorly understood. Methods The V6 and V6–V9 regions of the 16S ribosomal RNA gene were amplified from DNA isolated from exfoliated cervical cells. Specimens from 10 women participating in the Natural History Study of HPV in Guanacaste, Costa Rica were sampled successively over a period of 5–7 years. We sequenced amplicons using 3 different platforms (Sanger, Roche 454, and Illumina HiSeq 2000) and analyzed sequences using pipelines based on 3 different classification algorithms (usearch, RDP Classifier, and pplacer). Results Usearch and pplacer provided consistent microbiome classifications for all sequencing methods, whereas RDP Classifier deviated significantly when characterizing Illumina reads. Comparing across sequencing platforms indicated 7%–41% of the reads were reclassified, while comparing across software pipelines reclassified up to 32% of the reads. Variability in classification was shown not to be due to a difference in read lengths. Six cervical microbiome community types were observed and are characterized by a predominance of either G. vaginalis or Lactobacillus spp. Over the 5–7 year period, subjects displayed fluctuation between community types. A PERMANOVA analysis on pairwise Kantorovich-Rubinstein distances between the microbiota of all samples yielded an F-test ratio of 2.86 (p<0.01), indicating a significant difference comparing within and between subjects’ microbiota. 
Conclusions Amplification and sequencing methods affected the characterization of the microbiome more than classification algorithms. Pplacer and usearch performed consistently with all sequencing methods. The analyses identified 6 community types consistent with those previously reported. The long-term behavior of the cervical microbiome indicated that fluctuations were subject dependent. PMID:22792313
Pourcel, Christine; Minandri, Fabrizia; Hauck, Yolande; D'Arezzo, Silvia; Imperi, Francesco; Vergnaud, Gilles; Visca, Paolo
2011-01-01
Acinetobacter baumannii is an important opportunistic pathogen responsible for nosocomial outbreaks, mostly occurring in intensive care units. Due to the multiplicity of infection sources, reliable molecular fingerprinting techniques are needed to establish epidemiological correlations among A. baumannii isolates. Multiple-locus variable-number tandem-repeat analysis (MLVA) has proven to be a fast, reliable, and cost-effective typing method for several bacterial species. In this study, an MLVA assay compatible with simple PCR- and agarose gel-based electrophoresis steps as well as with high-throughput automated methods was developed for A. baumannii typing. Preliminarily, 10 potential polymorphic variable-number tandem repeats (VNTRs) were identified upon bioinformatic screening of six annotated genome sequences of A. baumannii. A collection of 7 reference strains plus 18 well-characterized isolates, including unique types and representatives of the three international A. baumannii lineages, was then evaluated in a two-center study aimed at validating the MLVA assay and comparing it with other genotyping assays, namely, macrorestriction analysis with pulsed-field gel electrophoresis (PFGE) and PCR-based sequence group (SG) profiling. The results showed that MLVA can discriminate between isolates with identical PFGE types and SG profiles. A panel of eight VNTR markers was selected, all of which could be amplified and showed substantial polymorphism in the majority of strains. Independently generated MLVA profiles, composed of an ordered string of allele numbers corresponding to the number of repeats at each VNTR locus, were concordant between centers. Typeability, reproducibility, stability, discriminatory power, and epidemiological concordance were excellent. A database containing information and MLVA profiles for several A. baumannii strains is available from http://mlva.u-psud.fr/. PMID:21147956
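The MLVA profile described above, an ordered string of allele numbers giving the number of repeats at each VNTR locus, can be sketched as follows. The locus names, flanking sizes and repeat unit lengths are hypothetical placeholders, not the eight markers selected in the study, and deriving the repeat count from the amplicon size is an assumed, simplified calculation.

```python
# Hypothetical locus definitions: (name, flanking region in bp, repeat unit in bp).
# These are placeholders, not the markers used in the study.
LOCI = [
    ("VNTR_a", 100, 9),
    ("VNTR_b", 150, 30),
    ("VNTR_c", 80, 6),
]


def repeat_count(amplicon_bp, flank_bp, repeat_unit_bp):
    """Estimate the number of tandem repeats from the amplicon size."""
    return round((amplicon_bp - flank_bp) / repeat_unit_bp)


def mlva_profile(amplicon_sizes, loci=LOCI):
    """Build the ordered allele string, one allele number per locus."""
    alleles = [
        repeat_count(amplicon_sizes[name], flank_bp, unit_bp)
        for name, flank_bp, unit_bp in loci
    ]
    return "-".join(str(allele) for allele in alleles)


if __name__ == "__main__":
    measured = {"VNTR_a": 163, "VNTR_b": 270, "VNTR_c": 140}  # invented sizes
    print(mlva_profile(measured))
```

Because the loci are always reported in the same order, two laboratories can compare isolates by comparing these strings directly, which is what allows the concordance check between centers mentioned in the abstract.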
Ogihara, Shinji; Saito, Ryoichi; Sawabe, Etsuko; Kozakai, Takahiro; Shima, Mari; Aiso, Yoshibumi; Fujie, Toshihide; Nukui, Yoko; Koike, Ryuji; Hagihara, Michio; Tohda, Shuji
2018-04-01
The recently developed PCR-based open reading frame typing (POT) method is a useful molecular typing tool. Here, we evaluated the performance of POT for molecular typing of methicillin-resistant Staphylococcus aureus (MRSA) isolates and compared its performance to those of multilocus sequence typing (MLST) and Staphylococcus protein A gene typing (spa typing). Thirty-seven MRSA isolates were collected between July 2012 and May 2015. MLST, spa typing, and POT were performed, and their discriminatory powers were evaluated using Simpson's index analysis. The MRSA isolates were classified into 11, 18, and 33 types by MLST, spa typing, and POT, respectively. The predominant strains identified by MLST, spa typing, and POT were ST8 and ST764, t002, and 93-191-127, respectively. The discriminatory power of MLST, spa typing, and POT was 0.853, 0.875, and 0.992, respectively, indicating that POT had the highest discriminatory power. Moreover, the results of MLST and spa were available after 2 days, whereas that of POT was available in 5 h. Furthermore, POT is rapid and easy to perform and interpret. Therefore, POT is a superior molecular typing tool for monitoring nosocomial transmission of MRSA. Copyright © 2017 Japanese Society of Chemotherapy and The Japanese Association for Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
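The discriminatory power comparison above can be illustrated with a short calculation. The sketch assumes the Hunter-Gaston form of Simpson's index of diversity, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), which is commonly used for typing methods; the abstract does not state the exact formula. The type labels and their distribution below are invented for the example, since the abstract reports only the number of types and the resulting index.

```python
from collections import Counter


def simpsons_index_of_diversity(type_labels):
    """Probability that two isolates drawn without replacement differ in type.

    Assumes the Hunter-Gaston form: D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)).
    """
    total = len(type_labels)
    if total < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(type_labels)
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1 - same_type_pairs / (total * (total - 1))


if __name__ == "__main__":
    # Invented distribution: 37 isolates in 33 types
    # (30 singletons, two pairs, one triple). Not the study's actual data.
    labels = [f"single_{i}" for i in range(30)]
    labels += ["pair_1"] * 2 + ["pair_2"] * 2 + ["triple_1"] * 3
    print(round(simpsons_index_of_diversity(labels), 3))
```

The index rises as isolates are spread over more, smaller types, which is why a method that splits 37 isolates into 33 types scores higher than one that splits them into 11.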
Iraola, G; Betancor, L; Calleros, L; Gadea, P; Algorta, G; Galeano, S; Muxi, P; Greif, G; Pérez, R
2015-08-01
Whole-genome characterisation in clinical microbiology makes it possible to detect trends in infection dynamics and disease transmission. Here, we report a case of bacteraemia due to Campylobacter fetus subsp. fetus in a rural worker under cancer treatment who was diagnosed with cellulitis; the patient was treated with antibiotics and recovered. The routine typing methods were not able to identify the microorganism causing the infection, so it was further analysed by molecular methods and whole-genome sequencing. Multi-locus sequence typing (MLST) revealed the presence of the bovine-associated ST-4 genotype. Whole-genome comparisons with other C. fetus strains revealed an inconsistent phylogenetic position based on the core genome, discordant with previous ST-4 strains. To the best of our knowledge, this is the first C. fetus subsp. fetus carrying ST-4 isolated from humans, and it represents a probable case of zoonotic transmission from cattle.
Kim, Eun Hye; Lee, Hwan Young; Yang, In Seok; Jung, Sang-Eun; Yang, Woo Ick; Shin, Kyoung-Jin
2016-05-01
The next-generation sequencing (NGS) method has been utilized to analyze short tandem repeat (STR) markers, which are routinely used for human identification purposes in the forensic field. Some researchers have demonstrated the successful application of the NGS system to STR typing, suggesting that NGS technology may be an alternative or additional method to overcome limitations of capillary electrophoresis (CE)-based STR profiling. However, there has been no available multiplex PCR system that is optimized for NGS analysis of forensic STR markers. Thus, we constructed a multiplex PCR system for the NGS analysis of 18 markers (13 CODIS STRs, D2S1338, D19S433, Penta D, Penta E and amelogenin) by designing amplicons in the size range of 77-210 base pairs. Then, PCR products were generated from two single-source samples, mixed samples and artificially degraded DNA samples using the multiplex PCR system, and were prepared for sequencing on the MiSeq system through construction of a subsequent barcoded library. By performing NGS and analyzing the data, we confirmed that the resultant STR genotypes were consistent with those of CE-based typing. Moreover, sequence variations were detected in targeted STR regions. Through the use of small-sized amplicons, the developed multiplex PCR system enables researchers to obtain successful STR profiles even from artificially degraded DNA, including STR loci that are analyzed with large-sized amplicons in the CE-based commercial kits. In addition, successful profiles can be obtained from mixtures up to a 1:19 ratio. Consequently, the developed multiplex PCR system, which produces small-sized amplicons, can be successfully applied to STR NGS analysis of forensic casework samples such as mixtures and degraded DNA samples. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Laboratory Diagnosis and Susceptibility Testing for Mycobacterium tuberculosis.
Procop, Gary W
2016-12-01
The laboratory, which utilizes some of the most sophisticated and rapidly changing technologies, plays a critical role in the diagnosis of tuberculosis. Some of these tools are being employed in resource-challenged countries for the rapid detection and characterization of Mycobacterium tuberculosis. Foremost, the laboratory defines appropriate specimen criteria for optimal test performance. The direct detection of mycobacteria in the clinical specimen, predominantly done by acid-fast staining, may eventually be replaced by rapid-cycle PCR. The widespread use of the Xpert MTB/RIF (Cepheid) assay, which detects both M. tuberculosis and key genetic determinants of rifampin resistance, is important for the early detection of multidrug-resistant strains. Culture, using both broth and solid media, remains the standard for establishing the laboratory-based diagnosis of tuberculosis. Cultured isolates are identified far less commonly by traditional biochemical profiling and more commonly by molecular methods, such as DNA probes and broad-range PCR with DNA sequencing. Non-nucleic acid-based methods of identification, such as high-performance liquid chromatography and, more recently, matrix-assisted laser desorption/ionization-time of flight mass spectrometry, may also be used for identification. Cultured isolates of M. tuberculosis should be submitted for susceptibility testing according to standard guidelines. The use of broth-based susceptibility testing is recommended to significantly decrease the time to result. Cultured isolates may also be submitted for strain typing for epidemiologic purposes. The use of massive parallel sequencing, also known as next-generation sequencing, promises to continue this molecular revolution in mycobacteriology, as whole-genome sequencing provides identification, susceptibility, and typing information simultaneously.
Siah, Ahmed; Morrison, Diane B.; Fringuelli, Elena; Savage, Paul; Richmond, Zina; Johns, Robert; Purcell, Maureen K.; Johnson, Stewart C.; Saksida, Sonja M.
2015-01-01
Piscine reovirus (PRV) is a double stranded non-enveloped RNA virus detected in farmed and wild salmonids. This study examined the phylogenetic relationships among different PRV sequence types present in samples from salmonids in Western Canada and the US, including Alaska (US), British Columbia (Canada) and Washington State (US). Tissues testing positive for PRV were partially sequenced for segment S1, producing 71 sequences that grouped into 10 unique sequence types. Sequence analysis revealed no identifiable geographical or temporal variation among the sequence types. Identical sequence types were found in fish sampled in 2001, 2005 and 2014. In addition, PRV positive samples from fish derived from Alaska, British Columbia and Washington State share identical sequence types. Comparative analysis of the phylogenetic tree indicated that Canada/US Pacific Northwest sequences formed a subgroup with some Norwegian sequence types (group II), distinct from other Norwegian and Chilean sequences (groups I, III and IV). Representative PRV positive samples from farmed and wild fish in British Columbia and Washington State were subjected to genome sequencing using next generation sequencing methods. Individual analysis of each of the 10 partial segments indicated that the Canadian and US PRV sequence types clustered separately from available whole genome sequences of some Norwegian and Chilean sequences for all segments except the segment S4. In summary, PRV was genetically homogenous over a large geographic distance (Alaska to Washington State), and the sequence types were relatively stable over a 13 year period. PMID:26536673
Lee, Chao-Hung; Helweg-Larsen, Jannik; Tang, Xing; Jin, Shaoling; Li, Baozheng; Bartlett, Marilyn S.; Lu, Jang-Jih; Lundgren, Bettina; Lundgren, Jens D.; Olsson, Mats; Lucas, Sebastian B.; Roux, Patricia; Cargnel, Antonietta; Atzori, Chiara; Matos, Olga; Smith, James W.
1998-01-01
Pneumocystis carinii f. sp. hominis isolates from 207 clinical specimens from nine countries were typed based on nucleotide sequence variations in the internal transcribed spacer regions I and II (ITS1 and ITS2, respectively) of rRNA genes. The number of ITS1 nucleotides has been revised from the previously reported 157 bp to 161 bp. Likewise, the number of ITS2 nucleotides has been changed from 177 to 192 bp. The number of ITS1 sequence types has increased from 2 to 15, and that of ITS2 has increased from 3 to 14. The 15 ITS1 sequence types are designated types A through O, and the 14 ITS2 types are named types a through n. A total of 59 types of P. carinii f. sp. hominis were found in this study. PMID:9508304
Investigating the long-term course of schizophrenia by sequence analysis.
An der Heiden, Wolfram; Häfner, Heinz
2015-08-30
In the present study we set out to explore the long-term clinical course of schizophrenia in a holistic manner by adopting sequence analysis. Our aim was to identify course types of illness by means of cluster analysis. The study was based on course and outcome data for 107 patients followed up over 134 months after first admission in the ABC Schizophrenia Study. Focusing on the main syndromes (positive, negative, depressive and unspecific symptoms) and their combinations we looked for similarities in individual illness courses using the 'optimal matching' method. A cluster analysis performed on the resulting similarity matrix yielded two main groups (an 'improving' and a 'chronic' group), which comprised a total of six different types of illness course. The course types differed in both quantitative (frequency of syndromes and syndrome combinations) and qualitative terms (clinical presentation, sequence of syndromes). Cluster membership was only rarely, but clearly associated with sociodemographic characteristics, treatment data and other illness variables. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
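The 'optimal matching' comparison of illness courses can be sketched as an edit distance between state sequences. This is a minimal illustration, not the study's implementation: the one-letter state codes (P, N, D, U for positive, negative, depressive and unspecific syndromes), the substitution cost of 2 and the insertion/deletion cost of 1 are all assumptions chosen for the example.

```python
def optimal_matching_distance(seq_a, seq_b, substitution_cost=2.0, indel_cost=1.0):
    """Minimum total cost of turning seq_a into seq_b using substitutions,
    insertions and deletions (dynamic programming over a cost table)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    cost = [[0.0] * cols for _ in range(rows)]

    # Turning a prefix into the empty sequence needs only deletions/insertions.
    for i in range(1, rows):
        cost[i][0] = i * indel_cost
    for j in range(1, cols):
        cost[0][j] = j * indel_cost

    for i in range(1, rows):
        for j in range(1, cols):
            same_state = seq_a[i - 1] == seq_b[j - 1]
            substitute = cost[i - 1][j - 1] + (0.0 if same_state else substitution_cost)
            delete = cost[i - 1][j] + indel_cost
            insert = cost[i][j - 1] + indel_cost
            cost[i][j] = min(substitute, delete, insert)
    return cost[-1][-1]


if __name__ == "__main__":
    # Invented courses, one state code per observation period.
    courses = {"patient_1": "PPNN", "patient_2": "PPND", "patient_3": "DDUU"}
    names = sorted(courses)
    for a in names:
        print(a, [optimal_matching_distance(courses[a], courses[b]) for b in names])
```

The pairwise distances form the matrix on which a cluster analysis can then be run to group similar courses, as the abstract describes.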
Kaya, Hülya; Hasman, Henrik; Larsen, Jesper; Stegger, Marc; Johannesen, Thor Bech; Allesøe, Rosa Lundbye; Lemvigh, Camilla Koldbæk; Aarestrup, Frank Møller; Lund, Ole; Larsen, Anders Rhod
2018-01-01
Typing of methicillin-resistant Staphylococcus aureus (MRSA) is important in infection control and surveillance. The current nomenclature of MRSA includes the genetic background of the S. aureus strain determined by multilocus sequence typing (MLST) or equivalent methods like spa typing and typing of the mobile genetic element staphylococcal cassette chromosome mec (SCCmec), which carries the mecA or mecC gene. Whereas MLST and spa typing are relatively simple, typing of SCCmec is less trivial because of its heterogeneity. Whole-genome sequencing (WGS) provides the essential data for typing of the genetic background and SCCmec, but so far, no bioinformatic tools for SCCmec typing have been available. Here, we report the development and evaluation of SCCmecFinder for characterization of the SCCmec element from S. aureus WGS data. SCCmecFinder is able to identify all SCCmec element types, designated I to XIII, with subtyping of SCCmec types IV (2B) and V (5C2). SCCmec elements are characterized by two different gene prediction approaches to achieve correct annotation, a Basic Local Alignment Search Tool (BLAST)-based approach and a k-mer-based approach. Evaluation of SCCmecFinder by using a diverse collection of clinical isolates (n = 93) showed a high typeability level of 96.7%, which increased to 98.9% upon modification of the default settings. In conclusion, SCCmecFinder can be an alternative to more laborious SCCmec typing methods and is freely available at https://cge.cbs.dtu.dk/services/SCCmecFinder. IMPORTANCE SCCmec in MRSA is acknowledged to be of importance not only because it contains the mecA or mecC gene but also for staphylococcal adaptation to different environments, e.g., in hospitals, the community, and livestock. Typing of SCCmec by PCR techniques has, because of its heterogeneity, been challenging, and whole-genome sequencing has only partially solved this since no good bioinformatic tools have been available.
In this article, we describe the development of a new bioinformatic tool, SCCmecFinder, that includes most of the needs for infection control professionals and researchers regarding the interpretation of SCCmec elements. The software detects all of the SCCmec elements accepted by the International Working Group on the Classification of Staphylococcal Cassette Chromosome Elements, and users will be prompted if diverging and potential new elements are uploaded. Furthermore, SCCmecFinder will be curated and updated as new elements are found and it is easy to use and freely accessible.
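To make the k-mer-based idea in this abstract more concrete, here is a minimal sketch that scores each reference gene by the fraction of its k-mers found in a query contig. It is not the tool's actual implementation: the function names, the toy sequences, the gene labels and the choice of k = 4 are all assumptions for illustration.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_coverage(reference, query, k=4):
    """Fraction of the reference's k-mers that also occur in the query."""
    ref = kmers(reference, k)
    if not ref:
        return 0.0
    return len(ref & kmers(query, k)) / len(ref)


# Toy reference genes and a toy assembled contig (hypothetical data)
references = {
    "gene_A": "ATGGCTAGCTAG",
    "gene_B": "ATGTTTCCCGGG",
}
contig = "CCATGGCTAGCTAGTT"

scores = {name: kmer_coverage(seq, contig) for name, seq in references.items()}
best = max(scores, key=scores.get)
```

A real typing tool would use a much larger k, handle both strands and apply coverage and identity thresholds; the sketch only shows the set-overlap principle.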
Bromilow, Sophie; Gethings, Lee A; Buckley, Mike; Bromley, Mike; Shewry, Peter R; Langridge, James I; Clare Mills, E N
2017-06-23
The unique physiochemical properties of wheat gluten enable a diverse range of food products to be manufactured. However, gluten triggers coeliac disease, a condition which is treated using a gluten-free diet. Analytical methods are required to confirm if foods are gluten-free, but current immunoassay-based methods can be unreliable. Proteomic methods offer an alternative but require comprehensive and well-annotated sequence databases, which are lacking for gluten. A manually curated database (GluPro V1.0) of gluten proteins, comprising 630 discrete unique full length protein sequences, has been compiled. It is representative of the different types of gliadin and glutenin components found in gluten. An in silico comparison of their coeliac toxicity was undertaken by analysing the distribution of coeliac toxic motifs. This demonstrated that whilst the α-gliadin proteins contained more toxic motifs, these were distributed across all gluten protein sub-types. Comparison of annotations observed using a discovery proteomics dataset acquired using ion mobility MS/MS showed that more reliable identifications were obtained using the GluPro V1.0 database compared to the complete reviewed Viridiplantae database. This highlights the value of a curated sequence database specifically designed to support the proteomic workflows and the development of methods to detect and quantify gluten. We have constructed the first manually curated open-source wheat gluten protein sequence database (GluPro V1.0) in a FASTA format to support the application of proteomic methods for gluten protein detection and quantification. We have also analysed the manually verified sequences to give the first comprehensive overview of the distribution of sequences able to elicit a reaction in coeliac disease, the prevalent form of gluten intolerance.
Provision of this database will improve the reliability of gluten protein identification by proteomic analysis, and aid the development of targeted mass spectrometry methods in line with Codex Alimentarius Commission requirements for foods designed to meet the needs of gluten intolerant individuals. Copyright © 2017. Published by Elsevier B.V.
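The in silico motif analysis described in this abstract can be pictured with a small sketch that reads FASTA-formatted protein sequences and counts how often each motif occurs in each sequence. The FASTA records and the motif strings below are placeholders invented for illustration; they are not taken from GluPro V1.0 or from any published list of coeliac toxic motifs.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {record id: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # id is the first word of the header
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def count_motif(seq, motif):
    """Count occurrences of motif in seq, including overlapping ones."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


# Hypothetical input: two toy records and two placeholder motifs
fasta = """>prot1 toy sequence
MKQQPFPQQPF
PQAAQQ
>prot2 toy sequence
MAAAKKK
"""
motifs = ["QQPFP", "AQQ"]

counts = {
    name: {m: count_motif(seq, m) for m in motifs}
    for name, seq in parse_fasta(fasta).items()
}
```

Summing such counts per protein sub-type would give a distribution of the kind the abstract refers to, assuming each record carries a sub-type annotation.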
Zhang, Hongtao; Setubal, Joao Carlos; Zhan, Xiaobei; Zheng, Zhiyong; Yu, Lijun; Wu, Jianrong; Chen, Dingqiang
2011-06-01
Agrobacterium sp. ATCC 31749 (formerly named Alcaligenes faecalis var. myxogenes) is a non-pathogenic aerobic soil bacterium used in large scale biotechnological production of curdlan. However, little is known about its genomic information. Partial DNA sequences of electron transport chain (ETC) protein genes were obtained in order to understand the components of the ETC and their genomic specificity in Agrobacterium sp. ATCC 31749. Degenerate primers were designed according to ETC conserved sequences in other reported species. Partial DNA sequences of ETC genes in Agrobacterium sp. ATCC 31749 were cloned by the PCR method using degenerate primers. Based on comparative genomic analysis, nine electron transport elements were ascertained, including NADH ubiquinone oxidoreductase, succinate dehydrogenase complex II, complex III, cytochrome c, ubiquinone biosynthesis protein ubiB, cytochrome d terminal oxidase, cytochrome bo terminal oxidase, cytochrome cbb3-type terminal oxidase and cytochrome caa3-type terminal oxidase. Similarity and phylogenetic analyses of these genes revealed that among fully sequenced Agrobacterium species, Agrobacterium sp. ATCC 31749 is closest to Agrobacterium tumefaciens C58. Based on these results a comprehensive ETC model for Agrobacterium sp. ATCC 31749 is proposed.
Galaxy morphology - An unsupervised machine learning approach
NASA Astrophysics Data System (ADS)
Schutter, A.; Shamir, L.
2015-09-01
Structural properties possess valuable information about the formation and evolution of galaxies, and are important for understanding the past, present, and future universe. Here we use unsupervised machine learning methodology to analyze a network of similarities between galaxy morphological types, and automatically deduce a morphological sequence of galaxies. Application of the method to the EFIGI catalog shows that the morphological scheme produced by the algorithm is largely in agreement with the De Vaucouleurs system, demonstrating the ability of computer vision and machine learning methods to automatically profile galaxy morphological sequences. The unsupervised analysis method is based on comprehensive computer vision techniques that compute the visual similarities between the different morphological types. Rather than relying on human cognition, the proposed system deduces the similarities between sets of galaxy images in an automatic manner, and is therefore not limited by the number of galaxies being analyzed. The source code of the method is publicly available, and the protocol of the experiment is included in the paper so that the experiment can be replicated, and the method can be used to analyze user-defined datasets of galaxy images.
Boggs, Keith
2000-01-01
A classification of community types, successional sequences, and landscapes is presented for the piedmont of the Copper River Delta. The classification was based on a sampling of 471 sites. A total of 75 community types, 42 successional sequences, and 6 landscapes are described. The classification of community types reflects the existing vegetation communities on the...
Garcillán-Barcia, M. Pilar; Mora, Azucena; Blanco, Jorge; Coque, Teresa M.; de la Cruz, Fernando
2014-01-01
Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements, mainly plasmids, play essential roles in adaptive evolution. Existing WGS analysis protocols do not assort contigs between plasmids and the main chromosome, thus hampering full analysis of plasmid sequences. We developed a method (called plasmid constellation networks or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis, to eliminate confounding data, and implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids flux in the evolution of this lineage, which is wide open for plasmid exchange. MOBF12/IncF plasmids were pervasive, adding just by themselves more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ–proteobacterial plasmid groups were found to be present in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages. PMID:25522143
Lanza, Val F; de Toro, María; Garcillán-Barcia, M Pilar; Mora, Azucena; Blanco, Jorge; Coque, Teresa M; de la Cruz, Fernando
2014-12-01
Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements, mainly plasmids, play essential roles in adaptive evolution. Existing WGS analysis protocols do not assort contigs between plasmids and the main chromosome, thus hampering full analysis of plasmid sequences. We developed a method (called plasmid constellation networks or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis, to eliminate confounding data, and implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids flux in the evolution of this lineage, which is wide open for plasmid exchange. MOBF12/IncF plasmids were pervasive, adding just by themselves more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ-proteobacterial plasmid groups were found to be present in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages.
Tu, Bin; Masaberg, Carly; Hou, Lihua; Behm, Daniel; Brescia, Peter; Cha, Nuri; Kariyawasam, Kanthi; Lee, Jar How; Nong, Thoa; Sells, John; Tausch, Paul; Yang, Ruyan; Ng, Jennifer; Hurley, Carolyn Katovich
2017-02-01
Sanger-based DNA sequencing of exons 2+3 of HLA class I alleles from a heterozygote frequently results in two or more alternative genotypes. This study was undertaken to reduce the time and effort required to produce a single high resolution HLA genotype. Samples were typed in parallel by Sanger sequencing and oligonucleotide probe hybridization. This workflow, together with optimization of analysis software, was tested and refined during the typing of over 42,000 volunteers for an unrelated hematopoietic progenitor cell donor registry. Next generation DNA sequencing (NGS) was applied to over 1000 of these samples to identify the alleles present within the G group designations. Single genotypes at G level resolution were obtained for over 95% of the loci without additional assays. The vast majority of alleles identified (>99%) were the primary allele giving the G groups their name. Only 0.7% of the alleles identified encoded protein variants that were not detected by a focus on the antigen recognition domain (ARD)-encoding exons. Our combined method routinely provides biologically relevant typing resolution at the level of the ARD. It can be applied to both single samples or to large volume typing supporting either bone marrow or solid organ transplantation using technologies currently available in many HLA laboratories. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Long-range barcode labeling-sequencing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Feng; Zhang, Tao; Singh, Kanwar K.
Methods for sequencing single large DNA molecules by clonal multiple displacement amplification using barcoded primers. Sequences are binned based on barcode sequences and sequenced using a microdroplet-based method for sequencing large polynucleotide templates to enable assembly of haplotype-resolved complex genomes and metagenomes.
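The binning step mentioned in this abstract can be sketched as grouping reads by their barcode before assembly. The sketch assumes, purely for illustration, that the barcode is a fixed-length prefix of each read; the barcode length, the read strings and the function name are hypothetical and not taken from the record.

```python
from collections import defaultdict


def bin_by_barcode(reads, barcode_len):
    """Group reads by their leading barcode and strip the barcode off.

    Assumes (for illustration only) that every read starts with a
    fixed-length barcode; reads too short to carry an insert are skipped.
    """
    bins = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_len:
            continue
        bins[read[:barcode_len]].append(read[barcode_len:])
    return dict(bins)


# Toy reads: the first four bases are treated as the barcode
reads = ["ACGTTTGACCA", "ACGTGGATCCA", "TTAGCCGATAG", "ACG"]
bins = bin_by_barcode(reads, barcode_len=4)
```

Each bin then holds reads presumed to originate from the same amplified template, so they can be assembled together.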
Multistage Spectral Relaxation Method for Solving the Hyperchaotic Complex Systems
Saberi Nik, Hassan; Rebelo, Paulo
2014-01-01
We present a pseudospectral method application for solving the hyperchaotic complex systems. The proposed method, called the multistage spectral relaxation method (MSRM) is based on a technique of extending Gauss-Seidel type relaxation ideas to systems of nonlinear differential equations and using the Chebyshev pseudospectral methods to solve the resulting system on a sequence of multiple intervals. In this new application, the MSRM is used to solve famous hyperchaotic complex systems such as hyperchaotic complex Lorenz system and the complex permanent magnet synchronous motor. We compare this approach to the Runge-Kutta based ode45 solver to show that the MSRM gives accurate results. PMID:25386624
Dutta, Sanjib; Koide, Akiko; Koide, Shohei
2008-01-01
Stability evaluation of many mutants can lead to a better understanding of the sequence determinants of a structural motif and of factors governing protein stability and protein evolution. The traditional biophysical analysis of protein stability is low throughput, limiting our ability to widely explore the sequence space in a quantitative manner. In this study, we have developed a high-throughput library screening method for quantifying stability changes, which is based on protein fragment reconstitution and yeast surface display. Our method exploits the thermodynamic linkage between protein stability and fragment reconstitution and the ability of the yeast surface display technique to quantitatively evaluate protein-protein interactions. The method was applied to a fibronectin type III (FN3) domain. Characterization of fragment reconstitution was facilitated by the co-expression of two FN3 fragments, thus establishing a "yeast surface two-hybrid" method. Importantly, our method does not rely on competition between clones and thus eliminates a common limitation of high-throughput selection methods in which the most stable variants are predominantly recovered. Thus, it allows for the isolation of sequences that exhibit a desired level of stability. We identified over one hundred unique sequences for a β-bulge motif, which was significantly more informative than natural sequences of the FN3 family in revealing the sequence determinants for the β-bulge. Our method provides a powerful means to rapidly assess stability of many variants, to systematically assess contribution of different factors to protein stability and to enhance protein stability. PMID:18674545
EventThread: Visual Summarization and Stage Analysis of Event Sequence Data.
Guo, Shunan; Xu, Ke; Zhao, Rongwen; Gotz, David; Zha, Hongyuan; Cao, Nan
2018-01-01
Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.
Predicting turns in proteins with a unified model.
Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan
2012-01-01
Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications.
Predicting Turns in Proteins with a Unified Model
Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan
2012-01-01
Motivation Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. Results In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications. PMID:23144872
Potapov, Vladimir; Ong, Jennifer L; Langhorst, Bradley W; Bilotti, Katharina; Cahoon, Dan; Canton, Barry; Knight, Thomas F; Evans, Thomas C; Lohman, Gregory Js
2018-05-08
DNA ligases are key enzymes in molecular and synthetic biology that catalyze the joining of breaks in duplex DNA and the end-joining of DNA fragments. Ligation fidelity (discrimination against the ligation of substrates containing mismatched base pairs) and bias (preferential ligation of particular sequences over others) have been well-studied in the context of nick ligation. However, almost no data exist for fidelity and bias in end-joining ligation contexts. In this study, we applied Pacific Biosciences Single-Molecule Real-Time sequencing technology to directly sequence the products of a highly multiplexed ligation reaction. This method has been used to profile the ligation of all three-base 5'-overhangs by T4 DNA ligase under typical ligation conditions in a single experiment. We report the relative frequency of all ligation products with or without mismatches, the position-dependent frequency of each mismatch, and the surprising observation that 5'-TNA overhangs ligate extremely inefficiently compared to all other Watson-Crick pairings. The method can easily be extended to profile other ligases, end-types (e.g. blunt ends and overhangs of different lengths), and the effect of adjacent sequence on the ligation results. Further, the method has the potential to provide new insights into the thermodynamics of annealing and the kinetics of end-joining reactions.
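To give a sense of the search space profiled in this abstract, the sketch below enumerates all three-base overhangs, pairs each with its Watson-Crick partner (its reverse complement), and picks out the overhangs of the form TNA. The variable names are illustrative and no ligation frequencies are modeled.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# All 4^3 possible three-base overhangs, written 5' to 3'
overhangs = ["".join(bases) for bases in product("ACGT", repeat=3)]

# The fully Watson-Crick paired partner of each overhang
partners = {o: reverse_complement(o) for o in overhangs}

# Overhangs of the form 5'-TNA, where N is any base
tna = [o for o in overhangs if o[0] == "T" and o[2] == "A"]
```

Because the reverse complement of TNA is again of the form TNA, the four TNA overhangs pair only among themselves, and no three-base overhang can be its own partner since the middle base never equals its complement.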
Ivy, Reid A; Farber, Jeffrey M; Pagotto, Franco; Wiedmann, Martin
2013-01-01
Foodborne pathogen isolate collections are important for the development of detection methods, for validation of intervention strategies, and to develop an understanding of pathogenesis and virulence. We have assembled a publicly available Cronobacter (formerly Enterobacter sakazakii) isolate set that consists of (i) 25 Cronobacter sakazakii isolates, (ii) two Cronobacter malonaticus isolates, (iii) one Cronobacter muytjensii isolate, which displays some atypical phenotypic characteristics, biochemical profiles, and colony color on selected differential media, and (iv) two nonclinical Enterobacter asburiae isolates, which show some phenotypic characteristics similar to those of Cronobacter spp. The set consists of human (n = 10), food (n = 11), and environmental (n = 9) isolates. Analysis of partial 16S rDNA sequence and seven-gene multilocus sequence typing data allowed for reliable identification of these isolates to species and identification of 14 isolates as sequence type 4, which had previously been shown to be the most common C. sakazakii sequence type associated with neonatal meningitis. Phenotypic characterization was carried out with API 20E and API 32E test strips and streaking on two selective chromogenic agars; isolates were also assessed for sorbitol fermentation and growth at 45°C. Although these strategies typically produced the same classification as sequence-based strategies, based on a panel of four biochemical tests, one C. sakazakii isolate yielded inconclusive data and one was classified as C. malonaticus. EcoRI automated ribotyping and pulsed-field gel electrophoresis (PFGE) with XbaI separated the set into 23 unique ribotypes and 30 unique PFGE types, respectively, indicating subtype diversity within the set. Subtype and source data for the collection are publicly available in the PathogenTracker database (www.pathogentracker.net), which allows for continuous updating of information on the set, including links to publications that include information on isolates from this collection.
Yan, Song; Li, Yun
2014-02-15
Despite its great capability to detect rare variant associations, next-generation sequencing is still prohibitively expensive when applied to large samples. In case-control studies, it is thus appealing to sequence only a subset of cases to discover variants and genotype the identified variants in controls and the remaining cases under the reasonable assumption that causal variants are usually enriched among cases. However, this approach leads to inflated type-I error if analyzed naively for rare variant association. Several methods have been proposed in recent literature to control type-I error at the cost of either excluding some sequenced cases or correcting the genotypes of discovered rare variants. All of these approaches thus suffer from a certain extent of information loss and are therefore underpowered. We propose a novel method (BETASEQ), which corrects inflation of type-I error by supplementing pseudo-variants while keeping the original sequence and genotype data intact. Extensive simulations and real data analysis demonstrate that, in most practical situations, BETASEQ leads to higher testing powers than existing approaches with guaranteed (controlled or conservative) type-I error. BETASEQ and associated R files, including documentation and examples, are available at http://www.unc.edu/~yunmli/betaseq
Building Facade Modeling Under Line Feature Constraint Based on Close-Range Images
NASA Astrophysics Data System (ADS)
Liang, Y.; Sheng, Y. H.
2018-04-01
To solve existing problems in modeling building facades using only point features from close-range images, a new method for modeling building facades under a line feature constraint is proposed in this paper. First, camera parameters and sparse spatial point cloud data were recovered using SFM, and dense 3D point clouds were generated with MVS. Second, line features were detected based on gradient direction, the detected line features were fitted considering their directions and lengths, and the line features were then matched under multiple types of constraints and extracted from the multi-image sequence. Finally, the facade mesh of the building was triangulated from the point cloud and line features. The experiment shows that this method can effectively reconstruct the geometric facade of buildings by combining the point and line features of the close-range image sequence, especially in restoring the contour information of building facades.
Pathway analysis with next-generation sequencing data.
Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric; Xiong, Momiao
2015-04-01
Although pathway analysis methods have been developed and successfully applied to association studies of common variants, the statistical methods for pathway-based association analysis of rare variants have not been well developed. Many investigators observed highly inflated false-positive rates and low power in pathway-based tests of association of rare variants. The inflated false-positive rates and low true-positive rates of the current methods are mainly due to their lack of ability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic that is based on the smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The developed statistic has the ability to capture position-level variant information and account for gametic phase disequilibrium. By intensive simulations, we demonstrate that the SFPCA-based statistic for testing pathway association with either rare or common or both rare and common variants has the correct type 1 error rates. The power of the SFPCA-based statistic and of 22 additional existing statistics is also evaluated. We found that the SFPCA-based statistic has a much higher power than other existing statistics in all the scenarios considered. To further evaluate its performance, the SFPCA-based statistic is applied to pathway analysis of exome sequencing data in the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after the Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic has much smaller P-values to identify pathway association than other existing methods.
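The Bonferroni correction mentioned in this abstract can be illustrated with a short sketch: with m pathways tested at family-wise level alpha, each pathway is declared significant only if its P-value is at most alpha / m. The pathway names, the P-values and the level alpha = 0.05 are hypothetical and are not results from the study.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag which tests remain significant after Bonferroni correction.

    p_values maps a test name to its raw P-value; a test passes when
    its P-value is at most alpha divided by the number of tests.
    """
    if not p_values:
        return {}
    threshold = alpha / len(p_values)
    return {name: p <= threshold for name, p in p_values.items()}


# Hypothetical raw P-values for four pathways
raw = {
    "pathway_A": 0.001,
    "pathway_B": 0.02,
    "pathway_C": 0.0004,
    "pathway_D": 0.3,
}
significant = bonferroni(raw)
```

With four tests the per-test threshold is 0.0125, so a raw P-value of 0.02 that would pass an uncorrected 0.05 cut-off no longer counts as significant.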
Rapid Multi-Locus Sequence Typing Using Microfluidic Biochips
2010-05-12
Sequence Types. The evolutionary history of all the B. cereus MLST concatenated Sequence Types (545 taxa, 2,394 nucleotide positions) was inferred using...the Neighbor-Joining method [28]. The bootstrap consensus tree inferred from 100 replicates was taken to represent the evolutionary history of the... Chlamydia (manuscript in preparation) and performed pilot studies on Staphylococcus aureus and Streptococcus pneumoniae (Data S4 and Text S2). Another potential
The effects of metal ions on the DNA damage induced by hydrogen peroxide.
Kobayashi, S; Ueda, K; Komano, T
1990-01-01
The effects of metal ions on DNA damage induced by hydrogen peroxide were investigated using two methods, agarose-gel electrophoretic analysis of supercoiled DNA and sequencing-gel analysis of single end-labeled DNA fragments of defined sequences. Hydrogen peroxide induced DNA damage when iron or copper ion was present. At least two classes of DNA damage were induced, one being direct DNA-strand cleavage, and the other being base modification labile to hot piperidine. The investigation of the damaged sites and the inhibitory effects of radical scavengers revealed that hydroxyl radical was the species which attacked DNA in the reaction of H2O2/Fe(II). On the other hand, two types of DNA damage were induced by H2O2/Cu(II). Type I damage was predominant and inhibited by potassium iodide, but type II was not. The sites of the base-modification induced by type I damage were similar to those by lipid peroxidation products and by ascorbate in the presence of Cu(II), suggesting the involvement of radical species other than free hydroxyl radical in the damaging reactions.
Nucleic acid analysis using terminal-phosphate-labeled nucleotides
Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY
2008-04-22
The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
Armour, John A. L.; Palla, Raquel; Zeeuwen, Patrick L. J. M.; den Heijer, Martin; Schalkwijk, Joost; Hollox, Edward J.
2007-01-01
Recent work has demonstrated an unexpected prevalence of copy number variation in the human genome, and has highlighted the part this variation may play in predisposition to common phenotypes. Some important genes vary in number over a high range (e.g. DEFB4, which commonly varies between two and seven copies), and have posed formidable technical challenges for accurate copy number typing, so that there are no simple, cheap, high-throughput approaches suitable for large-scale screening. We have developed a simple comparative PCR method based on dispersed repeat sequences, using a single pair of precisely designed primers to amplify products simultaneously from both test and reference loci, which are subsequently distinguished and quantified via internal sequence differences. We have validated the method for the measurement of copy number at DEFB4 by comparison of results from >800 DNA samples with copy number measurements by MAPH/REDVR, MLPA and array-CGH. The new Paralogue Ratio Test (PRT) method can require as little as 10 ng genomic DNA, appears to be comparable in accuracy to the other methods, and for the first time provides a rapid, simple and inexpensive method for copy number analysis, suitable for application to typing thousands of samples in large case-control association studies. PMID:17175532
Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes.
Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich
2012-02-01
The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild type-copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information.
Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes
Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich
2012-01-01
The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild type-copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information. PMID:22384404
Zhang, Hua; Zhang, Tuo; Gao, Jianzhao; Ruan, Jishou; Shen, Shiyi; Kurgan, Lukasz
2012-01-01
Proteins fold through a two-state (TS) process, with no visible intermediates, or a multi-state (MS) process, via at least one intermediate. We analyze sequence-derived factors that determine folding types by introducing a novel sequence-based folding type predictor called FOKIT. This method implements a logistic regression model with six input features which hybridize information concerning amino acid composition and predicted secondary structure and solvent accessibility. FOKIT provides predictions with average Matthews correlation coefficient (MCC) between 0.58 and 0.91 measured using out-of-sample tests on four benchmark datasets. These results are shown to be competitive with or better than the results of four modern predictors. We also show that FOKIT outperforms these methods when predicting chains that share low similarity with the chains used to build the model, which is an important advantage given the limited number of annotated chains. We demonstrate that inclusion of solvent accessibility helps in discrimination of the folding kinetic types and that three of the features constitute statistically significant markers that differentiate TS and MS folders. We found that increased content of exposed Trp and buried Leu is indicative of MS folding, which implies that the exposure/burial of certain hydrophobic residues may play an important role in the formation of the folding intermediates. Our conclusions are supported by two case studies.
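The Matthews correlation coefficient quoted in the abstract above can be computed from a binary confusion matrix. The following is a minimal sketch, not the FOKIT authors' code; the function name and the choice to treat MS folders as the positive class are assumptions made for illustration.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Assumption for illustration: MS folders are the positive class
    and TS folders the negative class.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction correct), which makes it usable on datasets where TS and MS chains are unevenly represented.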
Myelin protein zero gene sequencing diagnoses Charcot-Marie-Tooth Type 1B disease
DOE Office of Scientific and Technical Information (OSTI.GOV)
Su, Y.; Zhang, H.; Madrid, R.
1994-09-01
Charcot-Marie-Tooth disease (CMT), the most common genetic neuropathy, affects about 1 in 2600 people in Norway and is found worldwide. CMT Type 1 (CMT1) has slow nerve conduction with demyelinated Schwann cells. Autosomal dominant CMT Type 1B (CMT1B) results from mutations in the myelin protein zero gene which directs the synthesis of more than half of all Schwann cell protein. This gene was mapped to the chromosome 1q22-1q23.1 borderline by fluorescence in situ hybridization. The first 7 of 7 reported CMT1B mutations are unique. Thus the most effective means to identify CMT1B mutations in at-risk family members and fetuses is to sequence the entire coding sequence in dominant or sporadic CMT patients without the CMT1A duplication. Of the 19 primers used in 16 pairs to uniquely amplify the entire MPZ coding sequence, 6 primer pairs were used to amplify and sequence the 6 exons. The DyeDeoxy Terminator cycle sequencing method used with four different color fluorescent labels was superior to manual sequencing because it sequences more bases unambiguously from extracted genomic DNA samples within 24 hours. This protocol was used to test 28 CMT and Dejerine-Sottas patients without CMT1A gene duplication. Sequencing MPZ gene-specific amplified fragments identified 9 polymorphic sites within the 6 exons that encode the 248 amino acid MPZ protein. The large number of major CMT1B mutations identified by single strand sequencing are being verified by reverse strand sequencing and when possible, by restriction enzyme analysis. This protocol can be used to distinguish CMT1B patients from other CMT phenotypes and to determine the CMT1B status of relatives both presymptomatically and prenatally.
de Knegt, Leonardo V; Pires, Sara M; Löfström, Charlotta; Sørensen, Gitte; Pedersen, Karl; Torpdahl, Mia; Nielsen, Eva M; Hald, Tine
2016-03-01
Salmonella is an important cause of bacterial foodborne infections in Denmark. To identify the main animal-food sources of human salmonellosis, risk managers have relied on a routine application of a microbial subtyping-based source attribution model since 1995. In 2013, multiple locus variable number tandem repeat analysis (MLVA) substituted phage typing as the subtyping method for surveillance of S. Enteritidis and S. Typhimurium isolated from animals, food, and humans in Denmark. The purpose of this study was to develop a modeling approach applying a combination of serovars, MLVA types, and antibiotic resistance profiles for the Salmonella source attribution, and assess the utility of the results for the food safety decisionmakers. Full and simplified MLVA schemes from surveillance data were tested, and model fit and consistency of results were assessed using statistical measures. We conclude that loci schemes STTR5/STTR10/STTR3 for S. Typhimurium and SE9/SE5/SE2/SE1/SE3 for S. Enteritidis can be used in microbial subtyping-based source attribution models. Based on the results, we discuss that an adjustment of the discriminatory level of the subtyping method applied often will be required to fit the purpose of the study and the available data. The issues discussed are also considered highly relevant when applying, e.g., extended multi-locus sequence typing or next-generation sequencing techniques. © 2015 Society for Risk Analysis.
Wan, Haisu; Li, Yongwen; Fan, Yu; Meng, Fanrong; Chen, Chen; Zhou, Qinghua
2012-01-15
Site-directed mutagenesis has become routine in molecular biology. However, many mutants can still be very difficult to create. Complicated chimerical mutations, tandem repeats, inverted sequences, GC-rich regions, and/or heavy secondary structures can cause inefficient or incorrect binding of the mutagenic primer to the target sequence and affect the subsequent amplification. In theory, these problems can be avoided by introducing the mutations into the target sequence using mutagenic fragments and so removing the need for primer-template annealing. The cassette mutagenesis uses the mutagenic fragment in its protocol; however, in most cases it needs to perform two rounds of mutagenic primer-based mutagenesis to introduce suitable restriction enzyme sites into templates and is not suitable for routine mutagenesis. Here we describe a highly efficient method in which the template except the region to be mutated is amplified by polymerase chain reaction (PCR) and the type IIs restriction enzyme-digested PCR product is directly ligated with the mutagenic fragment. Our method requires no assistance of mutagenic primers. We have used this method to create various types of difficult-to-make mutants with mutagenic frequencies of nearly 100%. Our protocol has many advantages over the prevalent QuikChange method and is a valuable tool for studies on gene structure and function. Copyright © 2011 Elsevier Inc. All rights reserved.
Sul, Woo Jun; Cole, James R.; Jesus, Ederson da C.; Wang, Qiong; Farris, Ryan J.; Fish, Jordan A.; Tiedje, James M.
2011-01-01
High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but now even higher-throughput methods to the Illumina scale allow the creation of much larger datasets with more samples and orders-of-magnitude more sequences that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions in that β-diversity measures calculated by using these two types of matrices were significantly correlated to each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples. PMID:21873204
Petzold, Markus; Ehricht, Ralf; Slickers, Peter; Pleischl, Stefan; Brockmann, Ansgar; Exner, Martin; Monecke, Stefan; Lück, Christian
2017-06-01
Between 1 August and 6 September 2013, an outbreak of Legionnaires' disease (LD) with 78 cases confirmed by positive urinary antigen tests occurred in Warstein, North Rhine-Westphalia, Germany. Legionella (L.) pneumophila, serogroup (Sg) 1, monoclonal antibody (mAb) subgroup Knoxville, sequence type (ST) 345, was identified as the epidemic strain. This strain was isolated from seven patients. To detect the source of the infection, epidemiological typing of clinical and environmental strains was performed in two consecutive steps. First, strains were typed by monoclonal antibodies. Indistinguishable strains were further subtyped by sequence-based typing (SBT) which is the internationally recognized standard method for epidemiological genotyping of L. pneumophila. In an early stage of the outbreak investigation, many environmental isolates were found to belong to the mAb subgroup Knoxville, but to two different STs, namely to ST 345, the epidemic strain, and to ST 600. A majority of environmental isolates belonged to ST 600 whereas the epidemic ST 345 strain was less common in environmental samples. To rapidly distinguish both Knoxville strains, we applied a novel typing method based on DNA-hybridization on glass chips. The new assay can easily and rapidly discriminate L. pneumophila Sg 1 strains. Thus, we were able to quickly identify the sources harboring the epidemic strain, i.e., two cooling towers of different companies, the waste water treatment plants (WWTP) of the city and one company as well as water samples of the river Wester and its branches. Copyright © 2016 Elsevier GmbH. All rights reserved.
Gerstner, Arpad; DeFord, James H; Papaconstantinou, John
2003-07-25
Ames dwarfism is caused by a homozygous single nucleotide mutation in the pituitary specific prop-1 gene, resulting in combined pituitary hormone deficiency, reduced growth and extended lifespan. Thus, these mice serve as an important model system for endocrinological, aging and longevity studies. Because the phenotype of wild type and heterozygous mice is undistinguishable, it is imperative for successful breeding to accurately genotype these animals. Here we report a novel, yet simple, approach for prop-1 genotyping using PCR-based allele-specific amplification (PCR-ASA). We also compare this method to other potential genotyping techniques, i.e. PCR-based restriction fragment length polymorphism analysis (PCR-RFLP) and fluorescence automated DNA sequencing. We demonstrate that the single-step PCR-ASA has several advantages over the classical PCR-RFLP because the procedure is simple, less expensive and rapid. To further increase the specificity and sensitivity of the PCR-ASA, we introduced a single-base mismatch at the 3' penultimate position of the mutant primer. Our results also reveal that the fluorescence automated DNA sequencing has limitations for detecting a single nucleotide polymorphism in the prop-1 gene, particularly in heterozygotes.
Dumonceaux, Tim J.; Green, Margaret; Hammond, Christine; Perez, Edel; Olivier, Chrystel
2014-01-01
Phytoplasmas (‘Candidatus Phytoplasma’ spp.) are insect-vectored bacteria that infect a wide variety of plants, including many agriculturally important species. The infections can cause devastating yield losses by inducing morphological changes that dramatically alter inflorescence development. Detection of phytoplasma infection typically utilizes sequences located within the 16S–23S rRNA-encoding locus, and these sequences are necessary for strain identification by currently accepted standards for phytoplasma classification. However, these methods can generate PCR products >1400 bp that are less divergent in sequence than protein-encoding genes, limiting strain resolution in certain cases. We describe a method for accessing the chaperonin-60 (cpn60) gene sequence from a diverse array of ‘Ca.Phytoplasma’ spp. Two degenerate primer sets were designed based on the known sequence diversity of cpn60 from ‘Ca.Phytoplasma’ spp. and used to amplify cpn60 gene fragments from various reference samples and infected plant tissues. Forty three cpn60 sequences were thereby determined. The cpn60 PCR-gel electrophoresis method was highly sensitive compared to 16S-23S-targeted PCR-gel electrophoresis. The topology of a phylogenetic tree generated using cpn60 sequences was congruent with that reported for 16S rRNA-encoding genes. The cpn60 sequences were used to design a hybridization array using oligonucleotide-coupled fluorescent microspheres, providing rapid diagnosis and typing of phytoplasma infections. The oligonucleotide-coupled fluorescent microsphere assay revealed samples that were infected simultaneously with two subtypes of phytoplasma. These tools were applied to show that two host plants, Brassica napus and Camelina sativa, displayed different phytoplasma infection patterns. PMID:25551224
Molecular epidemiology of mastitis pathogens of dairy cattle and comparative relevance to humans.
Zadoks, Ruth N; Middleton, John R; McDougall, Scott; Katholm, Jorgen; Schukken, Ynte H
2011-12-01
Mastitis, inflammation of the mammary gland, can be caused by a wide range of organisms, including gram-negative and gram-positive bacteria, mycoplasmas and algae. Many microbial species that are common causes of bovine mastitis, such as Escherichia coli, Klebsiella pneumoniae, Streptococcus agalactiae and Staphylococcus aureus also occur as commensals or pathogens of humans whereas other causative species, such as Streptococcus uberis, Streptococcus dysgalactiae subsp. dysgalactiae or Staphylococcus chromogenes, are almost exclusively found in animals. A wide range of molecular typing methods have been used in the past two decades to investigate the epidemiology of bovine mastitis at the subspecies level. These include comparative typing methods that are based on electrophoretic banding patterns, library typing methods that are based on the sequence of selected genes, virulence gene arrays and whole genome sequencing projects. The strain distribution of mastitis pathogens has been investigated within individual animals and across animals, herds, countries and host species, with consideration of the mammary gland, other animal or human body sites, and environmental sources. Molecular epidemiological studies have contributed considerably to our understanding of sources, transmission routes, and prognosis for many bovine mastitis pathogens and to our understanding of mechanisms of host-adaptation and disease causation. In this review, we summarize knowledge gleaned from two decades of molecular epidemiological studies of mastitis pathogens in dairy cattle and discuss aspects of comparative relevance to human medicine.
Zurfluh, Katrin; Wang, Juan; Klumpp, Jochen; Nüesch-Inderbinen, Magdalena; Fanning, Séamus; Stephan, Roger
2014-01-01
Objectives: The purpose of this study was to characterize sets of extended-spectrum β-lactamases (ESBL)-producing Enterobacteriaceae collected longitudinally from different flocks of broiler breeders, meconium of 1-day-old broilers from these breeder flocks, as well as from these broiler flocks before slaughter. Methods: Five sets of ESBL-producing Escherichia coli were studied by multi-locus sequence typing (MLST), phylogenetic grouping, PCR-based replicon typing and resistance profiling. The blaCTX-M-1-harboring plasmids of one set (pHV295.1, pHV114.1, and pHV292.1) were fully sequenced and subjected to comparative analysis. Results: Eleven different MLST sequence types (ST) were identified with ST1056 the predominant one, isolated in all five sets either on the broiler breeder or meconium level. Plasmid sequencing revealed that blaCTX-M-1 was carried by highly similar IncI1/ST3 plasmids that were 105 076 bp, 110 997 bp, and 117 269 bp in size, respectively. Conclusions: The fact that genetically similar IncI1/ST3 plasmids were found in ESBL-producing E. coli of different MLST types isolated at the different levels in the broiler production pyramid provides strong evidence for a vertical transmission of these plasmids from a common source (nucleus poultry flocks). PMID:25324838
Storage and utilization of HLA genomic data--new approaches to HLA typing.
Helmberg, W
2000-01-01
Currently available DNA-based HLA typing assays can provide detailed information about sequence motifs of a tested sample. It is still a common practice, however, for information acquired by high-resolution sequence specific oligonucleotide probe (SSOP) typing or sequence specific priming (SSP) to be presented in a low-resolution serological format. Unfortunately, this representation can lead to significant loss of useful data in many cases. An alternative to assigning allele equivalents to such DNA typing results is simply to store the observed typing pattern and utilize the information with the help of Virtual DNA Analysis (VDA). Interpretation of the stored typing patterns can then be updated based on newly defined alleles, assuming the sequence motifs detected by the typing reagents are known. Rather than updating reagent specificities in individual laboratories, such updates should be performed in a central, publicly available sequence database. By referring to this database, HLA genomic data can then be stored and transferred between laboratories without loss of information. The 13th International Histocompatibility Workshop offers an ideal opportunity to begin building this common database for the entire human MHC.
Typing of artiodactyl MHC-DRB genes with the help of intronic simple repeated DNA sequences.
Schwaiger, F W; Buitkamp, J; Weyers, E; Epplen, J T
1993-02-01
An efficient oligonucleotide typing method for the highly polymorphic MHC-DRB genes is described for artiodactyls like cattle, sheep and goat. By means of the polymerase chain reaction, the second exon of MHC-DRB is amplified as well as part of the adjacent intron containing a mixed simple repeat sequence. Using this primer combination we were able to amplify the MHC-DRB exons 2 and adjacent introns from all of the investigated 10 species of the family of Bovidae and giraffes. Therefore, the DRB genes of novel artiodactyl species can also be readily studied. Oligonucleotide probes specific for the polymorphisms of ungulate DRB genes are used with which sequences differing in at least one single base can be distinguished. Exonic polymorphism was found to be correlated with the allele lengths and the patterns of the repeat structures. Hence oligonucleotide probes specific for different simple repeats and polymorphic positions serve also for typing across species barriers. The strict correlation of sequence length and exonic polymorphism permits a preselection of specific oligonucleotides for hybridization. Thus more than 20 alleles can already be differentiated from each of the three species.
Yoon, Song-Ro; Arnheim, Norman; Calabrese, Peter
2016-01-01
We used targeted next generation deep-sequencing (Safe Sequencing System) to measure ultra-rare de novo mutation frequencies in the human male germline by attaching a unique identifier code to each target DNA molecule. Segments from three different human genes (FGFR3, MECP2 and PTPN11) were studied. Regardless of the gene segment, the particular testis donor or the 73 different testis pieces used, the frequencies for any one of the six different mutation types were consistent. Averaging over the C>T/G>A and G>T/C>A mutation types the background mutation frequency was 2.6 × 10⁻⁵ per base pair, while for the four other mutation types the average background frequency was lower at 1.5 × 10⁻⁶ per base pair. These rates far exceed the well documented human genome average frequency per base pair (~10⁻⁸) suggesting a non-biological explanation for our data. By computational modeling and a new experimental procedure to distinguish between pre-mutagenic lesion base mismatches and a fully mutated base pair in the original DNA molecule, we argue that most of the base-dependent variation in background frequency is due to a mixture of deamination and oxidation during the first two PCR cycles. Finally, we looked at a previously studied disease mutation in the PTPN11 gene and could easily distinguish true mutations from the SSS background. We also discuss the limits and possibilities of this and other methods to measure exceptionally rare mutation frequencies, and we present calculations for other scientists seeking to design their own such experiments. PMID:27341568
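The idea of tagging each template molecule with a unique identifier, as described in the abstract above, can be illustrated with a short sketch: reads sharing an identifier are grouped into a family, and a base is called for the family only when its reads agree. This is an illustration under stated assumptions, not the authors' pipeline; the function names, the input layout (one `(uid, base)` pair per read at a single position), and both thresholds are hypothetical.

```python
from collections import Counter, defaultdict

def consensus_by_uid(reads, min_family_size=2, min_agreement=0.95):
    """Call one base per unique-identifier (UID) family at a single position.

    reads: iterable of (uid, base) pairs (hypothetical input layout).
    Both thresholds are illustrative assumptions, not published settings.
    """
    families = defaultdict(list)
    for uid, base in reads:
        families[uid].append(base)

    calls = {}
    for uid, bases in families.items():
        if len(bases) < min_family_size:
            continue  # too few reads to separate errors from true bases
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            calls[uid] = base  # family agrees, so accept the call
    return calls

def mutation_frequency(calls, reference_base):
    """Fraction of accepted families whose consensus differs from the reference."""
    if not calls:
        return 0.0
    mutant = sum(1 for base in calls.values() if base != reference_base)
    return mutant / len(calls)
```

Families whose reads disagree are dropped rather than counted, so errors introduced late in amplification or during sequencing tend not to be scored as mutations. Errors arising in the earliest PCR cycles can still dominate a family, which is the kind of background the abstract discusses.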
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.
Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo
2016-07-19
Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .
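For orientation, the Smith-Waterman recurrence that the abstract above parallelizes can be written as a plain, single-threaded sketch. This is not the LSDBS-mpi implementation: the linear gap penalty and the match/mismatch scores are illustrative assumptions (protein database scans normally use a substitution matrix and affine gaps), and the function names are hypothetical. The second function shows how a GCUPS figure (giga cell updates per second) is derived.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty).

    Scoring parameters are illustrative assumptions only.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # extend diagonally (match/mismatch)
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best

def gcups(query_length, database_residues, seconds):
    """Giga cell updates per second: DP cells computed / time / 1e9."""
    return query_length * database_residues / seconds / 1e9
```

Each cell depends only on its left, upper, and upper-left neighbours, so different database sequences can be scored independently, which is what makes the problem amenable to the cluster-, thread-, and vector-level parallelism the abstract describes.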
Representing and computing regular languages on massively parallel networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, M.I.; O'Sullivan, J.A.; Boysam, B.
1991-01-01
This paper proposes a general method for incorporating rule-based constraints corresponding to regular languages into stochastic inference problems, thereby allowing for a unified representation of stochastic and syntactic pattern constraints. The authors' approach first establishes the formal connection of rules to Chomsky grammars, and generalizes the original work of Shannon on the encoding of rule-based channel sequences to Markov chains of maximum entropy. This maximum entropy probabilistic view leads to Gibbs representations with potentials which have their number of minima growing at precisely the exponential rate that the language of deterministically constrained sequences grows. These representations are coupled to stochastic diffusion algorithms, which sample the language-constrained sequences by visiting the energy minima according to the underlying Gibbs probability law. The coupling to stochastic search methods yields the all-important practical result that fully parallel stochastic cellular automata may be derived to generate samples from the rule-based constraint sets. The production rules and neighborhood state structure of the language of sequences directly determine the necessary connection structures of the required parallel computing surface. Representations of this type have been mapped to the DAP-510 massively-parallel processor consisting of 1024 mesh-connected bit-serial processing elements for performing automated segmentation of electron-micrograph images.
USDA-ARS's Scientific Manuscript database
Background: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results: We describe the sequencing and assembly of...
Gitman, Melissa R.; McTaggart, Lisa; Spinato, Joanna; Poopalarajah, Rahgavi; Lister, Erin; Husain, Shahid
2017-01-01
Aspergillus spp. cause serious invasive lung infections, and Aspergillus fumigatus is the most commonly encountered clinically significant species. Voriconazole is considered to be the drug of choice for treating A. fumigatus infections; however, rising resistance rates have been reported. We evaluated a matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF MS)-based method for the differentiation between wild-type and non-wild-type isolates of 20 Aspergillus spp. (including 2 isolates of Aspergillus ustus and 1 of Aspergillus calidoustus that were used as controls due to their intrinsic low azole susceptibility with respect to the in vitro response to voriconazole). At 30 and 48 h of incubation, there was complete agreement between Cyp51A sequence analysis, broth microdilution, and MALDI-TOF MS classification of isolates as wild type or non-wild type. In this proof-of-concept study, we demonstrated that MALDI-TOF MS can be used to accurately detect A. fumigatus strains with reduced voriconazole susceptibility. However, rather than proving to be a rapid and simple method for antifungal susceptibility testing, this particular MS-based method showed no benefit over conventional testing methods. PMID:28404678
Gitman, Melissa R; McTaggart, Lisa; Spinato, Joanna; Poopalarajah, Rahgavi; Lister, Erin; Husain, Shahid; Kus, Julianne V
2017-07-01
Aspergillus spp. cause serious invasive lung infections, and Aspergillus fumigatus is the most commonly encountered clinically significant species. Voriconazole is considered to be the drug of choice for treating A. fumigatus infections; however, rising resistance rates have been reported. We evaluated a matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS)-based method for the differentiation between wild-type and non-wild-type isolates of 20 Aspergillus spp. (including 2 isolates of Aspergillus ustus and 1 of Aspergillus calidoustus that were used as controls due to their intrinsic low azole susceptibility with respect to the in vitro response to voriconazole). At 30 and 48 h of incubation, there was complete agreement between Cyp51A sequence analysis, broth microdilution, and MALDI-TOF MS classification of isolates as wild type or non-wild type. In this proof-of-concept study, we demonstrated that MALDI-TOF MS can be used to accurately detect A. fumigatus strains with reduced voriconazole susceptibility. However, rather than proving to be a rapid and simple method for antifungal susceptibility testing, this particular MS-based method showed no benefit over conventional testing methods. © Crown copyright 2017.
Cheng, Keding; Chui, Huixia; Domish, Larissa; Sloan, Angela; Hernandez, Drexler; McCorrister, Stuart; Robinson, Alyssia; Walker, Matthew; Peterson, Lorea A M; Majcher, Miles; Ratnam, Sam; Haldane, David J M; Bekal, Sadjia; Wylie, John; Chui, Linda; Tyler, Shaun; Xu, Bianli; Reimer, Aleisha; Nadon, Celine; Knox, J David; Wang, Gehua
2016-08-01
Mass spectrometry-based phenotypic H-antigen typing (MS-H) combined with whole-genome-sequencing-based genetic identification of H antigens, O antigens, and toxins (WGS-HOT) was used to type 60 clinical Escherichia coli isolates, 43 of which were previously identified as nonmotile, H type undetermined, or O rough by serotyping or having shown discordant MS-H and serotyping results. Whole-genome sequencing confirmed that MS-H was able to provide more accurate data regarding H antigen expression than serotyping. Further, enhanced and more confident O antigen identification resulted from gene cluster based typing in combination with conventional typing based on the gene pair comprising wzx and wzy and that comprising wzm and wzt. The O antigen was identified in 94.6% of the isolates when the two genetic O typing approaches (gene pair and gene cluster) were used in conjunction, in comparison to 78.6% when the gene pair database was used alone. In addition, 98.2% of the isolates showed the existence of genes for various toxins and/or virulence factors, among which verotoxins (Shiga toxin 1 and/or Shiga toxin 2) were 100% concordant with conventional PCR based testing results. With more applications of mass spectrometry and whole-genome sequencing in clinical microbiology laboratories, this combined phenotypic and genetic typing platform (MS-H plus WGS-HOT) should be ideal for pathogenic E. coli typing. Copyright © 2016 Cheng et al.
Paparini, Andrea; Gofton, Alexander; Yang, Rongchang; White, Nicole; Bunce, Michael; Ryan, Una M
2015-01-01
Cryptosporidium is an important enteric pathogen that infects a wide range of humans and animals. Rapid and reliable detection and characterisation methods are essential for understanding the transmission dynamics of the parasite. Sanger sequencing, and high-throughput sequencing (HTS) on an Ion Torrent platform, were compared with each other for their sensitivity and accuracy in detecting and characterising 25 Cryptosporidium-positive human and animal faecal samples. Ion Torrent reads (n = 123,857) were obtained at both 18S rRNA and actin loci for 21 of the 25 samples. Of these, one isolate at the actin locus (Cattle 05) and three at the 18S rRNA locus (HTS 10, HTS 11 and HTS 12), suffered PCR drop-out (i.e. PCR failures) when using fusion-tagged PCR. Sanger sequences were obtained for both loci for 23 of the 25 samples and showed good agreement with Ion Torrent-based genotyping. Two samples both from pythons (SK 02 and SK 05) produced mixed 18S and actin chromatograms by Sanger sequencing but were clearly identified by Ion Torrent sequencing as C. muris. One isolate (SK 03) was typed as C. muris by Sanger sequencing but was identified as a mixed C. muris and C. tyzzeri infection by HTS. 18S rRNA Type B sequences were identified in 4/6 C. parvum isolates when deep sequenced but were undetected in Sanger sequencing. Sanger was cheaper than Ion Torrent when sequencing small numbers of samples, but when larger numbers of samples are considered (n = 60), the costs were comparable. Fusion-tagged amplicon based approaches are a powerful way of approaching mixtures, the only drawback being the loss of PCR efficiency on low-template samples when using primers coupled to MID tags and adaptors. Taken together these data show that HTS has excellent potential for revealing the "true" composition of species/types in a Cryptosporidium infection, but that HTS workflows need to be carefully developed to ensure sensitivity, accuracy and contamination are controlled.
Copyright © 2015 Elsevier Inc. All rights reserved.
CRISPR Diversity and Microevolution in Clostridium difficile
Andersen, Joakim M.; Shoup, Madelyn; Robinson, Cathy; Britton, Robert; Olsen, Katharina E.P.; Barrangou, Rodolphe
2016-01-01
Abstract Virulent strains of Clostridium difficile have become a global health problem associated with morbidity and mortality. Traditional typing methods do not provide ideal resolution to track outbreak strains, ascertain genetic diversity between isolates, or monitor the phylogeny of this species on a global basis. Here, we investigate the occurrence and diversity of clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated genes (cas) in C. difficile to assess the potential of CRISPR-based phylogeny and high-resolution genotyping. A single Type-IB CRISPR-Cas system was identified in 217 analyzed genomes with cas gene clusters present at conserved chromosomal locations, suggesting vertical evolution of the system, assessing a total of 1,865 CRISPR arrays. The CRISPR arrays, markedly enriched (8.5 arrays/genome) compared with other species, occur both at conserved and variable locations across strains, and thus provide a basis for typing based on locus occurrence and spacer polymorphism. Clustering of strains by array composition correlated with sequence type (ST) analysis. Spacer content and polymorphism within conserved CRISPR arrays revealed phylogenetic relationship across clades and within ST. Spacer polymorphisms of conserved arrays were instrumental for differentiating closely related strains, e.g., ST1/RT027/B1 strains and pathogenicity locus encoding ST3/RT001 strains. CRISPR spacers showed sequence similarity to phage sequences, which is consistent with the native role of CRISPR-Cas as adaptive immune systems in bacteria. Overall, CRISPR-Cas sequences constitute a valuable basis for genotyping of C. difficile isolates, provide insights into the micro-evolutionary events that occur between closely related strains, and reflect the evolutionary trajectory of these genomes. PMID:27576538
General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies
Lee, Seunggeun; Teslovich, Tanya M.; Boehnke, Michael; Lin, Xihong
2013-01-01
We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels. PMID:23768515
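The aggregation of study-specific score statistics described in this abstract can be sketched for the simplest case, a burden-type test under homogeneous genetic effects. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `meta_burden_test`, the shared variant weights, and the example numbers are all hypothetical.

```python
import math

def meta_burden_test(scores, covariances, weights):
    """Combine per-study summary statistics into one burden-type statistic.

    scores[k][j]         : score statistic of variant j in study k
    covariances[k][i][j] : between-variant covariance of the scores in study k
    weights[j]           : burden weight of variant j (assumed shared by all studies)

    Assumes homogeneous effects across studies; returns (Q, p-value), where Q is
    referred to a chi-square distribution with 1 degree of freedom.
    """
    m = len(weights)
    numerator = 0.0
    variance = 0.0
    for s, phi in zip(scores, covariances):
        numerator += sum(weights[j] * s[j] for j in range(m))
        variance += sum(weights[i] * phi[i][j] * weights[j]
                        for i in range(m) for j in range(m))
    q = numerator ** 2 / variance
    p_value = math.erfc(math.sqrt(q / 2.0))  # survival function of chi-square, 1 df
    return q, p_value

if __name__ == "__main__":
    # Made-up summary statistics for two studies and two variants.
    scores = [[1.0, 2.0], [0.5, 1.5]]
    covariances = [[[2.0, 0.5], [0.5, 1.0]],
                   [[1.0, 0.0], [0.0, 2.0]]]
    print(meta_burden_test(scores, covariances, [1.0, 1.0]))
```

Only the score vectors and covariance matrices cross study boundaries here, which is the point the abstract makes about avoiding per-variant regression coefficients.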
Boosting antibody developability through rational sequence optimization.
Seeliger, Daniel; Schulz, Patrick; Litzenburger, Tobias; Spitz, Julia; Hoerer, Stefan; Blech, Michaela; Enenkel, Barbara; Studts, Joey M; Garidel, Patrick; Karow, Anne R
2015-01-01
The application of monoclonal antibodies as commercial therapeutics poses substantial demands on stability and properties of an antibody. Therapeutic molecules that exhibit favorable properties increase the success rate in development. However, it is not yet fully understood how the protein sequence of an antibody translates into favorable in vitro molecule properties. In this work, computational design strategies based on heuristic sequence analysis were used to systematically modify an antibody that exhibited a tendency to precipitate in vitro. The resulting series of closely related antibodies showed improved stability as assessed by biophysical methods and long-term stability experiments. As a notable observation, expression levels also improved in comparison with the wild-type candidate. The methods employed to optimize the protein sequences, as well as the biophysical data used to determine the effect on stability under conditions commonly used in the formulation of therapeutic proteins, are described. Together, the experimental and computational data led to consistent conclusions regarding the effect of the introduced mutations. Our approach exemplifies how computational methods can be used to guide antibody optimization for increased stability.
Effective evaluation of privacy protection techniques in visible and thermal imagery
NASA Astrophysics Data System (ADS)
Nawaz, Tahir; Berg, Amanda; Ferryman, James; Ahlberg, Jörgen; Felsberg, Michael
2017-09-01
Privacy protection may be defined as replacing the original content in an image region with a (less intrusive) content having modified target appearance information to make it less recognizable by applying a privacy protection technique. Indeed, the development of privacy protection techniques also needs to be complemented with an established objective evaluation method to facilitate their assessment and comparison. Generally, existing evaluation methods rely on the use of subjective judgments or assume a specific target type in image data and use target detection and recognition accuracies to assess privacy protection. An annotation-free evaluation method that is neither subjective nor assumes a specific target type is proposed. It assesses two key aspects of privacy protection: "protection" and "utility." Protection is quantified as an appearance similarity, and utility is measured as a structural similarity between original and privacy-protected image regions. We performed extensive experiments using six challenging datasets (having 12 video sequences), including a new dataset (having six sequences) that contains visible and thermal imagery. The new dataset is made available online for the community. We demonstrate the effectiveness of the proposed method by evaluating six image-based privacy protection techniques and also show comparisons of the proposed method with existing methods.
Sekizuka, Tsuyoshi; Yamashita, Akifumi; Murase, Yoshiro; Iwamoto, Tomotada; Mitarai, Satoshi; Kato, Seiya; Kuroda, Makoto
2015-01-01
Whole-genome sequencing (WGS) with next-generation DNA sequencing (NGS) is an increasingly accessible and affordable method for genotyping hundreds of Mycobacterium tuberculosis (Mtb) isolates, leading to more effective epidemiological studies involving single nucleotide variations (SNVs) in core genomic sequences based on molecular evolution. We developed an all-in-one web-based tool for genotyping Mtb, referred to as the Total Genotyping Solution for TB (TGS-TB), to facilitate multiple genotyping platforms using NGS for spoligotyping and the detection of phylogenies with core genomic SNVs, IS6110 insertion sites, and 43 customized loci for variable number tandem repeat (VNTR) through a user-friendly, simple click interface. This methodology is implemented with a KvarQ script to predict MTBC lineages/sublineages and potential antimicrobial resistance. Seven Mtb isolates (JP01 to JP07) in this study showing the same VNTR profile were accurately discriminated through median-joining network analysis using SNVs unique to those isolates. An additional IS6110 insertion was detected in one of those isolates as supportive genetic information in addition to core genomic SNVs. The results of in silico analyses using TGS-TB are consistent with those obtained using conventional molecular genotyping methods, suggesting that NGS short reads could provide multiple genotypes to discriminate multiple strains of Mtb, although longer NGS reads (≥300-mer) will be required for full genotyping on the TGS-TB web site. Most available short reads (~100-mer) can be utilized to discriminate the isolates based on the core genome phylogeny. TGS-TB provides a more accurate and discriminative strain typing for clinical and epidemiological investigations; NGS strain typing offers a total genotyping solution for Mtb outbreak and surveillance. TGS-TB web site: https://gph.niid.go.jp/tgs-tb/. PMID:26565975
Comparison of Dixon Sequences for Estimation of Percent Breast Fibroglandular Tissue
Ledger, Araminta E. W.; Scurr, Erica D.; Hughes, Julie; Macdonald, Alison; Wallace, Toni; Thomas, Karen; Wilson, Robin; Leach, Martin O.; Schmidt, Maria A.
2016-01-01
Objectives To evaluate sources of error in the Magnetic Resonance Imaging (MRI) measurement of percent fibroglandular tissue (%FGT) using two-point Dixon sequences for fat-water separation. Methods Ten female volunteers (median age: 31 yrs, range: 23–50 yrs) gave informed consent following Research Ethics Committee approval. Each volunteer was scanned twice following repositioning to enable an estimation of measurement repeatability from high-resolution gradient-echo (GRE) proton-density (PD)-weighted Dixon sequences. Differences in measures of %FGT attributable to resolution, T1 weighting and sequence type were assessed by comparison of this Dixon sequence with low-resolution GRE PD-weighted Dixon data, and against gradient-echo (GRE) or spin-echo (SE) based T1-weighted Dixon datasets, respectively. Results %FGT measurement from high-resolution PD-weighted Dixon sequences had a coefficient of repeatability of ±4.3%. There was no significant difference in %FGT between high-resolution and low-resolution PD-weighted data. Values of %FGT from GRE and SE T1-weighted data were strongly correlated with that derived from PD-weighted data (r = 0.995 and 0.96, respectively). However, both sequences exhibited higher mean %FGT by 2.9% (p < 0.0001) and 12.6% (p < 0.0001), respectively, in comparison with PD-weighted data; the increase in %FGT from the SE T1-weighted sequence was significantly larger at lower breast densities. Conclusion Although measurement of %FGT at low resolution is feasible, T1 weighting and sequence type impact on the accuracy of Dixon-based %FGT measurements; Dixon MRI protocols for %FGT measurement should be carefully considered, particularly for longitudinal or multi-centre studies. PMID:27011312
Botelho, Ana; Canto, Ana; Leão, Célia; Cunha, Mónica V
2015-01-01
Typical CRISPR (clustered, regularly interspaced, short palindromic repeat) regions are constituted by short direct repeats (DRs), interspersed with similarly sized non-repetitive spacers, derived from transmissible genetic elements, acquired when the cell is challenged with foreign DNA. The analysis of the structure, in number and nature, of CRISPR spacers is a valuable tool for molecular typing since these loci are polymorphic among strains, originating characteristic signatures. The existence of CRISPR structures in the genome of the members of Mycobacterium tuberculosis complex (MTBC) enabled the development of a genotyping method, based on the analysis of the presence or absence of 43 oligonucleotide spacers separated by conserved DRs. This method, called spoligotyping, consists of PCR amplification of the DR chromosomal region and recognition, after hybridization, of the spacers that are present. In the workflow underlying this methodology, the PCR products are brought onto a membrane containing synthetic oligonucleotides that have complementary sequences to the spacer sequences. Lack of hybridization of the PCR products to a specific oligonucleotide sequence indicates absence of the corresponding spacer sequence in the examined strain. Spoligotyping gained great notoriety as a robust identification and typing tool for members of MTBC, enabling multiple epidemiological studies on human and animal tuberculosis.
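The presence or absence of the 43 spacers is naturally a 43-digit binary pattern. As a minimal sketch, assuming the commonly used octal shorthand (14 groups of three spacers, each written as one octal digit, followed by the 43rd spacer as a single 0 or 1), such a pattern can be condensed as follows. The function name `spoligotype_octal` and the example pattern are hypothetical and are not taken from the abstract.

```python
def spoligotype_octal(pattern):
    """Convert a 43-character presence/absence string ('1' = spacer present,
    '0' = spacer absent) into a 15-digit octal shorthand."""
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected exactly 43 characters, each '0' or '1'")
    # Spacers 1 to 42 form 14 groups of three; each group becomes one octal digit.
    triplets = [pattern[i:i + 3] for i in range(0, 42, 3)]
    digits = [str(int(t, 2)) for t in triplets]
    # Spacer 43 has no partners and is appended unchanged.
    digits.append(pattern[42])
    return "".join(digits)

if __name__ == "__main__":
    # Illustrative pattern: spacers 1 to 34 absent, spacers 35 to 43 present.
    print(spoligotype_octal("0" * 34 + "1" * 9))
```

The octal form loses no information, because each digit maps back to exactly one triplet of hybridization results.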
Radioresistance of GGG Sequences to Prompt Strand Break Formation from Direct-Type Radiation Damage
Black, Paul J.; Miller, Adam S.; Hayes, Jeffrey J.
2016-01-01
Purpose As humans, we are constantly exposed to ionizing radiation from natural, man-made and cosmic sources which can damage DNA, leading to deleterious effects including cancer incidence. In this work we introduce a method to monitor strand breaks resulting from damage due to the direct effect of ionizing radiation and provide evidence for sequence-dependent effects leading to strand breaks. Materials and methods To analyze only DNA strand breaks caused by radiation damage due to the direct effect of ionizing radiation, we combined an established technique to generate dehydrated DNA samples with a technique to analyze single strand breaks on short oligonucleotide sequences via denaturing gel electrophoresis. Results We find that direct damage primarily results in a reduced number of strand breaks in guanine triplet regions (GGG) when compared to isolated guanine (G) bases with identical flanking base context. In addition, we observe strand break behavior possibly indicative of protection of guanine bases when flanked by pyrimidines, and sensitization of guanine to strand break when flanked by adenine (A) bases in both isolated G and GGG cases. Conclusions These observations provide insight into the strand break behavior in GGG regions damaged via the direct effect of ionizing radiation. In addition, this could be indicative of DNA sequences that are naturally more susceptible to strand break due to the direct effect of ionizing radiation. PMID:27349757
Zhang, Guang Lan; Keskin, Derin B.; Lin, Hsin-Nan; Lin, Hong Huang; DeLuca, David S.; Leppanen, Scott; Milford, Edgar L.; Reinherz, Ellis L.; Brusic, Vladimir
2014-01-01
Human leukocyte antigens (HLA) are important biomarkers because multiple diseases, drug toxicity, and vaccine responses reveal strong HLA associations. Current clinical HLA typing is an elimination process requiring serial testing. We present an alternative in situ synthesized DNA-based microarray method that contains hundreds of thousands of probes representing a complete overlapping set covering 1,610 clinically relevant HLA class I alleles accompanied by computational tools for assigning HLA type to 4-digit resolution. Our proof-of-concept experiment included 21 blood samples, 18 cell lines, and multiple controls. The method is accurate, robust, and amenable to automation. Typing errors were restricted to homozygous samples or those with very closely related alleles from the same locus, but readily resolved by targeted DNA sequencing validation of flagged samples. High-throughput HLA typing technologies that are effective, yet inexpensive, can be used to analyze the world’s populations, benefiting both global public health and personalized health care. PMID:25505899
Transfusion strategy for weak D type 4.0 based on RHD alleles and RH haplotypes in Tunisia
Ouchari, Mouna; Srivastava, Kshitij; Romdhane, Houda; Yacoub, Saloua Jemni; Flegel, Willy Albert
2017-01-01
Background With more than 460 RHD alleles, this gene is the most complex and polymorphic among all blood group systems. The Tunisian population has the largest known prevalence of weak D type 4.0 alleles, occurring in 1 of 105 RH haplotypes. We aimed to establish a rationale for the transfusion strategy of weak D type 4.0 in Tunisia. Study design and methods Donors were randomly screened for the serological weak D phenotype. The RHD coding sequence and parts of the introns were sequenced. To establish the RH haplotype, the RHCE gene was tested for characteristic single nucleotide positions. Results We determined all RHD alleles and the RH haplotypes coding for the serologic weak D phenotype among 13,431 Tunisian donations. A serologic weak D phenotype was found in 67 individuals (0.50%). Among them, 60 carried a weak D type 4 allele: 53 weak D type 4.0, 6 weak D type 4.2.2 (DAR), and 1 weak D type 4.1. Another 4 donors had 1 variant allele each: DVII, weak D type 1, weak D type 3, and weak D type 100, while 3 donors showed a normal RHD sequence. The weak D type 4.0 was most often linked to RHCE*ceVS.04.01, weak D type 4.2.2 to RHCE*ceAR, and weak D type 4.1 to RHCE*ceVS.02, while the other RHD alleles were linked to one of the common RHCE alleles. Conclusions Among the weak D phenotypes in Tunisia, no novel RHD allele was found and almost 90% were caused by alleles of the weak D type 4 cluster, of which 88% represented the weak D type 4.0 allele. Based on established RH haplotypes for variant RHD and RHCE alleles and the lack of adverse clinical reports, we recommend D positive transfusions for patients with weak D type 4.0 in Tunisia. PMID:29193104
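The percentages quoted in this abstract follow directly from the reported counts, and a short check makes the denominators explicit. This is only a worked-arithmetic sketch; the function name `weak_d_summary` and the dictionary keys are hypothetical labels.

```python
def weak_d_summary():
    """Recompute the reported percentages from the counts in the abstract."""
    donations = 13431
    weak_d_total = 67
    type4 = {"4.0": 53, "4.2.2 (DAR)": 6, "4.1": 1}
    other_variant_alleles = 4   # DVII, weak D type 1, type 3, type 100
    normal_rhd_sequence = 3

    type4_total = sum(type4.values())
    # The three groups must account for every weak D donor.
    assert type4_total + other_variant_alleles + normal_rhd_sequence == weak_d_total

    return {
        "weak D among donations (%)": 100 * weak_d_total / donations,
        "type 4 cluster among weak D (%)": 100 * type4_total / weak_d_total,
        "type 4.0 within type 4 cluster (%)": 100 * type4["4.0"] / type4_total,
    }

if __name__ == "__main__":
    for label, value in weak_d_summary().items():
        print(f"{label}: {value:.2f}")
```

Note that the 88% figure is taken within the type 4 cluster (53 of 60), not within all 67 weak D donors.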
Thiel, William H.; Bair, Thomas; Peek, Andrew S.; Liu, Xiuying; Dassie, Justin; Stockdale, Katie R.; Behlke, Mark A.; Miller, Francis J.; Giangrande, Paloma H.
2012-01-01
Background The broad applicability of RNA aptamers as cell-specific delivery tools for therapeutic reagents depends on the ability to identify aptamer sequences that selectively access the cytoplasm of distinct cell types. Towards this end, we have developed a novel approach that combines a cell-based selection method (cell-internalization SELEX) with high-throughput sequencing (HTS) and bioinformatics analyses to rapidly identify cell-specific, internalization-competent RNA aptamers. Methodology/Principal Findings We demonstrate the utility of this approach by enriching for RNA aptamers capable of selective internalization into vascular smooth muscle cells (VSMCs). Several rounds of positive (VSMCs) and negative (endothelial cells; ECs) selection were performed to enrich for aptamer sequences that preferentially internalize into VSMCs. To identify candidate RNA aptamer sequences, HTS data from each round of selection were analyzed using bioinformatics methods: (1) metrics of selection enrichment; and (2) pairwise comparisons of sequence and structural similarity, termed edit and tree distance, respectively. Correlation analyses of experimentally validated aptamers or rounds revealed that the best cell-specific, internalizing aptamers are enriched as a result of the negative selection step performed against ECs. Conclusions and Significance We describe a novel approach that combines cell-internalization SELEX with HTS and bioinformatics analysis to identify cell-specific, cell-internalizing RNA aptamers. Our data highlight the importance of performing a pre-clear step against a non-target cell in order to select for cell-specific aptamers. We expect the extended use of this approach to enable the identification of aptamers to a multitude of different cell types, thereby facilitating the broad development of targeted cell therapies. PMID:22962591
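The two bioinformatics measures named in this abstract, selection enrichment and pairwise edit distance, can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: the abstract does not define its enrichment metrics, so the ratio of read fractions used here is an assumption, and the names `edit_distance`, `read_fraction`, and `fold_enrichment` as well as the example reads are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]

def read_fraction(counts, seq):
    """Fraction of all reads in one selection round that match seq."""
    return counts.get(seq, 0) / sum(counts.values())

def fold_enrichment(early_counts, late_counts, seq):
    """Assumed enrichment metric: read fraction in a later round divided by
    read fraction in an earlier round (infinite if absent in the earlier round)."""
    before = read_fraction(early_counts, seq)
    if before == 0:
        return float("inf")
    return read_fraction(late_counts, seq) / before

if __name__ == "__main__":
    early = {"GGACUAGC": 10, "GGUCUAGC": 90}   # made-up read counts
    late = {"GGACUAGC": 50, "GGUCUAGC": 50}
    print(edit_distance("GGACUAGC", "GGUCUAGC"))
    print(fold_enrichment(early, late, "GGACUAGC"))
```

Sequences that are both strongly enriched and within a small edit distance of one another would be natural candidates to group into one aptamer family.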
Caporale, Lynn Helena
2012-09-01
This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.
Singh, Ajay Vir; Chauhan, Devendra Singh; Singh, Abhinendra; Singh, Pravin Kumar; Sohal, Jagdip Singh; Singh, Shoor Vir
2015-01-01
Of the three major genotypes of Mycobacterium avium subspecies paratuberculosis (MAP), 'Bison type' is the most prevalent genotype in the domestic livestock species of the country, and has also been recovered from patients suffering from Crohn's disease. Recently, a new assay based on IS1311 locus 2 PCR-restriction endonuclease analysis (REA) was designed to distinguish between 'Indian Bison type' and non-Indian genotypes. The present study investigated the discriminatory potential of this new assay by screening a panel of MAP isolates of diverse genotypes and from different geographical regions. A total of 53 mycobacterial isolates (41 MAP and 12 mycobacteria other than MAP), three MAP genomic DNA and 36 MAP-positive faecal DNA samples from different livestock species (cattle, buffaloes, goat, sheep and bison) and geographical regions (India, Canada, USA, Spain and Portugal) were included in the study. The extracted DNA samples (n=92) were analyzed for the presence of MAP-specific sequences (IS900, ISMav 2 and HspX) using PCR. DNA samples were further subjected to genotype differentiation using IS1311 PCR-REA and IS1311 L2 PCR-REA methods. All the DNA samples (except DNA from non-MAP mycobacterial isolates) were positive in all three MAP-specific sequence-based PCRs. IS1311 PCR-REA showed that MAP DNA samples of Indian origin belonged to 'Bison type', whereas, of the total 19 non-Indian MAP DNA samples, 2, 15 and 2 were genotyped as 'Bison type', 'Cattle type' and 'Sheep type', respectively. The IS1311 L2 PCR-REA method showed different restriction profiles of the 'Bison type' genotype as compared to non-Indian DNA samples. The IS1311 L2 PCR-REA method successfully discriminated 'Indian Bison type' from other non-Indian genotypes and showed potential as a future epidemiological tool for genotyping of MAP isolates.
Reyes-Montes, M del R; Taylor, M L; Curiel-Quesada, E; Mesa-Arango, A C
2000-12-01
The classification of microbial strains is currently based on different typing methods, which must meet certain criteria in order to be widely used. Phenotypic and genotypic methods are being employed in the epidemiology of several fungal diseases. However, some problems associated with the phenotypic methods have fostered genotyping procedures, from DNA polymorphic diversity to gene sequencing studies, all aiming to differentiate and to relate fungal isolates or strains. Through these studies, it is possible to identify outbreaks, to detect nosocomial infection transmission, and to determine the source of infection, as well as to recognize virulent isolates. This paper is aimed at analyzing the methods recently used to type Histoplasma capsulatum, the causative agent of the systemic mycosis known as histoplasmosis, in order to recommend those that yield reproducible and accurate results.
A motif detection and classification method for peptide sequences using genetic programming.
Tomita, Yasuyuki; Kato, Ryuji; Okochi, Mina; Honda, Hiroyuki
2008-08-01
An exploration of common rules (property motifs) in amino acid sequences has been required for the design of novel sequences and elucidation of the interactions between molecules controlled by the structural or physical environment. In the present study, we developed a new method to search property motifs that are common in peptide sequence data. Our method comprises the following two characteristics: (i) the automatic determination of the position and length of common property motifs by calculating the physicochemical similarity of amino acids, and (ii) the quick and effective exploration of motif candidates that discriminate the positives and negatives by the introduction of genetic programming (GP). Our method was evaluated by two types of model data sets. First, intentionally buried property motifs were searched for in artificially derived peptide data. As a result, the expected property motifs were correctly extracted by our algorithm. Second, the peptide data that interact with MHC class II molecules were analyzed as one of the models of biologically active peptides with buried motifs in various lengths. Twice as many MHC class II binding peptides were identified with the rule obtained using our method as with the existing scoring matrix method. In conclusion, our GP-based motif searching approach made it possible to obtain knowledge of functional aspects of the peptides without any prior knowledge.
Simultaneous phylogeny reconstruction and multiple sequence alignment
Yue, Feng; Shi, Jian; Tang, Jijun
2009-01-01
Background A phylogeny is the evolutionary history of a group of organisms. To date, sequence data is still the most used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to affect the quality of the inferred phylogeny. At the same time, all the current multiple sequence alignment programs use a guide tree to produce the alignment and experiments showed that good guide trees can significantly improve the multiple alignment quality. Results We devise a new algorithm to simultaneously align multiple sequences and search for the phylogenetic tree that leads to the best alignment. We also implemented the algorithm as a C program package, which can handle both DNA and protein data and can take a simple cost model as well as complex substitution matrices, such as PAM250 or BLOSUM62. The performance of the new method is compared with that of other popular multiple sequence alignment tools, including widely used programs such as ClustalW and T-Coffee. Experimental results suggest that this method has good performance in terms of both phylogeny accuracy and alignment quality. Conclusion We present an algorithm to align multiple sequences and reconstruct the phylogenies that minimize the alignment score, which is based on an efficient algorithm to solve the median problems for three sequences. Our extensive experiments suggest that this method is very promising and can produce high quality phylogenies and alignments. PMID:19208110
Yoshikawa, Yuko; Ochiai, Yoshitsugu; Mochizuki, Mariko; Takano, Takashi; Hondo, Ryo; Ueda, Fukiko
2018-05-31
To assess the level of Listeria monocytogenes contamination of domestic retail meat in Tokyo, Japan, we compared isolates from 2004 to 2007 with those isolated before 2003. The overall prevalence of L. monocytogenes among these samples significantly diminished over time (1998-2003, 28.0%; 2004-2007, 17.6%) reflecting a significant decrease in the frequency of contamination of beef. Serotype 1/2a was isolated most frequently, reflecting a change in the predominant serotype in pork from 1/2c to 1/2a. We performed a simple genetic subtyping method based on three genes, iap, sigB, and actA, as well as traditional multilocus sequence typing to classify the allele types (ATs). No extensive variation among sequence types was detected; however, increased genetic diversity among the ATs of the three genes in the 2004-2007 isolates was evident. We identified AT 26 of the iap gene, not previously reported in Japanese isolates, and six ATs of the sigB gene, including four with nonsense mutations not currently registered in L. monocytogenes DNA databases. sigB is an evolutionally conserved gene that plays a role in the stress response. Our results indicate that the sigB gene may be relatively unstable among L. monocytogenes strains circulating in Japan.
Brown, Roger B; Madrid, Nathaniel J; Suzuki, Hideaki; Ness, Scott A
2017-01-01
RNA-sequencing (RNA-seq) has become the standard method for unbiased analysis of gene expression but also provides access to more complex transcriptome features, including alternative RNA splicing, RNA editing, and even detection of fusion transcripts formed through chromosomal translocations. However, differences in library methods can adversely affect the ability to recover these different types of transcriptome data. For example, some methods have bias for one end of transcripts or rely on low-efficiency steps that limit the complexity of the resulting library, making detection of rare transcripts less likely. We tested several commonly used methods of RNA-seq library preparation and found vast differences in the detection of advanced transcriptome features, such as alternatively spliced isoforms and RNA editing sites. By comparing several different protocols available for the Ion Proton sequencer and by utilizing detailed bioinformatics analysis tools, we were able to develop an optimized random primer based RNA-seq technique that is reliable at uncovering rare transcript isoforms and RNA editing features, as well as fusion reads from oncogenic chromosome rearrangements. The combination of optimized libraries and rapid Ion Proton sequencing provides a powerful platform for the transcriptome analysis of research and clinical samples.
Distilled single-cell genome sequencing and de novo assembly for sparse microbial communities.
Taghavi, Zeinab; Movahedi, Narjes S; Draghici, Sorin; Chitsaz, Hamidreza
2013-10-01
Identification of every single genome present in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single-cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier, as the number of different cell types with distinct genome sequences is usually much smaller than the number of cells. Here, we present a novel divide and conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample with a sequencing cost and computational complexity proportional to the number of genome types, rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide and conquer method successfully reduces the cost of sequencing in comparison with the naïve exhaustive approach. Squeezambler and datasets are available at http://compbio.cs.wayne.edu/software/squeezambler/.
Shum, Bennett O V; Henner, Ilya; Belluoccio, Daniele; Hinchcliffe, Marcus J
2017-07-01
The sensitivity and specificity of next-generation sequencing laboratory developed tests (LDTs) are typically determined by an analyte-specific approach. Analyte-specific validations use disease-specific controls to assess an LDT's ability to detect known pathogenic variants. Alternatively, a methods-based approach can be used for LDT technical validations. Methods-focused validations do not use disease-specific controls but use benchmark reference DNA that contains known variants (benign, variants of unknown significance, and pathogenic) to assess variant calling accuracy of a next-generation sequencing workflow. Recently, four whole-genome reference materials (RMs) from the National Institute of Standards and Technology (NIST) were released to standardize methods-based validations of next-generation sequencing panels across laboratories. We provide a practical method for using NIST RMs to validate multigene panels. We analyzed the utility of RMs in validating a novel newborn screening test that targets 70 genes, called NEO1. Despite the NIST RM variant truth set originating from multiple sequencing platforms, replicates, and library types, we discovered a 5.2% false-negative variant detection rate in the RM truth set genes that were assessed in our validation. We developed a strategy using complementary non-RM controls to demonstrate 99.6% sensitivity of the NEO1 test in detecting variants. Our findings have implications for laboratories or proficiency testing organizations using whole-genome NIST RMs for testing. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Molecular Identification of Ectomycorrhizal Mycelium in Soil Horizons
Landeweert, Renske; Leeflang, Paula; Kuyper, Thom W.; Hoffland, Ellis; Rosling, Anna; Wernars, Karel; Smit, Eric
2003-01-01
Molecular identification techniques based on total DNA extraction provide a unique tool for identification of mycelium in soil. Using molecular identification techniques, the ectomycorrhizal (EM) fungal community under coniferous vegetation was analyzed. Soil samples were taken at different depths from four horizons of a podzol profile. A basidiomycete-specific primer pair (ITS1F-ITS4B) was used to amplify fungal internal transcribed spacer (ITS) sequences from total DNA extracts of the soil horizons. Amplified basidiomycete DNA was cloned and sequenced, and a selection of the obtained clones was analyzed phylogenetically. Based on sequence similarity, the fungal clone sequences were sorted into 25 different fungal groups, or operational taxonomic units (OTUs). Out of 25 basidiomycete OTUs, 7 OTUs showed high nucleotide homology (≥99%) with known EM fungal sequences and 16 were found exclusively in the mineral soil. The taxonomic positions of six OTUs remained unclear. OTU sequences were compared to sequences from morphotyped EM root tips collected from the same sites. Of the 25 OTUs, 10 OTUs had ≥98% sequence similarity with these EM root tip sequences. The present study demonstrates the use of molecular techniques to identify EM hyphae in various soil types. This approach differs from the conventional method of EM root tip identification and provides a novel approach to examine EM fungal communities in soil. PMID:12514012
Ali, M A; Al-Hemaid, F M; Lee, J; Hatamleh, A A; Gyulai, G; Rahman, M O
2015-10-02
The present study explored the systematic inventory of Echinops L. (Asteraceae) of Saudi Arabia, with special reference to the molecular typing of Echinops abuzinadianus Chaudhary, an endemic species to Saudi Arabia, based on the internal transcribed spacer (ITS) sequences (ITS1-5.8S-ITS2) of nuclear ribosomal DNA. A sequence similarity search using BLAST and a phylogenetic analysis of the ITS sequence of E. abuzinadianus revealed a high level of sequence similarity with E. glaberrimus DC. (section Ritropsis). The novel primary sequence and the secondary structure of ITS2 of E. abuzinadianus could potentially be used for molecular genotyping.
Verbist, Bie; Clement, Lieven; Reumers, Joke; Thys, Kim; Vapirev, Alexander; Talloen, Willem; Wetzels, Yves; Meys, Joris; Aerssens, Jeroen; Bijnens, Luc; Thas, Olivier
2015-02-22
Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets) which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has a superb sensitivity and specificity for variants with frequencies above 0.4%. Unlike the competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4% which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second best base calls appeared very promising in our data exploration phase, their utility was limited. They provided a slight increase in sensitivity, which however does not warrant the additional computational cost of running the offline base caller. 
Apparently, much of the information is already contained in the quality scores, enabling the model-based clustering procedure to correct the majority of the sequencing errors. Overall, the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low-frequency variant detection.
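The base-calling rule described in this abstract (the highest of the four channel intensities gives the called base, the second highest gives the "second best" base) can be illustrated with a minimal sketch. The dictionary layout and the function name `call_bases` are assumptions made for illustration only; this is not the ViVaMBC implementation and does not model quality scores or clustering.

```python
def call_bases(intensities):
    """Return (best, second_best) bases for one intensity quadruplet.

    `intensities` maps each nucleotide to its channel intensity.
    The dict layout is a hypothetical choice for this illustration.
    """
    # Rank the four nucleotides by decreasing channel intensity.
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    return ranked[0], ranked[1]


# Hypothetical intensity quadruplet for a single sequencing cycle.
quad = {"A": 120.0, "C": 950.0, "G": 40.0, "T": 610.0}
best, second = call_bases(quad)
print(best, second)  # C T
```

If the called base turns out to be a mismatch, the second element is the candidate the abstract refers to as the second best base.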
Statistical method to compare massive parallel sequencing pipelines.
Elsensohn, M H; Leblay, N; Dimassi, S; Campan-Fournier, A; Labalme, A; Roucher-Boulez, F; Sanlaville, D; Lesca, G; Bardel, C; Roy, P
2017-03-01
Today, sequencing is frequently carried out by Massive Parallel Sequencing (MPS), which drastically cuts sequencing time and expense. Nevertheless, Sanger sequencing remains the main validation method to confirm the presence of variants. The analysis of MPS data involves the development of several bioinformatic tools, academic or commercial. We present here a statistical method to compare MPS pipelines and test it in a comparison between an academic pipeline (BWA-GATK) and a commercial pipeline (TMAP-NextGENe®), with and without reference to a gold standard (here, Sanger sequencing), on a panel of 41 genes in 43 epileptic patients. This method used the number of variants to fit log-linear models for pairwise agreements between pipelines. To assess the heterogeneity of the margins and the odds ratios of agreement, four log-linear models were used: a full model, a homogeneous-margin model, a model with a single odds ratio for all patients, and a model with a single intercept. Then a log-linear mixed model was fitted considering the biological variability as a random effect. Among the 390,339 base pairs sequenced, TMAP-NextGENe® and BWA-GATK found, on average, 2253.49 and 1857.14 variants (single nucleotide variants and indels), respectively. Against the gold standard, the pipelines had similar sensitivities (63.47% vs. 63.42%) and close but significantly different specificities (99.57% vs. 99.65%; p < 0.001). Results with the same trend were obtained when only single nucleotide variants were considered (99.98% specificity and 76.81% sensitivity for both pipelines). The method thus allows pipeline comparison and selection. It is generalizable to all types of MPS data and all pipelines.
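The sensitivity and specificity figures quoted against the gold standard follow the usual definitions from a 2x2 confusion table. The sketch below shows only these summary metrics, not the log-linear models of the abstract; the function name and the counts are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Compute sensitivity and specificity against a gold standard.

    tp/fn: gold-standard variants found / missed by the pipeline.
    tn/fp: non-variant positions correctly / wrongly called.
    """
    sensitivity = tp / (tp + fn)  # fraction of true variants detected
    specificity = tn / (tn + fp)  # fraction of non-variants left uncalled
    return sensitivity, specificity


# Hypothetical counts for illustration only.
sens, spec = sensitivity_specificity(tp=80, fp=10, tn=990, fn=20)
print(f"sensitivity={sens:.2%} specificity={spec:.2%}")
```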
Methods for MHC genotyping in non-model vertebrates.
Babik, W
2010-03-01
Genes of the major histocompatibility complex (MHC) are considered a paradigm of adaptive evolution at the molecular level and as such are frequently investigated by evolutionary biologists and ecologists. Accurate genotyping is essential for understanding of the role that MHC variation plays in natural populations, but may be extremely challenging. Here, I discuss the DNA-based methods currently used for genotyping MHC in non-model vertebrates, as well as techniques likely to find widespread use in the future. I also highlight the aspects of MHC structure that are relevant for genotyping, and detail the challenges posed by the complex genomic organization and high sequence variation of MHC loci. Special emphasis is placed on designing appropriate PCR primers, accounting for artefacts and the problem of genotyping alleles from multiple, co-amplifying loci, a strategy which is frequently necessary due to the structure of the MHC. The suitability of typing techniques is compared in various research situations, strategies for efficient genotyping are discussed and areas of likely progress in future are identified. This review addresses the well established typing methods such as the Single Strand Conformation Polymorphism (SSCP), Denaturing Gradient Gel Electrophoresis (DGGE), Reference Strand Conformational Analysis (RSCA) and cloning of PCR products. In addition, it includes the intriguing possibility of direct amplicon sequencing followed by the computational inference of alleles and also next generation sequencing (NGS) technologies; the latter technique may, in the future, find widespread use in typing complex multilocus MHC systems. © 2009 Blackwell Publishing Ltd.
Stojowska, Karolina; Krawczyk, Beata
2014-01-01
We have designed a new ddLMS PCR (double digestion Ligation Mediated Suppression PCR) method based on restriction site polymorphism upstream from the specific target sequence for the simultaneous identification and differentiation of bacterial strains. The ddLMS PCR combines a simple PCR used for species or genus identification and the LM PCR strategy for strain differentiation. The bacterial identification is confirmed in the form of the PCR product(s), while the length of the PCR product makes it possible to differentiate between bacterial strains. If there is a single copy of the target sequence within genomic DNA, one specific PCR product is created (simplex ddLMS PCR), whereas for multiple copies of the gene the fingerprinting patterns can be obtained (multiplex ddLMS PCR). The described ddLMS PCR method is designed for rapid and specific strain differentiation in medical and microbiological studies. In comparison to other LM PCR it has substantial advantages: enables specific species' DNA-typing without the need for pure bacterial culture selection, is not sensitive to contamination with other cells or genomic DNA, and gives univocal “band-based” results, which are easy to interpret. The utility of ddLMS PCR was shown for Acinetobacter calcoaceticus-baumannii (Acb) complex, the genetically closely related and phenotypically similar species and also important nosocomial pathogens, for which currently, there are no recommended methods for screening, typing and identification. In this article two models are proposed: 3′ recA-ddLMS PCR-MaeII/RsaI for Acb complex interspecific typing and 5′ rrn-ddLMS PCR-HindIII/ApaI for Acinetobacter baumannii intraspecific typing. ddLMS PCR allows not only for DNA-typing but also for confirmation of species in one reaction. Also, practical guidelines for designing a diagnostic test based on ddLMS PCR for genotyping different species of bacteria are provided. PMID:25522278
Analysis of delay reducing and fuel saving sequencing and spacing algorithms for arrival traffic
NASA Technical Reports Server (NTRS)
Neuman, Frank; Erzberger, Heinz
1991-01-01
The air traffic control subsystem that performs sequencing and spacing is discussed. The function of the sequencing and spacing algorithms is to automatically plan the most efficient landing order and to assign optimally spaced landing times to all arrivals. Several algorithms are described and their statistical performance is examined. Sequencing brings order to an arrival sequence for aircraft. First-come-first-served sequencing (FCFS) establishes a fair order, based on estimated times of arrival, and determines proper separations. Because of the randomness of the arriving traffic, gaps will remain in the sequence of aircraft. Delays are reduced by time-advancing the leading aircraft of each group while still preserving the FCFS order. Tightly spaced groups of aircraft remain with a mix of heavy and large aircraft. Spacing requirements differ for different types of aircraft trailing each other. Traffic is reordered slightly to take advantage of this spacing criterion, thus shortening the groups and reducing average delays. For heavy traffic, delays for different traffic samples vary widely, even when the same set of statistical parameters is used to produce each sample. This report supersedes NASA TM-102795 on the same subject. It includes a new method of time-advance as well as an efficient method of sequencing and spacing for two dependent runways.
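First-come-first-served sequencing with class-dependent separations, as described in this abstract, can be sketched as follows. This is a minimal illustration under stated assumptions: the separation values, weight classes, and the function name `fcfs_schedule` are hypothetical, and the time-advance and reordering refinements of the report are not modeled.

```python
def fcfs_schedule(arrivals, separation):
    """Assign landing times in first-come-first-served order.

    arrivals:   list of (callsign, weight_class, eta_seconds).
    separation: dict mapping (leader_class, trailer_class) to the
                minimum time gap in seconds (hypothetical values).
    Returns a list of (callsign, scheduled_time, delay).
    """
    ordered = sorted(arrivals, key=lambda a: a[2])  # FCFS by estimated arrival
    schedule = []
    prev_class, prev_time = None, None
    for callsign, wclass, eta in ordered:
        if prev_time is None:
            sta = eta  # first aircraft lands at its estimated time
        else:
            # Land no earlier than the ETA and no closer than the
            # required separation behind the preceding aircraft.
            sta = max(eta, prev_time + separation[(prev_class, wclass)])
        schedule.append((callsign, sta, sta - eta))
        prev_class, prev_time = wclass, sta
    return schedule


# Hypothetical separation matrix (seconds) and traffic sample.
SEP = {
    ("heavy", "heavy"): 90, ("heavy", "large"): 120,
    ("large", "heavy"): 60, ("large", "large"): 60,
}
traffic = [("A1", "heavy", 0), ("A2", "large", 50), ("A3", "large", 100)]
print(fcfs_schedule(traffic, SEP))
# [('A1', 0, 0), ('A2', 120, 70), ('A3', 180, 80)]
```

Because the required gap depends on the leader and trailer classes, swapping neighbouring aircraft can shorten a group, which is the motivation for the slight reordering mentioned in the abstract.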
Bryant, D A; de Lorimier, R; Lambert, D H; Dubbs, J M; Stirewalt, V L; Stevens, S E; Porter, R D; Tam, J; Jay, E
1985-01-01
The genes for the alpha- and beta-subunit apoproteins of allophycocyanin (AP) were isolated from the cyanelle genome of Cyanophora paradoxa and subjected to nucleotide sequence analysis. The AP beta-subunit apoprotein gene was localized to a 7.8-kilobase-pair Pst I restriction fragment from cyanelle DNA by hybridization with a tetradecameric oligonucleotide probe. Sequence analysis using that oligonucleotide and its complement as primers for the dideoxy chain-termination sequencing method confirmed the presence of both AP alpha- and beta-subunit genes on this restriction fragment. Additional oligonucleotide primers were synthesized as sequencing progressed and were used to determine rapidly the nucleotide sequence of a 1336-base-pair region of this cloned fragment. This strategy allowed the sequencing to be completed without a detailed restriction map and without extensive and time-consuming subcloning. The sequenced region contains two open reading frames whose deduced amino acid sequences are 81-85% homologous to cyanobacterial and red algal AP subunits whose amino acid sequences have been determined. The two open reading frames are in the same orientation and are separated by 39 base pairs. AP alpha is 5' to AP beta and both coding sequences are preceded by a polypurine, Shine-Dalgarno-type sequence. Sequences upstream from AP alpha closely resemble the Escherichia coli consensus promoter sequences and also show considerable homology to promoter sequences for several chloroplast-encoded psbA genes. A 56-base-pair palindromic sequence downstream from the AP beta gene could play a role in the termination of transcription or translation. The allophycocyanin apoprotein subunit genes are located on the large single-copy region of the cyanelle genome. PMID:2987916
Luo, Li; Zhu, Yun
2012-01-01
Genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing the association of genomic variants, including common, low-frequency, and rare variants. Current strategies for association studies are well developed for identifying associations of common variants with common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze collective frequency differences between cases and controls have shifted the variant-by-variant analysis paradigm used in GWAS of common variants to the collective testing of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing the association of the entire allele frequency spectrum of genomic variation with disease. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole-genome low-coverage pilot data from the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: the genome-information content-based statistic, the generalized T², the collapsing method, the multivariate and collapsing (CMC) method, the individual χ² test, the weighted-sum statistic, and the variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset from the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets. PMID:22651812
Luo, Li; Zhu, Yun; Xiong, Momiao
2012-06-01
Genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing the association of genomic variants, including common, low-frequency, and rare variants. Current strategies for association studies are well developed for identifying associations of common variants with common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze collective frequency differences between cases and controls have shifted the variant-by-variant analysis paradigm used in GWAS of common variants to the collective testing of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing the association of the entire allele frequency spectrum of genomic variation with disease. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole-genome low-coverage pilot data from the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: the genome-information content-based statistic, the generalized T², the collapsing method, the multivariate and collapsing (CMC) method, the individual χ² test, the weighted-sum statistic, and the variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset from the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.
Gong, Xiangli; Li, Juntao; Zhang, Ying; Hou, Shuiping; Qu, Pinghua; Yang, Zhicong; Chen, Shouyi
2017-08-01
Legionella spp. are important waterborne pathogens. Molecular typing has become an important method for outbreak investigations and source tracking of Legionnaires' disease. In a survey program conducted by the Guangzhou Center for Disease Control and Prevention, multiple serotypes of Legionella pneumophila (L. pneumophila) were isolated from water in air-conditioning cooling towers in the urban Guangzhou region, China, between 2008 and 2011. Three genotyping methods, mip (macrophage infectivity potentiator) genotyping, SBT (sequence-based typing), and FAFLP (fluorescent amplified fragment length polymorphism analysis), were used to type these waterborne L. pneumophila isolates. The three methods were capable of typing all 134 isolates and a reference strain of L. pneumophila (ATCC33153), with discriminatory indices of 0.7034, 0.9218, and 0.9376 for the mip, SBT, and FAFLP methods, respectively. Among the 9 serotypes of the 134 isolates, 10, 50, and 34 molecular types were detected by the mip, SBT, and FAFLP methods, respectively. The mip genotyping and SBT methods are more suitable for inter-laboratory sharing of results and comparison of different types of L. pneumophila. The SBT and FAFLP typing methods were rapid and had higher discriminatory abilities. Combining two or more of the typing methods enables more accurate typing of Legionella isolates for outbreak investigations and source tracking of Legionnaires' disease. Copyright © 2017 Elsevier B.V. All rights reserved.
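The abstract does not state how the discriminatory indices were computed. A common choice for typing methods is Simpson's index of diversity in the Hunter-Gaston form, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of type j and N the total number of isolates. The sketch below assumes that formula; the function name and the example labels are hypothetical.

```python
from collections import Counter


def discriminatory_index(type_labels):
    """Simpson's index of diversity (Hunter-Gaston form) for a typing method.

    type_labels: one assigned type per isolate, e.g. sequence types.
    Assumes this formula; the abstract does not name the one it used.
    """
    n = len(type_labels)
    if n < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(type_labels)
    # Number of ordered pairs of isolates that share the same type.
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical example: four isolates, three distinct types.
print(round(discriminatory_index(["ST1", "ST1", "ST2", "ST3"]), 4))  # 0.8333
```

A value of 1.0 means every isolate received a distinct type, and 0.0 means all isolates received the same type.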
Sequence of Child Care Type and Child Development: What Role Does Peer Exposure Play?
ERIC Educational Resources Information Center
Morrissey, Taryn W.
2010-01-01
Child care arrangements change as children age; in general, hours in home-based child care decrease as hours in center-based settings increase. This sequence of child care type may correspond with children's developmental needs; the small peer groups and low child-adult ratios typical of home-based care may allow for more individual child-adult…
Movie denoising by average of warped lines.
Bertalmío, Marcelo; Caselles, Vicent; Pardo, Alvaro
2007-09-01
Here, we present an efficient method for movie denoising that does not require any motion estimation. The method is based on the well-known fact that averaging several realizations of a random variable reduces the variance. For each pixel to be denoised, we look for close similar samples along the level surface passing through it. With these similar samples, we estimate the denoised pixel. Close similar samples are found by warping lines in spatiotemporal neighborhoods. To that end, we present an algorithm based on a method for epipolar line matching in stereo pairs, which has per-line complexity O(N), where N is the number of columns in the image. In this way, when applied to the image sequence, our algorithm is computationally efficient, having a complexity of the order of the total number of pixels. Furthermore, we show that the presented method is unsupervised and is adapted to denoise image sequences with additive white noise while respecting the visual details of the movie frames. We have also experimented with other types of noise with satisfactory results.
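The statistical fact the method relies on, that the mean of n independent realizations with variance σ² has variance σ²/n, can be checked numerically. The sketch below is only a simulation of that fact with Gaussian noise; it does not implement line warping or any part of the denoising algorithm, and the function name and parameters are illustrative assumptions.

```python
import random
import statistics


def variance_of_mean(n_samples, n_trials=2000, sigma=1.0, seed=0):
    """Empirical variance of the mean of `n_samples` noisy realizations."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        # Draw n_samples independent zero-mean Gaussian realizations.
        samples = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
        means.append(sum(samples) / n_samples)
    return statistics.pvariance(means)


# The variance should shrink roughly as sigma**2 / n_samples.
for n in (1, 4, 16):
    print(n, variance_of_mean(n))
```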
Porter, Teresita M; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J; Golding, G Brian; Hajibabaei, Mehrdad
2014-01-01
Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the blast-based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates, incorrect taxonomic assignments with a high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.
Baptista, Rodrigo P; Reis-Cunha, Joao Luis; DeBarry, Jeremy D; Chiari, Egler; Kissinger, Jessica C; Bartholomeu, Daniella C; Macedo, Andrea M
2018-02-14
Next-generation sequencing (NGS) methods are low-cost high-throughput technologies that produce thousands to millions of sequence reads. Despite the high number of raw sequence reads, their short length, relative to Sanger, PacBio or Nanopore reads, complicates the assembly of genomic repeats. Many genome tools are available, but the assembly of highly repetitive genome sequences using only NGS short reads remains challenging. Genome assembly of organisms responsible for important neglected diseases such as Trypanosoma cruzi, the aetiological agent of Chagas disease, is known to be challenging because of their repetitive nature. Only three of six recognized discrete typing units (DTUs) of the parasite have their draft genomes published and therefore genome evolution analyses in the taxon are limited. In this study, we developed a computational workflow to assemble highly repetitive genomes via a combination of de novo and reference-based assembly strategies to better overcome the intrinsic limitations of each, based on Illumina reads. The highly repetitive genome of the human-infecting parasite T. cruzi 231 strain was used as a test subject. The combined-assembly approach shown in this study benefits from the reference-based assembly ability to resolve highly repetitive sequences and from the de novo capacity to assemble genome-specific regions, improving the quality of the assembly. The acceptable confidence obtained by analyzing our results showed that our combined approach is an attractive option to assemble highly repetitive genomes with NGS short reads. Phylogenomic analysis including the 231 strain, the first representative of DTU III whose genome was sequenced, was also performed and provides new insights into T. cruzi genome evolution.
Cohort analysis of a single nucleotide polymorphism on DNA chips.
Schwonbeck, Susanne; Krause-Griep, Andrea; Gajovic-Eichelmann, Nenad; Ehrentreich-Förster, Eva; Meinl, Walter; Glatt, Hansrüdi; Bier, Frank F
2004-11-15
A method has been developed to determine SNPs on DNA chips by applying a flow-through bioscanner. As a practical application we demonstrated the fast and simple SNP analysis of 24 genotypes in an array of 96 spots with a single hybridisation and dissociation experiment. The main advantage of this methodical concept is the parallel and fast analysis without any need for enzymatic digestion. Additionally, the DNA chip format used is appropriate for parallel analysis of up to 400 spots. The polymorphism in the gene of the human phenol sulfotransferase SULT1A1 was studied as a model SNP. Biotinylated PCR products containing the SNP (mutant) and those containing no mutation (wild-type) were brought onto chips coated with NeutrAvidin using non-contact spotting. This was followed by an analysis carried out in a flow-through biochip scanner while constantly rinsing with buffer. After removal of the non-biotinylated strand, a fluorescent probe complementary to the wild-type sequence was hybridised. If this probe binds to a mutant sequence, one base is mismatched, so the mismatched hybrid (mutant) is less stable than the fully matched hybrid (wild-type). The final step after hybridisation on the chip involves rinsing with a buffer to start dissociation of the fluorescent probe from the immobilised DNA strand. The online measurement of the fluorescence intensity by the biochip scanner makes it possible to follow the kinetics of the hybridisation and dissociation processes. Because the full match and the mismatch differ in stability, either visual discrimination or kinetic analysis can be used to distinguish the SNP-containing sequence from the wild-type sequence.
2012-01-01
Background Existing methods for predicting protein solubility on overexpression in Escherichia coli advance performance by using ensemble classifiers such as two-stage support vector machine (SVM) based classifiers and a number of feature types such as physicochemical properties, amino acid and dipeptide composition, accompanied with feature selection. It is desirable to develop a simple and easily interpretable method for predicting protein solubility, compared to existing complex SVM-based methods. Results This study proposes a novel scoring card method (SCM) by using dipeptide composition only to estimate solubility scores of sequences for predicting protein solubility. SCM calculates the propensities of 400 individual dipeptides to be soluble using statistic discrimination between soluble and insoluble proteins of a training data set. Consequently, the propensity scores of all dipeptides are further optimized using an intelligent genetic algorithm. The solubility score of a sequence is determined by the weighted sum of all propensity scores and dipeptide composition. To evaluate SCM by performance comparisons, four data sets with different sizes and variation degrees of experimental conditions were used. The results show that the simple method SCM with interpretable propensities of dipeptides has promising performance, compared with existing SVM-based ensemble methods with a number of feature types. Furthermore, the propensities of dipeptides and solubility scores of sequences can provide insights to protein solubility. For example, the analysis of dipeptide scores shows high propensity of α-helix structure and thermophilic proteins to be soluble. Conclusions The propensities of individual dipeptides to be soluble are varied for proteins under altered experimental conditions. 
To predict protein solubility accurately using SCM, it is better to customize the score card of dipeptide propensities by using a training data set obtained under the same specified experimental conditions. The proposed method SCM, with its solubility scores and dipeptide propensities, can be easily applied to protein function prediction problems in which dipeptide composition features play an important role. Availability The datasets used, source codes of SCM, and supplementary files are available at http://iclab.life.nctu.edu.tw/SCM/. PMID:23282103
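The scoring step described in this abstract (a solubility score formed as the weighted sum of dipeptide propensity scores and dipeptide composition) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and the `toy_card` values are hypothetical, the propensities here are made up rather than estimated from training data, and the genetic-algorithm optimisation of the score card is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the fraction of each of the 400 dipeptides in the sequence."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return {d: 0.0 for d in DIPEPTIDES}
    return {d: c / total for d, c in counts.items()}


def solubility_score(sequence, score_card):
    """Weighted sum of propensity scores and dipeptide composition."""
    composition = dipeptide_composition(sequence)
    return sum(score_card.get(d, 0.0) * composition[d] for d in DIPEPTIDES)


# Hypothetical score card: every dipeptide neutral except two made-up entries.
toy_card = {d: 500.0 for d in DIPEPTIDES}
toy_card["KE"] = 900.0
toy_card["WW"] = 100.0

print(solubility_score("KEKEKE", toy_card))
```

A sequence would then be called soluble when its score exceeds a threshold chosen on the training set; that threshold is not given in the abstract and is left out here.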
Towards fast and accurate temperature mapping with proton resonance frequency-based MR thermometry
Yuan, Jing; Mei, Chang-Sheng; Panych, Lawrence P.; McDannold, Nathan J.; Madore, Bruno
2012-01-01
The capability to image temperature is a very attractive feature of MRI and has been actively exploited for guiding minimally-invasive thermal therapies. Among many MR-based temperature-sensitive approaches, proton resonance frequency (PRF) thermometry provides the advantage of excellent linearity of signal with temperature over a large temperature range. Furthermore, the PRF shift has been shown to be fairly independent of tissue type and thermal history. For these reasons, PRF method has evolved into the most widely used MR-based thermometry method. In the present paper, the basic principles of PRF-based temperature mapping will be reviewed, along with associated pulse sequence designs. Technical advancements aimed at increasing the imaging speed and/or temperature accuracy of PRF-based thermometry sequences, such as image acceleration, fat suppression, reduced field-of-view imaging, as well as motion tracking and correction, will be discussed. The development of accurate MR thermometry methods applicable to moving organs with non-negligible fat content represents a very challenging goal, but recent developments suggest that this goal may be achieved. If so, MR-guided thermal therapies may be expected to play an increasingly-important therapeutic and palliative role, as a minimally-invasive alternative to surgery. PMID:22773966
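As a rough illustration of the linear relation that PRF thermometry relies on, the sketch below converts a phase difference between two gradient-echo images into a temperature change using the commonly quoted relation Δφ = 2π · γ̄ · α · B0 · TE · ΔT. The constants are assumptions taken from general usage rather than from this abstract (γ̄ ≈ 42.58 MHz/T for protons, α ≈ -0.01 ppm/°C), the function name is hypothetical, and the sign of the measured phase depends on the scanner's convention.

```python
import math

GAMMA_BAR_HZ_PER_T = 42.58e6  # proton gyromagnetic ratio divided by 2*pi, in Hz/T
ALPHA_PER_DEG_C = -0.01e-6    # assumed PRF coefficient, about -0.01 ppm per degree C


def prf_temperature_change(delta_phase_rad, b0_tesla, te_seconds,
                           alpha=ALPHA_PER_DEG_C):
    """Temperature change (degrees C) implied by a phase difference (radians)."""
    return delta_phase_rad / (
        2.0 * math.pi * GAMMA_BAR_HZ_PER_T * alpha * b0_tesla * te_seconds
    )


# Example: phase difference of -0.8 rad at 3 T with an echo time of 10 ms.
print(prf_temperature_change(-0.8, b0_tesla=3.0, te_seconds=0.010))
```

Because the phase difference scales with both field strength and echo time, longer echo times give more phase per degree but also more signal decay, which is one of the trade-offs such sequence designs have to balance.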
Bhowmick, P P; Khushiramani, R; Raghunath, P; Karunasagar, I; Karunasagar, I
2008-02-01
Evaluation of protein profiling for typing Vibrio parahaemolyticus using 71 strains isolated from different seafood and comparison with other molecular typing techniques such as random amplified polymorphic DNA analysis (RAPD) and enterobacterial repetitive intergenic consensus sequence (ERIC)-PCR. Three molecular typing methods were used for the typing of 71 V. parahaemolyticus isolates from seafood. RAPD had a discriminatory index (DI) of 0.95, while ERIC-PCR showed a DI of 0.94. Though protein profiling had less discriminatory power, use of this method can be helpful in identifying new proteins which might have a role in establishment in the host or virulence of the organism. The use of protein profiling in combination with other established typing methods such as RAPD and ERIC-PCR generates useful information in the case of V. parahaemolyticus associated with seafood. The study demonstrates the usefulness of nucleic acid and protein-based studies in understanding the relationship between various isolates from seafood.
Rodriguez, Marcela; Hogan, Patrick G; Satola, Sarah W; Crispell, Emily; Wylie, Todd; Gao, Hongyu; Sodergren, Erica; Weinstock, George M; Burnham, Carey-Ann D; Fritz, Stephanie A
2015-09-01
Historically, a number of typing methods have been evaluated for Staphylococcus aureus strain characterization. The emergence of contemporary strains of community-associated S. aureus, and the ensuing epidemic with a predominant strain type (USA300), necessitates re-evaluation of the discriminatory power of these typing methods for discerning molecular epidemiology and transmission dynamics, essential to investigations of hospital and community outbreaks. We compared the discriminatory index of 5 typing methods for contemporary S. aureus strain characterization. Children presenting to St. Louis Children's Hospital and community pediatric practices in St. Louis, Missouri (MO), with community-associated S. aureus infections were enrolled. Repetitive sequence-based PCR (repPCR), pulsed-field gel electrophoresis (PFGE), multilocus sequence typing (MLST), staphylococcal protein A (spa), and staphylococcal cassette chromosome (SCC) mec typing were performed on 200 S. aureus isolates. The discriminatory index of each method was calculated using the standard formula for this metric, where a value of 1 is highly discriminatory and a value of 0 is not discriminatory. Overall, we identified 26 distinct strain types by repPCR, 17 strain types by PFGE, 30 strain types by MLST, 68 strain types by spa typing, and 5 strain types by SCCmec typing. RepPCR had the highest discriminatory index (D) of all methods (D = 0.88), followed by spa typing (D = 0.87), MLST (D = 0.84), PFGE (D = 0.76), and SCCmec typing (D = 0.60). The method with the highest D among MRSA isolates was repPCR (D = 0.64) followed by spa typing (D = 0.45) and MLST (D = 0.44). The method with the highest D among MSSA isolates was spa typing (D = 0.98), followed by MLST (D = 0.93), repPCR (D = 0.92), and PFGE (D = 0.89). Among isolates designated USA300 by PFGE, repPCR was most discriminatory, with 10 distinct strain types identified (D = 0.63). 
We identified 45 MRSA isolates which were classified as identical by PFGE, MLST, spa typing, and SCCmec typing (USA300, ST8, t008, SCCmec IV, respectively); within this collection, there were 5 distinct strain types identified by repPCR. The typing methods yielded comparable discriminatory power for S. aureus characterization overall; when discriminating among USA300 isolates, repPCR retained the highest discriminatory power. This property is advantageous for investigations conducted in the era of contemporary S. aureus infections. PMID:26376402
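The abstract refers to "the standard formula" for the discriminatory index without stating it. A minimal sketch follows, assuming this means Simpson's index of diversity as usually applied to typing methods, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where N is the number of isolates and n_j the number assigned to type j. The function name and the example labels are hypothetical.

```python
from collections import Counter


def discriminatory_index(type_labels):
    """Probability that two isolates drawn without replacement differ in type."""
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical typing result for six isolates.
print(discriminatory_index(["ST8", "ST8", "ST8", "ST5", "ST30", "ST30"]))
```

Under this definition a method that places every isolate in one type scores 0 and a method that gives every isolate its own type scores 1, which matches the interpretation given in the abstract.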
Organizational heterogeneity of vertebrate genomes.
Frenkel, Svetlana; Kirzhner, Valery; Korol, Abraham
2012-01-01
Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
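The idea of scoring occurrences of fixed-length oligonucleotides can be sketched as below. This is a simplification under stated assumptions: it counts exact k-mer matches only and compares two spectra with a plain sum of absolute differences, whereas the published CS analysis may use a particular word set, allow mismatches, and use a different distance. The function names are hypothetical.

```python
from collections import Counter

DNA_LETTERS = set("ACGT")


def kmer_spectrum(sequence, k):
    """Relative frequency of every exact k-mer made only of A, C, G, T."""
    sequence = sequence.upper()
    counts = Counter(
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if set(sequence[i:i + k]) <= DNA_LETTERS
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: c / total for kmer, c in counts.items()}


def spectrum_distance(spec_a, spec_b):
    """Sum of absolute frequency differences over all observed k-mers."""
    keys = set(spec_a) | set(spec_b)
    return sum(abs(spec_a.get(key, 0.0) - spec_b.get(key, 0.0)) for key in keys)


segment_1 = kmer_spectrum("ACGTACGTAC", 2)
segment_2 = kmer_spectrum("AAAAAACCCC", 2)
print(spectrum_distance(segment_1, segment_2))
```

Computing such a distance between spectra of different segments of one genome gives one simple way to express how heterogeneous that genome is at the chosen word length.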
Effect of base sequence on the DNA cross-linking properties of pyrrolobenzodiazepine (PBD) dimers
Rahman, Khondaker M.; James, Colin H.; Thurston, David E.
2011-01-01
Pyrrolo[2,1-c][1,4]benzodiazepine (PBD) dimers are synthetic sequence-selective DNA minor-groove cross-linking agents that possess two electrophilic imine moieties (or their equivalent) capable of forming covalent aminal linkages with guanine C2-NH2 functionalities. The PBD dimer SJG-136, which has a C8–O–(CH2)3–O–C8′′ central linker joining the two PBD moieties, is currently undergoing phase II clinical trials and current research is focused on developing analogues of SJG-136 with different linker lengths and substitution patterns. Using a reversed-phase ion pair HPLC/MS method to evaluate interaction with oligonucleotides of varying length and sequence, we recently reported (JACS, 2009, 131, 13 756) that SJG-136 can form three different types of adducts: inter- and intrastrand cross-linked adducts, and mono-alkylated adducts. These studies have now been extended to include PBD dimers with a longer central linker (C8–O–(CH2)5–O–C8′), demonstrating that the type and distribution of adducts appear to depend on (i) the length of the C8/C8′-linker connecting the two PBD units, (ii) the positioning of the two reactive guanine bases on the same or opposite strands, and (iii) their separation (i.e. the number of base pairs, usually ATs, between them). Based on these data, a set of rules are emerging that can be used to predict the DNA–interaction behaviour of a PBD dimer of particular C8–C8′ linker length towards a given DNA sequence. These observations suggest that it may be possible to design PBD dimers to target specific DNA sequences. PMID:21427082
Wang, Penghao; Wilson, Susan R
2013-01-01
Mass spectrometry-based protein identification is a very challenging task. The main identification approaches include de novo sequencing and database searching. Both approaches have shortcomings, so an integrative approach has been developed. The integrative approach firstly infers partial peptide sequences, known as tags, directly from tandem spectra through de novo sequencing, and then puts these sequences into a database search to see if a close peptide match can be found. However the current implementation of this integrative approach has several limitations. Firstly, simplistic de novo sequencing is applied and only very short sequence tags are used. Secondly, most integrative methods apply an algorithm similar to BLAST to search for exact sequence matches and do not accommodate sequence errors well. Thirdly, by applying these methods the integrated de novo sequencing makes a limited contribution to the scoring model which is still largely based on database searching. We have developed a new integrative protein identification method which can integrate de novo sequencing more efficiently into database searching. Evaluated on large real datasets, our method outperforms popular identification methods.
Metabolic network prediction through pairwise rational kernels.
Roche-Lima, Abiel; Domaratzki, Michael; Fristensky, Brian
2014-09-26
Metabolic networks are represented by the set of metabolic pathways. Metabolic pathways are a series of biochemical reactions, in which the product (output) from one reaction serves as the substrate (input) to another reaction. Many pathways remain incompletely characterized. One of the major challenges of computational biology is to obtain better models of metabolic pathways. Existing models are dependent on the annotation of the genes. This propagates error accumulation when the pathways are predicted by incorrectly annotated genes. Pairwise classification methods are supervised learning methods used to classify new pairs of entities. Some of these classification methods, e.g., Pairwise Support Vector Machines (SVMs), use pairwise kernels. Pairwise kernels describe similarity measures between two pairs of entities. Using pairwise kernels to handle sequence data requires long processing times and large storage. Rational kernels are kernels based on weighted finite-state transducers that represent similarity measures between sequences or automata. They have been effectively used in problems that handle large amounts of sequence information such as protein essentiality, natural language processing and machine translation. We create a new family of pairwise kernels using weighted finite-state transducers (called Pairwise Rational Kernel (PRK)) to predict metabolic pathways from a variety of biological data. PRKs take advantage of the simpler representations and faster algorithms of transducers. Because raw sequence data can be used, the predictor model avoids the errors introduced by incorrect gene annotations. We then developed several experiments with PRKs and Pairwise SVM to validate our methods using the metabolic network of Saccharomyces cerevisiae. As a result, when PRKs are used, our method executes faster in comparison with other pairwise kernels. 
Also, when we use PRKs combined with other simple kernels that include evolutionary information, the accuracy values improve, while maintaining lower construction and execution times. The power of using kernels is that almost any sort of data can be represented using kernels. Therefore, completely disparate types of data can be combined to add power to kernel-based machine learning methods. When we compared our proposal using PRKs with other similar kernels, the execution times were decreased, with no compromise of accuracy. We also showed that by combining PRKs with other kernels that include evolutionary information, the accuracy can also be improved. As our proposal can use any type of sequence data, genes do not need to be properly annotated, avoiding accumulation of errors because of incorrect previous annotations.
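To make the notion of a pairwise kernel (a similarity between two pairs of entities) more concrete, here is a small sketch. It makes two assumptions that are not taken from the abstract: the base sequence kernel is a plain n-gram count kernel computed with dictionaries rather than with weighted finite-state transducers, and the pairwise combination is the tensor-product form K((a, b), (c, d)) = k(a, c)·k(b, d) + k(a, d)·k(b, c). The function names are hypothetical and this is not the authors' PRK implementation.

```python
from collections import Counter


def ngram_kernel(seq_a, seq_b, n=3):
    """Count-based n-gram kernel: sum over shared n-grams of count products."""
    grams_a = Counter(seq_a[i:i + n] for i in range(len(seq_a) - n + 1))
    grams_b = Counter(seq_b[i:i + n] for i in range(len(seq_b) - n + 1))
    return sum(count * grams_b[gram] for gram, count in grams_a.items())


def pairwise_kernel(pair_1, pair_2, base_kernel=ngram_kernel):
    """Tensor-product pairwise kernel, symmetric in the order within each pair."""
    (a, b), (c, d) = pair_1, pair_2
    return (base_kernel(a, c) * base_kernel(b, d)
            + base_kernel(a, d) * base_kernel(b, c))


# Hypothetical sequence pairs (e.g. two enzyme pairs to be compared).
pair_x = ("MKVLAAGIV", "MKTLLAGIV")
pair_y = ("MKVLSAGIV", "MRTLLAGIV")
print(pairwise_kernel(pair_x, pair_y))
```

The resulting values could be assembled into a Gram matrix over all training pairs and passed to any SVM implementation that accepts a precomputed kernel.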
Robbins, Marjorie; Judge, Adam; MacLachlan, Ian
2009-06-01
Canonical small interfering RNA (siRNA) duplexes are potent activators of the mammalian innate immune system. The induction of innate immunity by siRNA is dependent on siRNA structure and sequence, method of delivery, and cell type. Synthetic siRNA in delivery vehicles that facilitate cellular uptake can induce high levels of inflammatory cytokines and interferons after systemic administration in mammals and in primary human blood cell cultures. This activation is predominantly mediated by immune cells, normally via a Toll-like receptor (TLR) pathway. The siRNA sequence dependency of these pathways varies with the type and location of the TLR involved. Alternatively nonimmune cell activation may also occur, typically resulting from siRNA interaction with cytoplasmic RNA sensors such as RIG1. As immune activation by siRNA-based drugs represents an undesirable side effect due to the considerable toxicities associated with excessive cytokine release in humans, understanding and abrogating this activity will be a critical component in the development of safe and effective therapeutics. This review describes the intracellular mechanisms of innate immune activation by siRNA, the design of appropriate sequences and chemical modification approaches, and suitable experimental methods for studying their effects, with a view toward reducing siRNA-mediated off-target effects.
Yan, Yong-Wei; Zou, Bin; Zhu, Ting; Hozzein, Wael N.
2017-01-01
RNA-seq-based SSU (small subunit) rRNA (ribosomal RNA) analysis has provided a better understanding of potentially active microbial community within environments. However, for RNA-seq library construction, high quantities of purified RNA are typically required. We propose a modified RNA-seq method for SSU rRNA-based microbial community analysis that depends on the direct ligation of a 5’ adaptor to RNA before reverse-transcription. The method requires only a low-input quantity of RNA (10–100 ng) and does not require a DNA removal step. The method was initially tested on three mock communities synthesized with enriched SSU rRNA of archaeal, bacterial and fungal isolates at different ratios, and was subsequently used for environmental samples of high or low biomass. For high-biomass salt-marsh sediments, enriched SSU rRNA and total nucleic acid-derived RNA-seq datasets revealed highly consistent community compositions for all of the SSU rRNA sequences, and as much as 46.4%-59.5% of 16S rRNA sequences were suitable for OTU (operational taxonomic unit)-based community and diversity analyses with complete coverage of V1-V2 regions. OTU-based community structures for the two datasets were also highly consistent with those determined by all of the 16S rRNA reads. For low-biomass samples, total nucleic acid-derived RNA-seq datasets were analyzed, and highly active bacterial taxa were also identified by the OTU-based method, notably including members of the previously underestimated genus Nitrospira and phylum Acidobacteria in tap water, members of the phylum Actinobacteria on a shower curtain, and members of the phylum Cyanobacteria on leaf surfaces. More than half of the bacterial 16S rRNA sequences covered the complete region of primer 8F, and non-coverage rates as high as 38.7% were obtained for phylum-unclassified sequences, providing many opportunities to identify novel bacterial taxa. 
This modified RNA-seq method will provide a better snapshot of diverse microbial communities, most notably by OTU-based analysis, even communities with low-biomass samples. PMID:29016661
Choe, Se-Eun; Nguyen, Thuy Thi-Dieu; Kang, Tae-Gyu; Kweon, Chang-Hee; Kang, Seung-Won
2011-09-01
Nuclear ribosomal DNA sequence of the second internal transcribed spacer (ITS-2) has been used efficiently to identify the liver fluke species collected from different hosts and various geographic regions. ITS-2 sequences of 19 Fasciola samples collected from Korean native cattle were determined and compared. Sequence comparison including ITS-2 sequences of isolates from this study and reference sequences from Fasciola hepatica and Fasciola gigantica and intermediate Fasciola in Genbank revealed seven identical variable sites of investigated isolates. Among 19 samples, 12 individuals had ITS-2 sequences completely identical to that of pure F. hepatica, five possessed the sequences identical to F. gigantica type, whereas two shared the sequence of both F. hepatica and F. gigantica. No variations in length and nucleotide composition of ITS-2 sequence were observed within isolates that belonged to F. hepatica or F. gigantica. At the position of 218, five Fasciola containing a single-base substitution (C>T) formed a distinct branch inside the F. gigantica-type group which was similar to those of Asian-origin isolates. The phylogenetic tree of the Fasciola spp. based on complete ITS-2 sequences from this study and other representative isolates in different locations clearly showed that pure F. hepatica, F. gigantica type and intermediate Fasciola were observed. The result also provided additional genetic evidence for the existence of three forms of Fasciola isolated from native cattle in Korea by genetic approach using ITS-2 sequence.
van Doorn, J.; Hollinger, T. C.; Oudega, B.
2001-01-01
A sensitive and specific detection method was developed for Xanthomonas hyacinthi; this method was based on amplification of a subsequence of the type IV fimbrial-subunit gene fimA from strain S148. The fimA gene was amplified by PCR with degenerate DNA primers designed by using the N-terminal and C-terminal amino acid sequences of trypsin fragments of FimA. The nucleotide sequence of fimA was determined and compared with the nucleotide sequences coding for the fimbrial subunits in other type IV fimbria-producing bacteria, such as Xanthomonas campestris pv. vesicatoria, Neisseria gonorrhoeae, and Moraxella bovis. In a PCR internal primers JAAN and JARA, designed by using the nucleotide sequences of the variable central and C-terminal region of fimA, amplified a 226-bp DNA fragment in all X. hyacinthi isolates. This PCR was shown to be pathovar specific, as assessed by testing 71 Xanthomonas pathovars and bacterial isolates belonging to other genera, such as Erwinia and Pseudomonas. Southern hybridization experiments performed with the labelled 226-bp DNA amplicon as a probe suggested that there is only one structural type IV fimbrial-gene cluster in X. hyacinthi. Only two Xanthomonas translucens pathovars cross-reacted weakly in PCR. Primers amplifying a subsequence of the fimA gene of X. campestris pv. vesicatoria (T. Ojanen-Reuhs, N. Kalkkinen, B. Westerlund-Wikström, J. van Doorn, K. Haahtela, E.-L. Nurmiaho-Lassila, K. Wengelink, U. Bonas, and T. K. Korhonen, J. Bacteriol. 179: 1280–1290, 1997) were shown to be pathovar specific, indicating that the fimbrial-subunit sequences are more generally applicable in xanthomonads for detection purposes. Under laboratory conditions, approximately 1,000 CFU of X. hyacinthi per ml could be detected. In inoculated leaves of hyacinths the threshold was 5,000 CFU/ml. The results indicated that infected hyacinths with early symptoms could be successfully screened for X. hyacinthi with PCR. PMID:11157222
El-Assaad, Atlal; Dawy, Zaher; Nemer, Georges; Hajj, Hazem; Kobeissy, Firas H
2017-01-01
Degradomics is a novel discipline that involves determination of the proteases/substrate fragmentation profile, called the substrate degradome, and has been recently applied in different disciplines. A major application of degradomics is its utility in the field of biomarkers where the breakdown products (BDPs) of different proteases have been investigated. Among the major proteases assessed, calpain and caspase proteases have been associated with the execution phases of the pro-apoptotic and pro-necrotic cell death, generating caspase/calpain-specific cleaved fragments. The distinction between calpain and caspase protein fragments has been applied to distinguish injury mechanisms. Advanced proteomics technology has been used to identify these BDPs experimentally. However, it has been a challenge to identify these BDPs with high precision and efficiency, especially if we are targeting a number of proteins at one time. In this chapter, we present a novel bioinformatic detection method that identifies BDPs accurately and efficiently with validation against experimental data. This method aims at predicting the consensus sequence occurrences and their variants in a large set of experimentally detected protein sequences based on state-of-the-art sequence matching and alignment algorithms. After detection, the method generates all the potential fragments cleaved by a specific protease. This space- and time-efficient algorithm is flexible enough to handle the different orientations that the consensus sequence and the protein sequence can take before cleaving. It is O(mn) in space complexity and O(Nmn) in time complexity, where N is the number of protein sequences, m the length of the consensus sequence, and n the length of each protein sequence. Ultimately, this knowledge will feed into the development of a novel tool for researchers to detect diverse types of selected BDPs as putative disease markers, contributing to the diagnosis and treatment of related disorders.
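The two steps named in this abstract, locating consensus-sequence occurrences and then generating the fragments a protease would produce, can be sketched in a much reduced form. The assumptions are significant: only exact matches are found (the chapter describes variants and alignment-based matching), the motif "DEVD" and the toy protein are illustrative, and the function name and `cut_offset` parameter are hypothetical.

```python
import re


def cleave_at_consensus(protein, consensus, cut_offset):
    """Split a protein at each exact consensus match.

    cut_offset is the number of motif residues that stay on the N-terminal
    side of the cut (e.g. 4 to cut immediately after a 4-residue motif).
    """
    # A lookahead pattern also reports overlapping occurrences of the motif.
    pattern = "(?=" + re.escape(consensus) + ")"
    cut_sites = [m.start() + cut_offset for m in re.finditer(pattern, protein)]

    fragments, previous = [], 0
    for site in cut_sites:
        if previous < site < len(protein):
            fragments.append(protein[previous:site])
            previous = site
    fragments.append(protein[previous:])
    return fragments


# Toy sequence with two DEVD motifs, cut after the final aspartate.
print(cleave_at_consensus("MKDEVDGAAADEVDSK", "DEVD", cut_offset=4))
```

Running the function over N proteins scans each sequence once per motif, which is in the spirit of the per-protein cost quoted in the abstract, although the regular-expression search here is not the authors' algorithm.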
Soler, Miguel A; de Marco, Ario; Fortuna, Sara
2016-10-10
Nanobodies (VHHs) have proved to be valuable substitutes for conventional antibodies for molecular recognition. Their small size represents a precious advantage for rational mutagenesis based on modelling. Here we address the problem of predicting how Camelidae nanobody sequences can tolerate mutations by developing a simulation protocol based on all-atom molecular dynamics and whole-molecule docking. The method was tested on two sets of nanobodies characterized experimentally for their biophysical features. One set contained point mutations introduced to humanize a wild-type sequence; in the second, the CDRs were swapped between single-domain frameworks with Camelidae and human hallmarks. The method resulted in accurate scoring approaches to predict experimental yields and made it possible to identify the structural modifications induced by mutations. This work is a promising tool for the in silico development of single-domain antibodies and opens the opportunity to customize single functional domains of larger macromolecules.
Chulakasian, Songkhla; Lee, Min-Shiuh; Wang, Chi-Young; Chiou, Shyan-Song; Lin, Kuan-Hsun; Lin, Fong-Yuan; Hsu, Tien-Huan; Wong, Min-Liang; Chang, Tien-Jye; Hsu, Wei-Li
2010-06-10
Background Canine distemper virus (CDV) is present worldwide and produces a lethal systemic infection of wild and domestic Canidae. Pre-existing antibodies acquired from vaccination or previous CDV infection might interfere with the interpretation of a serologic diagnosis method. In addition, due to the high similarity of nucleic acid sequences between wild-type CDV and the new vaccine strain, current PCR-derived methods cannot be applied for the definite confirmation of CD infection. Hence, it is worthwhile to develop a simple and rapid nucleotide-based assay for differentiation of wild-type CDV, which causes disease, from attenuated CDVs after vaccination. High frequency variations have been found in the region spanning from the 3'-untranslated region (UTR) of the matrix (M) gene to the fusion (F) gene (designated M-F UTR) in a few CDV strains. To establish a differential diagnosis assay, an amplification refractory mutation analysis was established based on the highly variable region on M-F UTR and F regions. Results Sequences of frequent polymorphisms were found scattered throughout the M-F UTR region; the identity of nucleic acid between local strains and vaccine strains ranged from 82.5% to 93.8%. A tract of AAA residues located 35 nucleotides downstream from the F gene start codon, highly conserved in three vaccine strains, was replaced with TGC in the local strains; this served as the target sequence for the design of discrimination primers. The method established in the present study successfully differentiated seven Taiwanese CDV field isolates, all belonging to the Asia-1 lineage, from vaccine strains. Conclusions The method described herein would be useful for several clinical applications, such as confirmation of natural CDV infection, evaluation of vaccination status and verification of the circulating viral genotypes. PMID:20534175
A core microbiome associated with the peritoneal tumors of pseudomyxoma peritonei
2013-01-01
Background Pseudomyxoma peritonei (PMP) is a malignancy characterized by dissemination of mucus-secreting cells throughout the peritoneum. This disease is associated with significant morbidity and mortality and despite effective treatment options for early-stage disease, patients with PMP often relapse. Thus, there is a need for additional treatment options to reduce relapse rate and increase long-term survival. A previous study identified the presence of both typed and non-culturable bacteria associated with PMP tissue and determined that increased bacterial density was associated with more severe disease. These findings highlighted the possible role for bacteria in PMP disease. Methods To more clearly define the bacterial communities associated with PMP disease, we employed a sequence-based analysis to profile the bacterial populations found in PMP tumor and mucin tissue in 11 patients. Sequencing data were confirmed by in situ hybridization at multiple taxonomic depths and by culturing. A pilot clinical study was initiated to determine whether the addition of antibiotic therapy affected PMP patient outcome. Main results We determined that the types of bacteria present are highly conserved in all PMP patients; the dominant phyla are the Proteobacteria, Actinobacteria, Firmicutes and Bacteroidetes. A core set of taxon-specific sequences was found in all 11 patients; many of these sequences were classified into taxonomic groups that also contain known human pathogens. In situ hybridization directly confirmed the presence of bacteria in PMP at multiple taxonomic depths and supported our sequence-based analysis. Furthermore, culturing of PMP tissue samples allowed us to isolate 11 different bacterial strains from eight independent patients, and in vitro analysis of a subset of these isolates suggests that at least some of these strains may interact with the PMP-associated mucin MUC2. 
Finally, we provide evidence suggesting that targeting these bacteria with antibiotic treatment may increase the survival of PMP patients. Conclusions Using 16S amplicon-based sequencing, direct in situ hybridization analysis and culturing methods, we have identified numerous bacterial taxa that are consistently present in all PMP patients tested. Combined with data from a pilot clinical study, these data support the hypothesis that adding antimicrobials to the standard PMP treatment could improve PMP patient survival. PMID:23844722
Ramírez, Juan C; Torres, Carolina; Curto, María de Los A; Schijman, Alejandro G
2017-12-01
Trypanosoma cruzi has been subdivided into seven Discrete Typing Units (DTUs), TcI-TcVI and Tcbat. Two major evolutionary models have been proposed to explain the origin of hybrid lineages, but while it is widely accepted that TcV and TcVI are the result of genetic exchange between TcII and TcIII strains, the origin of TcIII and TcIV is still a matter of debate. T. cruzi satellite DNA (SatDNA) is composed of 195 bp units organized in tandem repeats. Both TcV and TcVI stocks were found to have SatDNA copies of types TcI and TcII, whereas contradictory results were observed for TcIII stocks and no TcIV sequence has been analyzed yet. Herein, we have gone deeper into this matter by analyzing 335 distinct SatDNA sequences from 19 T. cruzi stocks representative of DTUs TcI-TcVI for phylogenetic inference. A Bayesian phylogenetic tree showed that all sequences were grouped in three major clusters, which corresponded to sequences from DTUs TcI/III, TcII and TcIV, whereas TcV and TcVI stocks had two sets of sequences distributed into the TcI/III and TcII clusters. As expected, the lowest genetic distances were found between TcI and TcIII, and between TcV and TcVI sequences, whereas the highest ones were observed between TcII and TcI/III, and between TcIV sequences and those from the remaining DTUs. In addition, signature patterns associated with specific T. cruzi lineages were identified and new primers that improved SatDNA-based qPCR sensitivity were designed. Our findings support the theory that TcIII is not the result of a hybridization event between TcI and TcII, and that TcIV had an independent origin from the other DTUs, contributing to clarifying the evolutionary history of T. cruzi lineages. Moreover, this work opens the possibility of typing samples from Chagas disease patients with low parasitic loads and improving molecular diagnostic methods of T. cruzi infection based on SatDNA sequence amplification.
Diaz, Maureen H; Winchell, Jonas M
2016-01-01
Over the past decade there have been significant advancements in the methods used for detecting and characterizing Mycoplasma pneumoniae, a common cause of respiratory illness and community-acquired pneumonia worldwide. The repertoire of available molecular diagnostics has greatly expanded from nucleic acid amplification techniques (NAATs) that encompass a variety of chemistries used for detection, to more sophisticated characterizing methods such as multi-locus variable-number tandem-repeat analysis (MLVA), Multi-locus sequence typing (MLST), matrix-assisted laser desorption ionization-time-of-flight mass spectrometry (MALDI-TOF MS), single nucleotide polymorphism typing, and numerous macrolide susceptibility profiling methods, among others. These many molecular-based approaches have been developed and employed to continually increase the level of discrimination and characterization in order to better understand the epidemiology and biology of M. pneumoniae. This review will summarize recent molecular techniques and procedures and lend perspective to how each has enhanced the current understanding of this organism and will emphasize how Next Generation Sequencing may serve as a resource for researchers to gain a more comprehensive understanding of the genomic complexities of this insidious pathogen.
Phylogenetic and environmental diversity of DsrAB-type dissimilatory (bi)sulfite reductases
Müller, Albert Leopold; Kjeldsen, Kasper Urup; Rattei, Thomas; Pester, Michael; Loy, Alexander
2015-01-01
The energy metabolism of essential microbial guilds in the biogeochemical sulfur cycle is based on a DsrAB-type dissimilatory (bi)sulfite reductase that either catalyzes the reduction of sulfite to sulfide during anaerobic respiration of sulfate, sulfite and organosulfonates, or acts in reverse during sulfur oxidation. Common use of dsrAB as a functional marker showed that dsrAB richness in many environments is dominated by novel sequence variants and collectively represents an extensive, largely uncharted sequence assemblage. Here, we established a comprehensive, manually curated dsrAB/DsrAB database and used it to categorize the known dsrAB diversity, reanalyze the evolutionary history of dsrAB and evaluate the coverage of published dsrAB-targeted primers. Based on a DsrAB consensus phylogeny, we introduce an operational classification system for environmental dsrAB sequences that integrates established taxonomic groups with operational taxonomic units (OTUs) at multiple phylogenetic levels, ranging from DsrAB enzyme families that reflect reductive or oxidative DsrAB types of bacterial or archaeal origin, superclusters, uncultured family-level lineages to species-level OTUs. Environmental dsrAB sequences constituted at least 13 stable family-level lineages without any cultivated representatives, suggesting that major taxa of sulfite/sulfate-reducing microorganisms have not yet been identified. Three of these uncultured lineages occur mainly in marine environments, while specific habitat preferences are not evident for members of the other 10 uncultured lineages. In summary, our publicly available dsrAB/DsrAB database, the phylogenetic framework, the multilevel classification system and a set of recommended primers provide a necessary foundation for large-scale dsrAB ecology studies with next-generation sequencing methods. PMID:25343514
[Standard algorithm of molecular typing of Yersinia pestis strains].
Eroshenko, G A; Odinokov, G N; Kukleva, L M; Pavlova, A I; Krasnov, Ia M; Shavina, N Iu; Guseva, N P; Vinogradova, N A; Kutyrev, V V
2012-01-01
The aim was to develop a standard algorithm for molecular typing of Yersinia pestis that establishes the subspecies, biovar and focus membership of a studied isolate, and to determine the characteristic genotypes of plague agent strains of the main and nonmain subspecies from various natural plague foci of the Russian Federation and the near abroad. Genotyping of 192 natural Y. pestis strains of main and nonmain subspecies was performed using PCR methods, multilocus sequencing and multilocus analysis of variable tandem repeat number. A standard algorithm of molecular typing of the plague agent was developed. It differentiates Yersinia pestis in several stages: by main versus nonmain subspecies, by biovar within the main subspecies, by specific subspecies, and by natural focus and geographic territory. The algorithm is based on 3 typing methods (PCR, multilocus sequence typing and multilocus analysis of variable tandem repeat number) using standard DNA targets: life support genes (terC, ilvN, inv, glpD, napA, rhaS and araC) and 7 loci of variable tandem repeats (ms01, ms04, ms06, ms07, ms46, ms62, ms70). The effectiveness of the developed algorithm was shown on a large number of natural Y. pestis strains. Characteristic sequence types of Y. pestis strains of various subspecies and biovars, as well as MLVA7 genotypes of strains from natural foci of plague of the Russian Federation and the near abroad, were established. Application of the developed algorithm will increase the effectiveness of epidemiologic monitoring of the plague agent and of the analysis of plague epidemics and outbreaks, including establishing the source of origin of the strain and the routes of introduction of the infection.
Applications of alignment-free methods in epigenomics.
Pinello, Luca; Lo Bosco, Giosuè; Yuan, Guo-Cheng
2014-05-01
Epigenetic mechanisms play an important role in the regulation of cell type-specific gene activities, yet how epigenetic patterns are established and maintained remains poorly understood. Recent studies have supported a role of DNA sequences in recruitment of epigenetic regulators. Alignment-free methods have been applied to identify distinct sequence features that are associated with epigenetic patterns and to predict epigenomic profiles. Here, we review recent advances in such applications, including the methods to map DNA sequence to feature space, sequence comparison and prediction models. Computational studies using these methods have provided important insights into the epigenetic regulatory mechanisms.
Reimer, Aleisha; Verghese, Bindhu; Lok, Mei; Ziegler, Jennifer; Farber, Jeffrey; Pagotto, Franco; Graham, Morag; Nadon, Celine A.
2012-01-01
Human listeriosis outbreaks in Canada have been predominantly caused by serotype 1/2a isolates with highly similar pulsed-field gel electrophoresis (PFGE) patterns. Multilocus sequence typing (MLST) and multi-virulence-locus sequence typing (MVLST) each identified a diverse population of Listeria monocytogenes isolates, and within that, both methods had congruent subtypes that substantiated a predominant clone (clonal complex 8; virulence type 59; proposed epidemic clone 5 [ECV]) that has been causing human illness across Canada for more than 2 decades. PMID:22337989
Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning
2014-01-01
X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed 'PredPPCrys' using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.
CRISPR Diversity and Microevolution in Clostridium difficile.
Andersen, Joakim M; Shoup, Madelyn; Robinson, Cathy; Britton, Robert; Olsen, Katharina E P; Barrangou, Rodolphe
2016-09-19
Virulent strains of Clostridium difficile have become a global health problem associated with morbidity and mortality. Traditional typing methods do not provide ideal resolution to track outbreak strains, ascertain genetic diversity between isolates, or monitor the phylogeny of this species on a global basis. Here, we investigate the occurrence and diversity of clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated genes (cas) in C. difficile to assess the potential of CRISPR-based phylogeny and high-resolution genotyping. A single Type-IB CRISPR-Cas system was identified in 217 analyzed genomes, with cas gene clusters present at conserved chromosomal locations, suggesting vertical evolution of the system; a total of 1,865 CRISPR arrays were assessed. The CRISPR arrays, markedly enriched (8.5 arrays/genome) compared with other species, occur both at conserved and variable locations across strains, and thus provide a basis for typing based on locus occurrence and spacer polymorphism. Clustering of strains by array composition correlated with sequence type (ST) analysis. Spacer content and polymorphism within conserved CRISPR arrays revealed phylogenetic relationship across clades and within ST. Spacer polymorphisms of conserved arrays were instrumental for differentiating closely related strains, e.g., ST1/RT027/B1 strains and pathogenicity locus encoding ST3/RT001 strains. CRISPR spacers showed sequence similarity to phage sequences, which is consistent with the native role of CRISPR-Cas as adaptive immune systems in bacteria. Overall, CRISPR-Cas sequences constitute a valuable basis for genotyping of C. difficile isolates, provide insights into the micro-evolutionary events that occur between closely related strains, and reflect the evolutionary trajectory of these genomes. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Genome-Wide Profiling of DNA Double-Strand Breaks by the BLESS and BLISS Methods.
Mirzazadeh, Reza; Kallas, Tomasz; Bienko, Magda; Crosetto, Nicola
2018-01-01
DNA double-strand breaks (DSBs) are major DNA lesions that are constantly formed during physiological processes such as DNA replication, transcription, and recombination, or as a result of exogenous agents such as ionizing radiation, radiomimetic drugs, and genome editing nucleases. Unrepaired DSBs threaten genomic stability by leading to the formation of potentially oncogenic rearrangements such as translocations. In the past few years, several methods based on next-generation sequencing (NGS) have been developed to study the genome-wide distribution of DSBs or their conversion to translocation events. We developed Breaks Labeling, Enrichment on Streptavidin, and Sequencing (BLESS), which was the first method for direct labeling of DSBs in situ followed by their genome-wide mapping at nucleotide resolution (Crosetto et al., Nat Methods 10:361-365, 2013). Recently, we have further expanded the quantitative nature, applicability, and scalability of BLESS by developing Breaks Labeling In Situ and Sequencing (BLISS) (Yan et al., Nat Commun 8:15058, 2017). Here, we first present an overview of existing methods for genome-wide localization of DSBs, and then focus on the BLESS and BLISS methods, discussing different assay design options depending on the sample type and application.
mec-associated dru typing in the epidemiological analysis of ST239 MRSA in Malaysia.
Ghaznavi-Rad, E; Goering, R V; Nor Shamsudin, M; Weng, P L; Sekawi, Z; Tavakol, M; van Belkum, A; Neela, V
2011-11-01
The usefulness of mec-associated dru typing in the epidemiological analysis of methicillin-resistant Staphylococcus aureus (MRSA) isolated in Malaysia was investigated and compared with pulsed-field gel electrophoresis (PFGE), multilocus sequence typing (MLST), and spa and SCCmec typing. The isolates studied included all MRSA types in Malaysia. Multilocus sequence type ST188 and ST1 isolates were highly clonal by all typing methods. However, the dru typing of ST239 isolates produced the clearest discrimination between SCCmec IIIa and III isolates, yielding more subtypes than any other method. Evaluation of the discriminatory power for each method identified dru typing and PFGE as the most discriminatory, with Simpson's index of diversity (SID) values over 89%, including an isolate which was non-typeable by spa, but dru-typed as dt13j. The discriminatory ability of dru typing, especially with closely related MRSA ST239 strains (e.g., Brazilian and Hungarian), underscores its utility as a tool for the epidemiological investigation of MRSA.
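Simpson's index of diversity, used above to compare typing methods, can be computed directly from the list of types assigned to the isolates. The sketch below assumes the Hunter-Gaston formulation that is commonly used for typing schemes, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)); the abstract does not state which variant the authors applied, and the function name is illustrative.

```python
from collections import Counter

def simpsons_index_of_diversity(types):
    """Probability that two isolates drawn without replacement have different types."""
    counts = Counter(types)           # n_j: number of isolates per type
    n = sum(counts.values())          # N: total number of isolates
    if n < 2:
        raise ValueError("at least two isolates are required")
    same = sum(c * (c - 1) for c in counts.values())
    return 1 - same / (n * (n - 1))
```

A value near 1 means the method splits the collection into many small groups, while 0 means every isolate received the same type.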
Jun, Goo; Flickinger, Matthew; Hetrick, Kurt N.; Romm, Jane M.; Doheny, Kimberly F.; Abecasis, Gonçalo R.; Boehnke, Michael; Kang, Hyun Min
2012-01-01
DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads alone, and (3) array-based genotype data alone. Analysis of sequencing reads allows contamination detection after sequence data is generated but prior to variant calling; analysis of array-based genotype data allows contamination detection prior to generation of costly sequence data. Through a combination of analysis of in silico and experimentally contaminated samples, we show that our methods can reliably detect and estimate levels of contamination as low as 1%. We evaluate the impact of DNA contamination on genotype accuracy and propose effective strategies to screen for and prevent DNA contamination in sequencing studies. PMID:23103226
Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng
2017-05-10
Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. 
Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA.
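The lowest-common-ancestor step mentioned in this abstract can be illustrated on its own. This is a sketch of the plain, unweighted LCA only: the Bayesian weighting of hits and the bootstrap confidence scores that the authors describe are deliberately left out, and the function name and lineage layout are assumptions rather than part of BLCA.

```python
def lowest_common_ancestor(lineages):
    """Return the shared prefix of several lineages ordered from domain to species."""
    shared = []
    for ranks in zip(*lineages):      # walk all lineages rank by rank
        if len(set(ranks)) != 1:      # stop at the first rank where hits disagree
            break
        shared.append(ranks[0])
    return shared

hits = [
    ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
     "Streptococcaceae", "Streptococcus", "Streptococcus mitis"],
    ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
     "Streptococcaceae", "Streptococcus", "Streptococcus oralis"],
]
print(lowest_common_ancestor(hits)[-1])
```

When the hits disagree only at the species rank, the assignment falls back to the genus; a weighting scheme such as the one in the abstract would decide how much each hit counts before that fallback happens.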
A Pan-HIV Strategy for Complete Genome Sequencing
Yamaguchi, Julie; Alessandri-Gradt, Elodie; Tell, Robert W.; Brennan, Catherine A.
2015-01-01
Molecular surveillance is essential to monitor HIV diversity and track emerging strains. We have developed a universal library preparation method (HIV-SMART [i.e., switching mechanism at 5′ end of RNA transcript]) for next-generation sequencing that harnesses the specificity of HIV-directed priming to enable full genome characterization of all HIV-1 groups (M, N, O, and P) and HIV-2. Broad application of the HIV-SMART approach was demonstrated using a panel of diverse cell-cultured virus isolates. HIV-1 non-subtype B-infected clinical specimens from Cameroon were then used to optimize the protocol to sequence directly from plasma. When multiplexing 8 or more libraries per MiSeq run, full genome coverage at a median ∼2,000× depth was routinely obtained for either sample type. The method reproducibly generated the same consensus sequence, consistently identified viral sequence heterogeneity present in specimens, and at viral loads of ≤4.5 log copies/ml yielded sufficient coverage to permit strain classification. HIV-SMART provides an unparalleled opportunity to identify diverse HIV strains in patient specimens and to determine phylogenetic classification based on the entire viral genome. Easily adapted to sequence any RNA virus, this technology illustrates the utility of next-generation sequencing (NGS) for viral characterization and surveillance. PMID:26699702
Variable Number Of Tandem Repeats (VNTR) and its application in bacterial epidemiology.
Ramazanzadeh, Rashid; McNerney, Ruth
2007-08-15
Molecular epidemiology is the use of molecular techniques to study bacterial distribution in human populations. Molecular epidemiologists have recently benefited from several techniques for typing bacterial strains, such as Variable Number Tandem Repeat (VNTR) typing. VNTR typing is a genotyping tool that provides data in a simple, numeric format based on the number of repetitive sequences. VNTRs were first identified in M. tuberculosis as Mycobacterial Interspersed Repetitive Units (MIRUs). VNTRs have since been reported in Bacillus anthracis, Legionella pneumophila, Pseudomonas aeruginosa, Salmonella enterica and Escherichia coli O157.
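The numeric format that VNTR typing produces can be illustrated by counting consecutive copies of a repeat unit at each locus. This is a simplified sketch that assumes the locus sequences are already available as strings; in practice repeat numbers are usually inferred from PCR amplicon sizes. The function names and the locus names in the usage note are hypothetical.

```python
def count_tandem_repeats(seq: str, unit: str) -> int:
    """Return the longest run of consecutive, exact copies of `unit` in `seq`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    best = 0
    for start in range(len(seq)):
        copies = 0
        # extend the run while the next window still matches the unit exactly
        while seq.startswith(unit, start + copies * len(unit)):
            copies += 1
        best = max(best, copies)
    return best

def vntr_profile(locus_seqs: dict, repeat_units: dict) -> tuple:
    """Build a numeric profile: one repeat count per locus, in sorted locus order."""
    return tuple(count_tandem_repeats(locus_seqs[name], repeat_units[name])
                 for name in sorted(repeat_units))
```

For two made-up loci, `vntr_profile({"L1": "TTACACACGG", "L2": "GGGTAGTAGTAGTACC"}, {"L1": "AC", "L2": "GTA"})` yields a tuple of repeat counts that can be compared between isolates.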
A state-based probabilistic model for tumor respiratory motion prediction
NASA Astrophysics Data System (ADS)
Kalet, Alan; Sandison, George; Wu, Huanmei; Schmitz, Ruth
2010-12-01
This work proposes a new probabilistic mathematical model for predicting tumor motion and position based on a finite state representation using the natural breathing states of exhale, inhale and end of exhale. Tumor motion was broken down into linear breathing states and sequences of states. Breathing state sequences and the observables representing those sequences were analyzed using a hidden Markov model (HMM) to predict the future sequences and new observables. Velocities and other parameters were clustered using a k-means clustering algorithm to associate each state with a set of observables such that a prediction of state also enables a prediction of tumor velocity. A time average model with predictions based on average past state lengths was also computed. State sequences which are known a priori to fit the data were fed into the HMM algorithm to set a theoretical limit of the predictive power of the model. The effectiveness of the presented probabilistic model has been evaluated for gated radiation therapy based on previously tracked tumor motion in four lung cancer patients. Positional prediction accuracy is compared with actual position in terms of the overall RMS errors. Various system delays, ranging from 33 to 1000 ms, were tested. Previous studies have shown duty cycles for latencies of 33 and 200 ms at around 90% and 80%, respectively, for linear, no prediction, Kalman filter and ANN methods as averaged over multiple patients. At 1000 ms, the previously reported duty cycles range from approximately 62% (ANN) down to 34% (no prediction). Average duty cycle for the HMM method was found to be 100% and 91 ± 3% for 33 and 200 ms latency and around 40% for 1000 ms latency in three out of four breathing motion traces. RMS errors were found to be lower than linear and no prediction methods at latencies of 1000 ms. 
The results show that for system latencies longer than 400 ms, the time average HMM prediction outperforms linear, no prediction, and the more general HMM-type predictive models. RMS errors for the time average model approach the theoretical limit of the HMM, and predicted state sequences are well correlated with sequences known to fit the data.
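The idea of predicting the next breathing state from past state sequences can be sketched with a fully observed first-order Markov chain. This is a simplification under stated assumptions and not the hidden Markov model of the abstract: states are taken as directly observed, velocities and state durations are ignored, and the state labels and function names are illustrative.

```python
from collections import Counter, defaultdict

def fit_transitions(states):
    """Estimate P(next state | current state) from an observed state sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }

def predict_next(model, current):
    """Return the most probable successor of `current` under the fitted model."""
    row = model[current]
    return max(row, key=row.get)

# IN = inhale, EX = exhale, EOE = end of exhale
trace = ["IN", "EX", "EOE", "IN", "EX", "EOE", "IN", "EX", "EX", "EOE"]
model = fit_transitions(trace)
```

With this trace, `model["EX"]` holds the estimated probabilities of staying in exhale or moving to end of exhale, and `predict_next(model, "IN")` returns the most frequent successor of inhale.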
K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.
Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue
2018-05-15
Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared with other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package that is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). Contact: yueljiang@163.com. Supplementary data are available at Bioinformatics online.
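The general idea of comparing sequences through a rank statistic on k-mer counts can be sketched as follows. This is not the K2 statistic as defined by the authors: it simply computes Kendall's tau-a between two k-mer count vectors, and the function names are illustrative.

```python
from itertools import combinations, product

def kmer_counts(seq: str, k: int) -> list:
    """Count every DNA k-mer in `seq`, returned in a fixed lexicographic order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:            # windows containing non-ACGT symbols are skipped
            counts[word] += 1
    return [counts[w] for w in words]

def kendall_tau_a(x, y) -> float:
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        score += (direction > 0) - (direction < 0)
    return score / (n * (n - 1) / 2)

def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Rank correlation between the k-mer count vectors of two sequences."""
    return kendall_tau_a(kmer_counts(a, k), kmer_counts(b, k))
```

Because tau-a does not correct for ties, sequences with many equally frequent k-mers score below 1 even against themselves; a tie-corrected variant would be the natural refinement.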
Methyl-CpG island-associated genome signature tags
Dunn, John J
2014-05-20
Disclosed is a method for analyzing the organismic complexity of a sample through analysis of the nucleic acid in the sample. In the disclosed method, through a series of steps, including digestion with a type II restriction enzyme, ligation of capture adapters and linkers and digestion with a type IIS restriction enzyme, genome signature tags are produced. The sequences of a statistically significant number of the signature tags are determined and the sequences are used to identify and quantify the organisms in the sample. Various embodiments of the invention described herein include methods for using single point genome signature tags to analyze the related families present in a sample, methods for analyzing sequences associated with hyper- and hypo-methylated CpG islands, methods for visualizing organismic complexity change in a sampling location over time and methods for generating the genome signature tag profile of a sample of fragmented DNA.
Cell type discovery using single-cell transcriptomics: implications for ontological representation.
Aevermann, Brian D; Novotny, Mark; Bakken, Trygve; Miller, Jeremy A; Diehl, Alexander D; Osumi-Sutherland, David; Lasken, Roger S; Lein, Ed S; Scheuermann, Richard H
2018-05-01
Cells are fundamental function units of multicellular organisms, with different cell types playing distinct physiological roles in the body. The recent advent of single-cell transcriptional profiling using RNA sequencing is producing 'big data', enabling the identification of novel human cell types at an unprecedented rate. In this review, we summarize recent work characterizing cell types in the human central nervous and immune systems using single-cell and single-nuclei RNA sequencing, and discuss the implications that these discoveries are having on the representation of cell types in the reference Cell Ontology (CL). We propose a method, based on random forest machine learning, for identifying sets of necessary and sufficient marker genes, which can be used to assemble consistent and reproducible cell type definitions for incorporation into the CL. The representation of defined cell type classes and their relationships in the CL using this strategy will make the cell type classes being identified by high-throughput/high-content technologies findable, accessible, interoperable and reusable (FAIR), allowing the CL to serve as a reference knowledgebase of information about the role that distinct cellular phenotypes play in human health and disease.
Comparing K-mer based methods for improved classification of 16S sequences.
Vinje, Hilde; Liland, Kristian Hovde; Almøy, Trygve; Snipen, Lars
2015-07-01
The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Parallel to the explosion in the amount of sequence data accessible, there has also been a shift in focus for classification methods. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers by sliding windows are the most interesting classification approach with respect to both speed and accuracy. Here, we present a systematic comparison of five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other both in data usage and modelling strategies. We have based our study on the commonly known and well-used naïve Bayes classifier from the RDP project, and four other methods were implemented and tested on two different data sets, on full-length sequences as well as fragments of typical read-length. The differences in classification error obtained by the methods seemed to be small, but they were stable across both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets explored, and on both full-length and fragmented sequences, all five methods reached an error plateau. We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau, indicating that improved training data is needed to improve classification from here. Classification errors occur most frequently for genera with few sequences present. For improving the taxonomy and testing new classification methods, a better, more universal and more robust training data set is crucial.
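A K-mer based multinomial naive Bayes classifier of the kind compared here can be sketched in a few lines. This is a minimal illustration under stated assumptions (uniform priors over genera, add-one smoothing, K-mers counted with a sliding window); it is not the RDP implementation or any of the five methods as the authors implemented them, and all names are illustrative.

```python
import math
from collections import Counter

def kmers(seq: str, k: int) -> list:
    """Slide a window of width k along the sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def train(labelled_seqs, k: int) -> dict:
    """Accumulate K-mer counts per genus from (genus, sequence) pairs."""
    model = {}
    for genus, seq in labelled_seqs:
        model.setdefault(genus, Counter()).update(kmers(seq, k))
    return model

def classify(model: dict, seq: str, k: int, pseudo: float = 1.0) -> str:
    """Return the genus with the highest multinomial log-likelihood."""
    vocabulary = 4 ** k               # number of possible DNA K-mers
    best_genus, best_score = None, -math.inf
    for genus, counts in model.items():
        total = sum(counts.values())
        score = sum(
            math.log((counts[w] + pseudo) / (total + pseudo * vocabulary))
            for w in kmers(seq, k)
        )
        if score > best_score:
            best_genus, best_score = genus, score
    return best_genus
```

A genus represented by few training sequences has sparse K-mer counts, so its likelihoods are dominated by the smoothing term, which is one way to see why errors concentrate in poorly represented genera.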
Forsythe, Stephen J; Dickins, Benjamin; Jolley, Keith A
2014-12-16
Following the association of Cronobacter spp. with several publicized fatal outbreaks of meningitis and necrotising enterocolitis in neonatal intensive care units, the World Health Organization (WHO) in 2004 requested the establishment of a molecular typing scheme to enable the international control of the organism. This paper presents the application of Next Generation Sequencing (NGS) to Cronobacter, which has led to the establishment of the Cronobacter PubMLST genome and sequence definition database (http://pubmlst.org/cronobacter/) containing over 1000 isolates with metadata, along with the recognition of specific clonal lineages linked to neonatal meningitis and adult infections. Whole genome sequencing and multilocus sequence typing (MLST) have supported the formal recognition of the genus Cronobacter, composed of seven species, to replace the former single species Enterobacter sakazakii. Applying the 7-loci MLST scheme to 1007 strains revealed 298 definable sequence types, yet only C. sakazakii clonal complex 4 (CC4) was principally associated with neonatal meningitis. This clonal lineage has been confirmed using ribosomal-MLST (51-loci) and whole genome-MLST (1865 loci) to analyse 107 whole genomes via the Cronobacter PubMLST database. This database has enabled the retrospective analysis of historic cases and outbreaks following re-identification of those strains. The Cronobacter PubMLST database offers a central, open access, reliable sequence-based repository for researchers. It has the capacity to create new analysis schemes 'on the fly', and to integrate metadata (source, geographic distribution, clinical presentation). It is also expandable and adaptable to changes in taxonomy, and able to support the development of reliable detection methods of use to industry and regulatory authorities. Therefore it meets the WHO (2004) request for the establishment of a typing scheme for this emergent bacterial pathogen.
Whole genome sequencing has additionally shown a range of potential virulence and environmental fitness traits which may account for the pathogenicity of C. sakazakii CC4 and its propensity for neonatal CNS infection.
Mitochondrial DNA variant at HVI region as a candidate of genetic markers of type 2 diabetes
NASA Astrophysics Data System (ADS)
Gumilar, Gun Gun; Purnamasari, Yunita; Setiadi, Rahmat
2016-02-01
Mitochondrial DNA (mtDNA) is maternally inherited, and mtDNA mutations can contribute to the excess of maternal inheritance of type 2 diabetes. Owing to its high mutation rate, the hypervariable region I (HVI) is one of the areas of mtDNA most often associated with the disease. This study was therefore conducted to determine the genetic variants of human mtDNA HVI related to type 2 diabetes in four samples taken from four generations of one lineage. The steps included lysis of hair follicles, amplification of the mtDNA HVI fragment by polymerase chain reaction (PCR), detection of PCR products by agarose gel electrophoresis, measurement of the mtDNA concentration with a UV-Vis spectrophotometer, determination of the nucleotide sequence by direct sequencing, and analysis of the sequencing results with the SeqMan DNASTAR program. Comparison of the sample nucleotide sequences with the revised Cambridge Reference Sequence (rCRS) revealed six mutations shared by the samples: C16147T, T16189C, C16193del, T16127C, A16235G, and A16293C. After comparing these data with secondary data from Mitomap and NCBI, two mutations, T16189C and T16217C, were found to be candidate genetic markers of type 2 diabetes, even though the mutations were also found in generations without a type 2 diabetes diagnosis. The results of this study are expected to contribute to the human mtDNA database of genetic variants associated with metabolic diseases, so that in the future it can be utilized in various fields, especially in medicine.
Mitochondrial DNA heteroplasmy in the emerging field of massively parallel sequencing
Just, Rebecca S.; Irwin, Jodi A.; Parson, Walther
2015-01-01
Long an important and useful tool in forensic genetic investigations, mitochondrial DNA (mtDNA) typing continues to mature. Research in the last few years has demonstrated both that data from the entire molecule will have practical benefits in forensic DNA casework, and that massively parallel sequencing (MPS) methods will make full mitochondrial genome (mtGenome) sequencing of forensic specimens feasible and cost-effective. A spate of recent studies has employed these new technologies to assess intraindividual mtDNA variation. However, in several instances, contamination and other sources of mixed mtDNA data have been erroneously identified as heteroplasmy. Well vetted mtGenome datasets based on both Sanger and MPS sequences have found authentic point heteroplasmy in approximately 25% of individuals when minor component detection thresholds are in the range of 10–20%, along with positional distribution patterns in the coding region that differ from patterns of point heteroplasmy in the well-studied control region. A few recent studies that examined very low-level heteroplasmy are concordant with these observations when the data are examined at a common level of resolution. In this review we provide an overview of considerations related to the use of MPS technologies to detect mtDNA heteroplasmy. In addition, we examine published reports on point heteroplasmy to characterize features of the data that will assist in the evaluation of future mtGenome data developed by any typing method. PMID:26009256
Radhakrishna, Auji; Dwivedi, Krishna Kumar; Srivastava, Manoj Kumar; Roy, A K; Malaviya, D R; Kaushal, P
2018-06-01
Guinea grass (Panicum maximum Jacq.), an important fodder crop of humid and sub-humid tropical regions, reproduces through apomixis, a method of clonal propagation through seeds. Lack of knowledge of the genetic and molecular control of this phenomenon has hindered the genetic improvement of this crop. The dataset provided here represents the first RNA-Seq based assembly and analysis of florets at the pre-meiotic stage from the apomictic and sexual genotypes of guinea grass. The raw sequence files in FASTQ format were deposited in the NCBI SRA database with accession number SRP115883. A total of 24.8 Gb of raw sequence data, corresponding to 179,665,827 raw reads, was obtained by paired-end sequencing. We used Trinity for de novo assembly and identified 57,647 transcripts in the sexual and 49,093 transcripts in the apomictic type. This transcriptome data will be useful for identification and comparative analysis of genes regulating the mode of reproduction in grasses.
Prediction of Ras-effector interactions using position energy matrices.
Kiel, Christina; Serrano, Luis
2007-09-01
One of the more challenging problems in biology is to determine the cellular protein interaction network. Progress has been made in predicting protein-protein interactions based on structural information, assuming that structurally similar proteins interact in a similar way. In a previous publication, we determined a genome-wide Ras-effector interaction network based on homology models, with a high accuracy of predicting binding and non-binding domains. However, for a prediction on a genome-wide scale, homology modelling is a time-consuming process. Therefore, we here developed a faster method using position energy matrices, where, based on different Ras-effector X-ray template structures, all amino acids in the effector binding domain are sequentially mutated to all other amino acid residues and the effect on binding energy is calculated. Those pre-calculated matrices can then be used to score any Ras or effector sequence for binding. Based on position energy matrices, the sequences of putative Ras-binding domains can be scanned quickly to calculate an energy sum value. By calibrating energy sum values using quantitative experimental binding data, thresholds can be defined and thus non-binding domains can be excluded quickly. Sequences which have energy sum values above this threshold are considered to be potential binding domains, and could be further analysed using homology modelling. This prediction method could be applied to other protein families sharing conserved interaction types, in order to rapidly determine large-scale cellular protein interaction networks. Thus, it could have an important impact on future in silico structural genomics approaches, in particular with regard to increasing structural proteomics efforts, aiming to determine all possible domain folds and interaction types. All matrices are deposited in the ADAN database (http://adan-embl.ibmc.umh.es/). Supplementary data are available at Bioinformatics online.
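The scan-and-threshold step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the matrix layout (one residue-to-energy table per position), the numeric values, the threshold, and the function names are all hypothetical, and the "above threshold means potential binder" convention is taken from the abstract's wording rather than from the published matrices.

```python
def energy_sum(sequence, matrix):
    """Sum the per-position energy contributions for one candidate sequence.

    matrix[i] is a dict mapping a residue letter to its pre-calculated
    energy contribution at position i (hypothetical layout).
    """
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must match matrix length")
    return sum(matrix[i][aa] for i, aa in enumerate(sequence))


def screen(candidates, matrix, threshold):
    """Return names of candidates whose energy sum is above the threshold."""
    return [name for name, seq in candidates.items()
            if energy_sum(seq, matrix) > threshold]


# Toy three-position matrix with invented values (not from the ADAN database).
toy_matrix = [
    {"A": 0.0, "K": 1.2, "E": -0.8},
    {"A": 0.0, "K": 0.5, "E": -1.0},
    {"A": 0.0, "K": 0.9, "E": -0.3},
]
candidates = {"dom1": "KKK", "dom2": "EEA", "dom3": "AKA"}
print(screen(candidates, toy_matrix, threshold=0.0))
```

Because scoring is a single table lookup per residue, a whole set of putative domains can be screened cheaply, and only the survivors need the slower homology modelling step.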
Creation of a type IIS restriction endonuclease with a long recognition sequence
Lippow, Shaun M.; Aha, Patti M.; Parker, Matthew H.; Blake, William J.; Baynes, Brian M.; Lipovšek, Daša
2009-01-01
Type IIS restriction endonucleases cleave DNA outside their recognition sequences, and are therefore particularly useful in the assembly of DNA from smaller fragments. A limitation of type IIS restriction endonucleases in assembly of long DNA sequences is the relative abundance of their target sites. To facilitate ligation-based assembly of extremely long pieces of DNA, we have engineered a new type IIS restriction endonuclease that combines the specificity of the homing endonuclease I-SceI with the type IIS cleavage pattern of FokI. We linked a non-cleaving mutant of I-SceI, which conveys to the chimeric enzyme its specificity for an 18-bp DNA sequence, to the catalytic domain of FokI, which cuts DNA at a defined site outside the target site. Whereas previously described chimeric endonucleases do not produce type IIS-like precise DNA overhangs suitable for ligation, our chimeric endonuclease cleaves double-stranded DNA exactly 2 and 6 nt from the target site to generate homogeneous, 5′, four-base overhangs, which can be ligated with 90% fidelity. We anticipate that these enzymes will be particularly useful in manipulation of DNA fragments larger than a thousand bases, which are very likely to contain target sites for all natural type IIS restriction endonucleases. PMID:19304757
Liu, Chang; Duffy, Brian; Bednarski, Jeffrey J; Calhoun, Cecelia; Lay, Lindsay; Rundblad, Barrett; Payton, Jacqueline E; Mohanakumar, Thalachallour
2016-02-01
To report the laboratory investigation of a case of severe combined immunodeficiency (SCID) with maternal T-cell engraftment, focusing on the interference of human leukocyte antigen (HLA) typing by blood chimerism. HLA typing was performed with three different methods, including sequence-specific primer (SSP), sequence-specific oligonucleotide, and Sanger sequencing on peripheral blood leukocytes and buccal cells, from a 3-month-old boy and peripheral blood leukocytes from his parents. Short tandem repeat (STR) testing was performed in parallel. HLA typing of the patient's peripheral blood leukocytes using the SSP method demonstrated three different alleles for each of the HLA-B and HLA-C loci, with both maternal alleles present at each locus. Typing results from the patient's buccal cells showed a normal pattern of inheritance for paternal and maternal haplotypes. STR enrichment testing of the patient's CD3+ T lymphocytes and CD15+ myeloid cells confirmed maternal T-cell engraftment, while the myeloid cell profile matched the patient's buccal cells. Maternal T-cell engraftment may interfere with HLA typing in patients with SCID. Selection of the appropriate typing methods and specimens is critical for accurate HLA typing and immunologic assessment before allogeneic hematopoietic stem cell transplantation. © American Society for Clinical Pathology, 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Pathogen profiling for disease management and surveillance.
Sintchenko, Vitali; Iredell, Jonathan R; Gilbert, Gwendolyn L
2007-06-01
The usefulness of rapid pathogen genotyping is widely recognized, but its effective interpretation and application requires integration into clinical and public health decision-making. How can pathogen genotyping data best be translated to inform disease management and surveillance? Pathogen profiling integrates microbial genomics data into communicable disease control by consolidating phenotypic identity-based methods with DNA microarrays, proteomics, metabolomics and sequence-based typing. Sharing data on pathogen profiles should facilitate our understanding of transmission patterns and the dynamics of epidemics.
Li, Jun; Tibshirani, Robert
2015-01-01
We discuss the identification of features that are associated with an outcome in RNA-Sequencing (RNA-Seq) and other sequencing-based comparative genomic experiments. RNA-Seq data takes the form of counts, so models based on the normal distribution are generally unsuitable. The problem is especially challenging because different sequencing experiments may generate quite different total numbers of reads, or ‘sequencing depths’. Existing methods for this problem are based on Poisson or negative binomial models: they are useful but can be heavily influenced by ‘outliers’ in the data. We introduce a simple, nonparametric method with resampling to account for the different sequencing depths. The new method is more robust than parametric methods. It can be applied to data with quantitative, survival, two-class or multiple-class outcomes. We compare our proposed method to Poisson and negative binomial-based methods in simulated and real data sets, and find that our method discovers more consistent patterns than competing methods. PMID:22127579
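The idea of pairing a rank-based statistic with resampling to equalise sequencing depths can be sketched as below. The abstract does not spell out the resampling scheme or the statistic, so this example assumes downsampling without replacement to the smallest depth and a two-class Wilcoxon rank-sum statistic averaged over resamplings; all names and the toy counts are hypothetical.

```python
import random


def downsample(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample's count vector."""
    reads = [g for g, c in enumerate(counts) for _ in range(c)]
    out = [0] * len(counts)
    for g in rng.sample(reads, depth):
        out[g] += 1
    return out


def rank_sum(values, labels):
    """Wilcoxon rank-sum statistic for class 1, using midranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = mid
        i = j + 1
    return sum(r for r, lab in zip(ranks, labels) if lab == 1)


def resampled_rank_sums(samples, labels, n_resample=20, seed=0):
    """Average per-gene rank-sum statistics over equal-depth resamplings."""
    rng = random.Random(seed)
    depth = min(sum(s) for s in samples)
    n_genes = len(samples[0])
    totals = [0.0] * n_genes
    for _ in range(n_resample):
        equalised = [downsample(s, depth, rng) for s in samples]
        for g in range(n_genes):
            totals[g] += rank_sum([s[g] for s in equalised], labels)
    return [t / n_resample for t in totals]


# Four samples x two genes, with invented counts; labels give the two classes.
samples = [[100, 10], [90, 12], [10, 100], [12, 95]]
labels = [0, 0, 1, 1]
print(resampled_rank_sums(samples, labels))
```

Because the statistic depends only on ranks, a single extreme count cannot dominate it, which is the robustness to outliers that the abstract contrasts with Poisson and negative binomial models.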
Whole genome sequencing in the prevention and control of Staphylococcus aureus infection.
Price, J R; Didelot, X; Crook, D W; Llewelyn, M J; Paul, J
2013-01-01
Staphylococcus aureus remains a leading cause of hospital-acquired infection but weaknesses inherent in currently available typing methods impede effective infection prevention and control. The high resolution offered by whole genome sequencing has the potential to revolutionise our understanding and management of S. aureus infection. To outline the practicalities of whole genome sequencing and discuss how it might shape future infection control practice. We review conventional typing methods and compare these with the potential offered by whole genome sequencing. In contrast with conventional methods, whole genome sequencing discriminates down to single nucleotide differences and allows accurate characterisation of transmission events and outbreaks and additionally provides information about the genetic basis of phenotypic characteristics, including antibiotic susceptibility and virulence. However, translating its potential into routine practice will depend on affordability, acceptable turnaround times and on creating a reliable standardised bioinformatic infrastructure. Whole genome sequencing has the potential to provide a universal test that facilitates outbreak investigation, enables the detection of emerging strains and predicts their clinical importance. Copyright © 2012 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.
Pollen, Alex A; Nowakowski, Tomasz J; Shuga, Joe; Wang, Xiaohui; Leyrat, Anne A; Lui, Jan H; Li, Nianzhen; Szpankowski, Lukasz; Fowler, Brian; Chen, Peilin; Ramalingam, Naveen; Sun, Gang; Thu, Myo; Norris, Michael; Lebofsky, Ronald; Toppani, Dominique; Kemp, Darnell W; Wong, Michael; Clerkson, Barry; Jones, Brittnee N; Wu, Shiquan; Knutsson, Lawrence; Alvarado, Beatriz; Wang, Jing; Weaver, Lesley S; May, Andrew P; Jones, Robert C; Unger, Marc A; Kriegstein, Arnold R; West, Jay A A
2014-10-01
Large-scale surveys of single-cell gene expression have the potential to reveal rare cell populations and lineage relationships but require efficient methods for cell capture and mRNA sequencing. Although cellular barcoding strategies allow parallel sequencing of single cells at ultra-low depths, the limitations of shallow sequencing have not been investigated directly. By capturing 301 single cells from 11 populations using microfluidics and analyzing single-cell transcriptomes across downsampled sequencing depths, we demonstrate that shallow single-cell mRNA sequencing (~50,000 reads per cell) is sufficient for unbiased cell-type classification and biomarker identification. In the developing cortex, we identify diverse cell types, including multiple progenitor and neuronal subtypes, and we identify EGR1 and FOS as previously unreported candidate targets of Notch signaling in human but not mouse radial glia. Our strategy establishes an efficient method for unbiased analysis and comparison of cell populations from heterogeneous tissue by microfluidic single-cell capture and low-coverage sequencing of many cells.
Using comparative genome analysis to identify problems in annotated microbial genomes.
Poptsova, Maria S; Gogarten, J Peter
2010-07-01
Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.
Cheng, Chia-Ying; Tsai, Chia-Feng; Chen, Yu-Ju; Sung, Ting-Yi; Hsu, Wen-Lian
2013-05-03
As spectral library searching has received increasing attention for peptide identification, constructing good decoy spectra from the target spectra is the key to correctly estimating the false discovery rate in searching against the concatenated target-decoy spectral library. Several methods have been proposed to construct decoy spectral libraries. Most of them construct decoy peptide sequences and then generate theoretical spectra accordingly. In this paper, we propose a method, called precursor-swap, which constructs decoy spectral libraries directly at the "spectrum level", without generating decoy peptide sequences, by swapping the precursors of two spectra selected according to a very simple rule. Our spectrum-based method does not require additional effort to deal with ion types (e.g., a, b or c ions), fragmentation mechanisms (e.g., CID or ETD), or unannotated peaks, but preserves many spectral properties. The precursor-swap method is evaluated on different spectral libraries, and the resulting decoy ratios show that it is comparable to other methods. Notably, it is efficient in time and memory usage for constructing decoy libraries. A software tool called Precursor-Swap-Decoy-Generation (PSDG) is publicly available for download at http://ms.iis.sinica.edu.tw/PSDG/.
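For context on why decoy quality matters, the following sketch shows a commonly used way to estimate the false discovery rate from a concatenated target-decoy search: the number of decoy hits above a score threshold divided by the number of target hits above it. This is a generic estimator, not necessarily the exact formula used in the paper, and the hit list is invented.

```python
def fdr_at_threshold(hits, threshold):
    """Estimate FDR as decoy hits / target hits among hits scoring >= threshold.

    hits: list of (score, is_decoy) tuples from a concatenated
    target-decoy library search (hypothetical input format).
    """
    targets = sum(1 for score, is_decoy in hits
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Invented search results: (match score, matched a decoy spectrum?)
hits = [(0.9, False), (0.8, False), (0.7, True),
        (0.6, False), (0.5, True), (0.4, False)]
print(fdr_at_threshold(hits, 0.6))
```

The estimate is only trustworthy if incorrect matches are about as likely to land on a decoy as on a target, which is why the abstract stresses that decoys should preserve the spectral properties of the target library.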
Antonov, V A; Altukhova, V V; Savchenko, S S; Zamaraev, V S; Iliukhin, V I; Alekseev, V V
2007-01-01
Burkholderia mallei is a highly pathogenic microorganism for both humans and animals. In this work, the possibility of using genotyping methods to differentiate between strains of B. mallei was studied. A collection of 14 isolates of B. mallei was characterized using randomly amplified polymorphic DNA (RAPD) and multilocus sequence typing (MLST). RAPD was the best method for detecting strain differences in B. mallei. It was suggested that this method would be an increasingly useful molecular epidemiological tool.
Improvements in Block-Krylov Ritz Vectors and the Boundary Flexibility Method of Component Synthesis
NASA Technical Reports Server (NTRS)
Carney, Kelly Scott
1997-01-01
A method of dynamic substructuring is presented which utilizes a set of static Ritz vectors as a replacement for normal eigenvectors in component mode synthesis. This set of Ritz vectors is generated in a recurrence relationship, proposed by Wilson, which has the form of a block-Krylov subspace. The initial seed to the recurrence algorithm is based upon the boundary flexibility vectors of the component. Improvements have been made in the formulation of the initial seed to the Krylov sequence, through the use of block-filtering. A method to shift the Krylov sequence to create Ritz vectors that will represent the dynamic behavior of the component at target frequencies, the target frequency being determined by the applied forcing functions, has been developed. A method to terminate the Krylov sequence has also been developed. Various orthonormalization schemes have been developed and evaluated, including the Cholesky/QR method. Several auxiliary theorems and proofs which illustrate issues in component mode synthesis and loss of orthogonality in the Krylov sequence have also been presented. The resulting methodology is applicable to both fixed- and free-interface boundary components, and results in a general component model appropriate for any type of dynamic analysis. The accuracy is found to be comparable to that of component synthesis based upon normal modes, using fewer generalized coordinates. In addition, the block-Krylov recurrence algorithm is a series of static solutions and so requires significantly less computation than solving the normal eigenspace problem. The requirement for fewer vectors to form the component, coupled with the lower computational expense of calculating these Ritz vectors, combine to create a method more efficient than traditional component mode synthesis.
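The recurrence mentioned in the abstract can be written out as a sketch. The notation is an assumption, not taken from the report: K is the component stiffness matrix, M the mass matrix, F a block of boundary load patterns whose static responses are the boundary flexibility vectors, and orth_M denotes mass-orthonormalization of a block (for example by a Cholesky/QR step). The block-filtering and frequency-shift refinements described in the abstract are not shown.

```latex
\begin{aligned}
K X_1^{*} &= F, \\
K X_{i+1}^{*} &= M X_i, \qquad i = 1, 2, \dots \\
X_{i+1} &= \operatorname{orth}_M\!\Bigl( X_{i+1}^{*} - \sum_{j \le i} X_j X_j^{T} M X_{i+1}^{*} \Bigr), \\
\operatorname{span}\{X_1, X_2, \dots, X_n\} &= \operatorname{span}\{K^{-1}F,\ (K^{-1}M)K^{-1}F,\ \dots,\ (K^{-1}M)^{n-1}K^{-1}F\}.
\end{aligned}
```

Each new block needs only one static solve with K, which is the source of the computational saving over an eigensolution that the abstract points to.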
Beltman, Joost B; Urbanus, Jos; Velds, Arno; van Rooij, Nienke; Rohr, Jan C; Naik, Shalin H; Schumacher, Ton N
2016-04-02
Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations, and can be used both to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue, for instance, applies to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells, by supplying these with unique heritable DNA tags. Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences. Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets.
Nguyen, Bach Hoang; Phan, Dieu Hong Nu; Nguyen, Hien Xuan; Le, An Van; Alberti, Alberto
2015-07-04
Streptococcus suis (S. suis) serotype 2 has recently become the most prevalent cause of meningitis in adults in many areas of Vietnam. This study provides data on S. suis molecular diagnosis in central Vietnam using a real-time polymerase chain reaction (PCR) assay targeting the S. suis serotype 2 cps2J gene. Additionally, 16S-23S rDNA intragenic spacer (ITS)-based phylogenic analysis of strains isolated from cerebrospinal fluid (CSF) in Thua Thien Hue Province, Vietnam, is presented and discussed. Pathogenic bacteria were isolated from 40 CSF samples, and 18 were identified as S. suis by culture-dependent methods. Capsular serotyping was assessed by real-time PCR. ITS sequences were obtained after traditional PCR and were used in phylogenic analyses. Pathogenic bacteria were isolated from 36 out of 40 CSF samples. A total of 18 S. suis strains were isolated and assigned to serotype 2 by real-time PCR. One CSF sample, negative when tested by culture-dependent methods, was positive to S. suis serotype 2 by real-time PCR. Pairwise alignments of the 18 ITS sequences did not reveal any variable nucleotide position, and resulted in a single sequence type. Sequences were similar to S. suis serotype 2 reference ITS sequences (> 98.1%), and there was no lack of an ITS spacer region in the isolates. S. suis serotype 2 is the most prevalent serotype in central Vietnam. Real-time PCR assay proved to be a reliable diagnostic method for early detection of S. suis 2 in CSF samples.
Porter, Teresita M.; Golding, G. Brian
2012-01-01
Nuclear large subunit ribosomal DNA is widely used in fungal phylogenetics and, to an increasing extent, also in amplicon-based environmental sequencing. The relatively short reads produced by next-generation sequencing, however, make primer choice and sequence error important variables for obtaining accurate taxonomic classifications. In this simulation study we tested the performance of three classification methods: 1) a similarity-based method (BLAST + Metagenomic Analyzer, MEGAN); 2) a composition-based method (Ribosomal Database Project naïve Bayesian classifier, NBC); and, 3) a phylogeny-based method (Statistical Assignment Package, SAP). We also tested the effects of sequence length, primer choice, and sequence error on classification accuracy and perceived community composition. Using a leave-one-out cross validation approach, results for classifications to the genus rank were as follows: BLAST + MEGAN had the lowest error rate and was particularly robust to sequence error; SAP accuracy was highest when long LSU query sequences were classified; and, NBC ran significantly faster than the other tested methods. All methods performed poorly with the shortest 50–100 bp sequences. Increasing simulated sequence error reduced classification accuracy. Community shifts were detected due to sequence error and primer selection even though there was no change in the underlying community composition. Short read datasets from individual primers, as well as pooled datasets, appear to only approximate the true community composition. We hope this work informs investigators of some of the factors that affect the quality and interpretation of their environmental gene surveys. PMID:22558215
Development of a genotyping microarray for Usher syndrome.
Cremers, Frans P M; Kimberling, William J; Külm, Maigi; de Brouwer, Arjan P; van Wijk, Erwin; te Brinke, Heleen; Cremers, Cor W R J; Hoefsloot, Lies H; Banfi, Sandro; Simonelli, Francesca; Fleischhauer, Johannes C; Berger, Wolfgang; Kelley, Phil M; Haralambous, Elene; Bitner-Glindzicz, Maria; Webster, Andrew R; Saihan, Zubin; De Baere, Elfride; Leroy, Bart P; Silvestri, Giuliana; McKay, Gareth J; Koenekoop, Robert K; Millan, Jose M; Rosenberg, Thomas; Joensuu, Tarja; Sankila, Eeva-Marja; Weil, Dominique; Weston, Mike D; Wissinger, Bernd; Kremer, Hannie
2007-02-01
Usher syndrome, a combination of retinitis pigmentosa (RP) and sensorineural hearing loss with or without vestibular dysfunction, displays a high degree of clinical and genetic heterogeneity. Three clinical subtypes can be distinguished, based on the age of onset and severity of the hearing impairment, and the presence or absence of vestibular abnormalities. Thus far, eight genes have been implicated in the syndrome, together comprising 347 protein-coding exons. To improve DNA diagnostics for patients with Usher syndrome, we developed a genotyping microarray based on the arrayed primer extension (APEX) method. Allele-specific oligonucleotides corresponding to all 298 Usher syndrome-associated sequence variants known to date, 76 of which are novel, were arrayed. Approximately half of these variants were validated using original patient DNAs, which yielded an accuracy of >98%. The efficiency of the Usher genotyping microarray was tested using DNAs from 370 unrelated European and American patients with Usher syndrome. Sequence variants were identified in 64/140 (46%) patients with Usher syndrome type I, 45/189 (24%) patients with Usher syndrome type II, 6/21 (29%) patients with Usher syndrome type III and 6/20 (30%) patients with atypical Usher syndrome. The chip also identified two novel sequence variants, c.400C>T (p.R134X) in PCDH15 and c.1606T>C (p.C536S) in USH2A. The Usher genotyping microarray is a versatile and affordable screening tool for Usher syndrome. Its efficiency will improve with the addition of novel sequence variants with minimal extra costs, making it a very useful first-pass screening tool.
Wise, Mark G.; McArthur, J Vaun; Shimkets, Lawrence J.
1999-01-01
The diversity of the methanotrophic community in mildly acidic landfill cover soil was assessed by three methods: two culture-independent molecular approaches and a traditional culture-based approach. For the first of the molecular studies, two primer pairs specific for the 16S rRNA gene of validly published type I (including the former type X) and type II methanotrophs were identified and tested. These primers were used to amplify directly extracted soil DNA, and the products were used to construct type I and type II clone libraries. The second molecular approach, based on denaturing gradient gel electrophoresis (DGGE), provided profiles of the methanotrophic community members as distinguished by sequence differences in variable region 3 of the 16S ribosomal DNA. For the culturing studies, an extinction-dilution technique was employed to isolate slow-growing but numerically dominant strains. The key variables of the series of enrichment conditions were initial pH (4.8 versus 6.8), air/CH4/CO2 headspace ratio (50:45:5 versus 90:9:1), and concentration of the medium (1× nitrate minimal salts [NMS] versus 0.2× NMS). Screening of the isolates showed that the nutrient-rich 1× NMS selected for type I methanotrophs, while the nutrient-poor 0.2× NMS tended to enrich for type II methanotrophs. Partial sequencing of the 16S rRNA gene from selected clones and isolates revealed some of the same novel sequence types. Phylogenetic analysis of the type I clone library suggested the presence of a new phylotype related to the Methylobacter-Methylomicrobium group, and this was confirmed by isolating two members of this cluster. The type II clone library also suggested the existence of a novel group of related species distinct from the validly published Methylosinus and Methylocystis genera, and two members of this cluster were also successfully cultured. 
Partial sequencing of the pmoA gene, which codes for the 27-kDa polypeptide of the particulate methane monooxygenase, reaffirmed the phylogenetic placement of the four isolates. Finally, not all of the bands separated by DGGE could be accounted for by the clones and isolates. This polyphasic assessment of community structure demonstrates that much diversity among the obligate methane oxidizers has yet to be formally described. PMID:10543800
Ye, Weixing; Zhu, Lei; Liu, Yingying; Crickmore, Neil; Peng, Donghai; Ruan, Lifang; Sun, Ming
2012-07-01
We have designed a high-throughput system for the identification of novel crystal protein genes (cry) from Bacillus thuringiensis strains. The system was developed with two goals: (i) to acquire the mixed plasmid-enriched genomic sequence of B. thuringiensis using next-generation sequencing biotechnology, and (ii) to identify cry genes with a computational pipeline (using BtToxin_scanner). In our pipeline method, we employed three different kinds of well-developed prediction methods, BLAST, hidden Markov model (HMM), and support vector machine (SVM), to predict the presence of Cry toxin genes. The pipeline proved to be fast (average speed, 1.02 Mb/min for proteins and open reading frames [ORFs] and 1.80 Mb/min for nucleotide sequences), sensitive (it detected 40% more protein toxin genes than a keyword extraction method using genomic sequences downloaded from GenBank), and highly specific. Twenty-one strains from our laboratory's collection were selected based on their plasmid pattern and/or crystal morphology. The plasmid-enriched genomic DNA was extracted from these strains and mixed for Illumina sequencing. The sequencing data were de novo assembled, and a total of 113 candidate cry sequences were identified using the computational pipeline. Twenty-seven candidate sequences were selected on the basis of their low level of sequence identity to known cry genes, and eight full-length genes were obtained with PCR. Finally, three new cry-type genes (primary ranks) and five cry holotypes, which were designated cry8Ac1, cry7Ha1, cry21Ca1, cry32Fa1, and cry21Da1 by the B. thuringiensis Toxin Nomenclature Committee, were identified. The system described here is both efficient and cost-effective and can greatly accelerate the discovery of novel cry genes.
Mapping Ribonucleotides Incorporated into DNA by Hydrolytic End-Sequencing.
Orebaugh, Clinton D; Lujan, Scott A; Burkholder, Adam B; Clausen, Anders R; Kunkel, Thomas A
2018-01-01
Ribonucleotides embedded within DNA render the DNA sensitive to the formation of single-stranded breaks under alkaline conditions. Here, we describe a next-generation sequencing method called hydrolytic end sequencing (HydEn-seq) to map ribonucleotides inserted into the genome of Saccharomyces cerevisiae strains deficient in ribonucleotide excision repair. We use this method to map several genomic features in wild-type and replicase variant yeast strains.
DNA Sequence-Dependent Ionic Currents in Ultra-Small Solid-State Nanopores
Comer, Jeffrey
2016-01-01
Measurements of ionic currents through nanopores partially blocked by DNA have emerged as a powerful method for characterization of the DNA nucleotide sequence. Although the effect of the nucleotide sequence on the nanopore blockade current has been experimentally demonstrated, prediction and interpretation of such measurements remain a formidable challenge. Using atomic resolution computational approaches, here we show how the sequence, molecular conformation, and pore geometry affect the blockade ionic current in model solid-state nanopores. We demonstrate that the blockade current from a DNA molecule is determined by the chemical identities and conformations of at least three consecutive nucleotides. We find the blockade currents produced by the nucleotide triplets to vary considerably with their nucleotide sequence despite having nearly identical molecular conformations. Encouragingly, we find blockade current differences as large as 25% for single-base substitutions in ultra small (1.6 nm × 1.1 nm cross section; 2 nm length) solid-state nanopores. Despite the complex dependence of the blockade current on the sequence and conformation of the DNA triplets, we find that, under many conditions, the number of thymine bases is positively correlated with the current, whereas the number of purine bases and the presence of both purine and pyrimidines in the triplet are negatively correlated with the current. Based on these observations, we construct a simple theoretical model that relates the ion current to the base content of a solid-state nanopore. Furthermore, we show that compact conformations of DNA in narrow pores provide the greatest signal-to-noise ratio for single base detection, whereas reduction of the nanopore length increases the ionic current noise. Thus, the sequence dependence of nanopore blockade current can be theoretically rationalized, although the predictions will likely need to be customized for each nanopore type. PMID:27103233
Wilkes, Rebecca P; Sanchez, Elena; Riley, Matthew C; Kennedy, Melissa A
2014-01-01
Canine distemper virus (CDV) remains a common cause of infectious disease in dogs, particularly in high-density housing situations such as shelters. Vaccination of all dogs against CDV is recommended at the time of admission to animal shelters, and many shelters use a modified live virus (MLV) vaccine. From a diagnostic standpoint for dogs with suspected CDV infection, this is problematic because highly sensitive diagnostic real-time reverse transcription polymerase chain reaction (RT-PCR) tests are able to detect MLV virus in clinical samples. Real-time PCR can be used to quantitate the amount of virus shedding and can differentiate vaccine strains from wild-type strains when shedding is high. However, differentiation by quantitation is not possible in vaccinated animals during acute infection, when shedding is low and could be mistaken for low-level vaccine virus shedding. While there are gel-based RT-PCR assays for differentiation of vaccine strains from field strains based on sequence differences, the sensitivity of these assays is unable to match that of the real-time RT-PCR assay currently used in the authors' laboratory. Therefore, a real-time RT-PCR assay was developed that detects CDV MLV vaccine strains and distinguishes them from wild-type strains based on nucleotide sequence differences, rather than the amount of viral RNA in the sample. The test is highly sensitive, with detection of as few as 5 virus genomic copies (corresponding to 10^-1 TCID50). Sequencing of the DNA real-time products also allows phylogenetic differentiation of the wild-type strains. This test will aid diagnosis during outbreaks of CDV in recently vaccinated animals.
Vela, Ana I; Casas-Díaz, Encarna; Lavín, Santiago; Domínguez, Lucas; Fernández-Garayzábal, Jose F
2015-09-01
Four isolates of an unknown Gram-stain-positive, catalase-negative coccus-shaped organism, isolated from the pharynx of four wild rabbits, were characterized by phenotypic and molecular genetic methods. The micro-organisms were tentatively assigned to the genus Streptococcus based on cellular morphological and biochemical criteria, although the organisms did not appear to correspond to any species with a validly published name. Comparative 16S rRNA gene sequencing confirmed their identification as members of the genus Streptococcus, being most closely related phylogenetically to Streptococcus porcorum 682-03(T) (96.9% 16S rRNA gene sequence similarity). Analysis of rpoB and sodA gene sequences showed divergence values between the novel species and S. porcorum 682-03(T) (the closest phylogenetic relative determined from 16S rRNA gene sequences) of 18.1 and 23.9%, respectively. The novel bacterial isolate could be distinguished from the type strain of S. porcorum by several biochemical characteristics, such as the production of glycyl-tryptophan arylamidase and α-chymotrypsin, and the non-acidification of different sugars. Based on both phenotypic and phylogenetic findings, it is proposed that the unknown bacterium be assigned to a novel species of the genus Streptococcus, and named Streptococcus pharyngis sp. nov. The type strain is DICM10-00796B(T) ( = CECT 8754(T) = CCUG 66496(T)).
ERIC Educational Resources Information Center
Ipek, Ismail
2010-01-01
The purpose of this study was to investigate the effects of CBI lesson sequence type and cognitive style of field dependence on learning from Computer-Based Cooperative Instruction (CBCI) in WEB on the dependent measures, achievement, reading comprehension and reading rate. Eighty-seven college undergraduate students were randomly assigned to…
Demidov, German; Simakova, Tamara; Vnuchkova, Julia; Bragin, Anton
2016-10-22
Multiplex polymerase chain reaction (PCR) is a common enrichment technique for targeted massively parallel sequencing (MPS) protocols. MPS is widely used in biomedical research and clinical diagnostics as a fast and accurate tool for the detection of short genetic variations. However, identification of larger variations such as structural variants and copy number variations (CNVs) remains a challenge for targeted MPS. Some approaches and tools for structural variant detection have been proposed, but they have limitations and often require datasets of a certain type, size, and expected number of amplicons affected by CNVs. In this paper, we describe a novel algorithm for high-resolution germline CNV detection in PCR-enriched targeted sequencing data and present an accompanying tool. We have developed a machine learning algorithm for the detection of large duplications and deletions in targeted sequencing data generated with a PCR-based enrichment step. We have performed verification studies and established the algorithm's sensitivity and specificity. We have compared the developed tool with other available methods applicable to the described data and found that it performs better. Using a large cohort of samples, we showed that our method has high specificity and sensitivity for high-resolution copy number detection in targeted sequencing data.
2013-01-01
Background The field of cancer genomics has rapidly adopted next-generation sequencing (NGS) in order to study and characterize malignant tumors with unprecedented resolution. In particular for cancer, one is often trying to identify somatic mutations – changes specific to a tumor and not within an individual’s germline. However, false positive and false negative detections often result from lack of sufficient variant evidence, contamination of the biopsy by stromal tissue, sequencing errors, and the erroneous classification of germline variation as tumor-specific. Results We have developed a generalized Bayesian analysis framework for matched tumor/normal samples with the purpose of identifying tumor-specific alterations such as single nucleotide mutations, small insertions/deletions, and structural variation. We describe our methodology, and discuss its application to other types of paired-tissue analysis such as the detection of loss of heterozygosity as well as allelic imbalance. We also demonstrate the high level of sensitivity and specificity in discovering simulated somatic mutations, for various combinations of a) genomic coverage and b) emulated heterogeneity. Conclusion We present a Java-based implementation of our methods named Seurat, which is made available for free academic use. We have demonstrated and reported on the discovery of different types of somatic change by applying Seurat to an experimentally-derived cancer dataset using our methods; and have discussed considerations and practices regarding the accurate detection of somatic events in cancer genomes. Seurat is available at https://sites.google.com/site/seuratsomatic. PMID:23642077
Christoforides, Alexis; Carpten, John D; Weiss, Glen J; Demeure, Michael J; Von Hoff, Daniel D; Craig, David W
2013-05-04
The field of cancer genomics has rapidly adopted next-generation sequencing (NGS) in order to study and characterize malignant tumors with unprecedented resolution. In particular for cancer, one is often trying to identify somatic mutations--changes specific to a tumor and not within an individual's germline. However, false positive and false negative detections often result from lack of sufficient variant evidence, contamination of the biopsy by stromal tissue, sequencing errors, and the erroneous classification of germline variation as tumor-specific. We have developed a generalized Bayesian analysis framework for matched tumor/normal samples with the purpose of identifying tumor-specific alterations such as single nucleotide mutations, small insertions/deletions, and structural variation. We describe our methodology, and discuss its application to other types of paired-tissue analysis such as the detection of loss of heterozygosity as well as allelic imbalance. We also demonstrate the high level of sensitivity and specificity in discovering simulated somatic mutations, for various combinations of a) genomic coverage and b) emulated heterogeneity. We present a Java-based implementation of our methods named Seurat, which is made available for free academic use. We have demonstrated and reported on the discovery of different types of somatic change by applying Seurat to an experimentally-derived cancer dataset using our methods; and have discussed considerations and practices regarding the accurate detection of somatic events in cancer genomes. Seurat is available at https://sites.google.com/site/seuratsomatic.
Amino acid selective unlabeling for sequence specific resonance assignments in proteins
Krishnarjuna, B.; Jaipuria, Garima; Thakur, Anushikha
2010-01-01
Sequence specific resonance assignment constitutes an important step towards high-resolution structure determination of proteins by NMR and is aided by selective identification and assignment of amino acid types. The traditional approach to selective labeling yields only the chemical shifts of the particular amino acid being selected and does not help in establishing a link between adjacent residues along the polypeptide chain, which is important for sequential assignments. An alternative approach is the method of amino acid selective ‘unlabeling’ or reverse labeling, which involves selective unlabeling of specific amino acid types against a uniformly 13C/15N labeled background. Based on this method, we present a novel approach for sequential assignments in proteins. The method involves a new NMR experiment named {12COi–15Ni+1}-filtered HSQC, which aids in linking the 1HN/15N resonances of the selectively unlabeled residue, i, and its C-terminal neighbor, i + 1, in HN-detected double and triple resonance spectra. This leads to the assignment of a tri-peptide segment from the knowledge of the amino acid types of residues: i − 1, i and i + 1, thereby speeding up the sequential assignment process. The method has the advantage of being relatively inexpensive, applicable to 2H labeled protein and can be coupled with cell-free synthesis and/or automated assignment approaches. A detailed survey involving unlabeling of different amino acid types individually or in pairs reveals that the proposed approach is also robust to misincorporation of 14N at undesired sites. Taken together, this study represents the first application of selective unlabeling for sequence specific resonance assignments and opens up new avenues to using this methodology in protein structural studies. PMID:21153044
Lingner, Thomas; Kataya, Amr R. A.; Reumann, Sigrun
2012-01-01
We recently developed the first algorithms specifically for plants to predict proteins carrying peroxisome targeting signals type 1 (PTS1) from genome sequences. As validated experimentally, the prediction methods are able to correctly predict unknown peroxisomal Arabidopsis proteins and to infer novel PTS1 tripeptides. The high prediction performance is primarily determined by the large number and sequence diversity of the underlying positive example sequences, which mainly derived from EST databases. However, a few constructs remained cytosolic in experimental validation studies, indicating sequencing errors in some ESTs. To identify erroneous sequences, we validated subcellular targeting of additional positive example sequences in the present study. Moreover, we analyzed the distribution of prediction scores separately for each orthologous group of PTS1 proteins, which generally resembled normal distributions with group-specific mean values. The cytosolic sequences commonly represented outliers of low prediction scores and were located at the very tail of a fitted normal distribution. Three statistical methods for identifying outliers were compared in terms of sensitivity and specificity. Their combined application allows elimination of erroneous ESTs from positive example data sets. This new post-validation method will further improve the prediction accuracy of both PTS1 and PTS2 protein prediction models for plants, fungi, and mammals. PMID:22415050
Lingner, Thomas; Kataya, Amr R A; Reumann, Sigrun
2012-02-01
We recently developed the first algorithms specifically for plants to predict proteins carrying peroxisome targeting signals type 1 (PTS1) from genome sequences. As validated experimentally, the prediction methods are able to correctly predict unknown peroxisomal Arabidopsis proteins and to infer novel PTS1 tripeptides. The high prediction performance is primarily determined by the large number and sequence diversity of the underlying positive example sequences, which mainly derived from EST databases. However, a few constructs remained cytosolic in experimental validation studies, indicating sequencing errors in some ESTs. To identify erroneous sequences, we validated subcellular targeting of additional positive example sequences in the present study. Moreover, we analyzed the distribution of prediction scores separately for each orthologous group of PTS1 proteins, which generally resembled normal distributions with group-specific mean values. The cytosolic sequences commonly represented outliers of low prediction scores and were located at the very tail of a fitted normal distribution. Three statistical methods for identifying outliers were compared in terms of sensitivity and specificity. Their combined application allows elimination of erroneous ESTs from positive example data sets. This new post-validation method will further improve the prediction accuracy of both PTS1 and PTS2 protein prediction models for plants, fungi, and mammals.
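The outlier screening described in this abstract can be illustrated with a minimal sketch. The abstract does not name the three statistical methods that were compared, so the z-score rule below is only an assumed stand-in; the function name, the cutoff, and the example scores are all hypothetical.

```python
from statistics import mean, stdev

def low_score_outliers(scores, z_cutoff=-2.0):
    """Return indices of scores lying far in the lower tail of a fitted normal.

    Assumption: a plain z-score rule, chosen for illustration only; it is
    not claimed to be one of the three methods compared in the study.
    """
    mu = mean(scores)
    sigma = stdev(scores)  # sample standard deviation
    if sigma == 0:
        return []  # all scores identical, nothing can be an outlier
    return [i for i, s in enumerate(scores) if (s - mu) / sigma < z_cutoff]

# Hypothetical prediction scores for one orthologous group of PTS1 proteins;
# the last entry mimics a sequence with a suspiciously low score.
group_scores = [0.82, 0.79, 0.85, 0.81, 0.80, 0.83, 0.78, 0.84, 0.20]
print(low_score_outliers(group_scores))  # -> [8]
```

Because a single extreme value inflates the fitted standard deviation, a robust variant (for example, one based on the median) may be preferable for small groups; that choice is left open here.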
DNA methylation assessment from human slow- and fast-twitch skeletal muscle fibers
Begue, Gwénaëlle; Raue, Ulrika; Jemiolo, Bozena
2017-01-01
A new application of the reduced representation bisulfite sequencing method was developed using low-DNA input to investigate the epigenetic profile of human slow- and fast-twitch skeletal muscle fibers. Successful library construction was completed with as little as 15 ng of DNA, and high-quality sequencing data were obtained with 32 ng of DNA. Analysis identified 143,160 differentially methylated CpG sites across 14,046 genes. In both fiber types, selected genes predominantly expressed in slow or fast fibers were hypomethylated, which was supported by the RNA-sequencing analysis. These are the first fiber type-specific methylation data from human skeletal muscle and provide a unique platform for future research. NEW & NOTEWORTHY This study validates a low-DNA input reduced representation bisulfite sequencing method for human muscle biopsy samples to investigate the methylation patterns at a fiber type-specific level. These are the first fiber type-specific methylation data reported from human skeletal muscle and thus provide initial insight into basal state differences in myosin heavy chain I and IIa muscle fibers among young, healthy men. PMID:28057818
2013-01-01
Background Human leukocyte antigen matching at allelic resolution is proven clinically significant in hematopoietic stem cell transplantation, lowering the risk of graft-versus-host disease and mortality. However, due to the ever growing HLA allele database, tissue typing laboratories face substantial challenges. In light of the complexity and the high degree of allelic diversity, it has become increasingly difficult to define the classical transplantation antigens at high-resolution by using well-tried methods. Thus, next-generation sequencing is entering into diagnostic laboratories at the perfect time and serving as a promising tool to overcome intrinsic HLA typing problems. Therefore, we have developed and validated a scalable automated HLA class I and class II typing approach suitable for diagnostic use. Results A validation panel of 173 clinical and proficiency testing samples was analysed, demonstrating 100% concordance to the reference method. From a total of 1,273 loci we were able to generate 1,241 (97.3%) initial successful typings. The mean ambiguity reduction for the analysed loci was 93.5%. Allele assignment including intronic sequences showed an improved resolution (99.2%) of non-expressed HLA alleles. Conclusion We provide a powerful HLA typing protocol offering a short turnaround time of only two days, a fully integrated workflow and most importantly a high degree of typing reliability. The presented automated assay is flexible and can be scaled by specific primer compilations and the use of different 454 sequencing systems. The workflow was successfully validated according to the policies of the European Federation for Immunogenetics. Next-generation sequencing seems to become one of the new methods in the field of Histocompatibility. PMID:23557197
Intact long-type dupA as a marker for gastroduodenal diseases in Okinawan subpopulation, Japan.
Takahashi, Ayaka; Shiota, Seiji; Matsunari, Osamu; Watada, Masahide; Suzuki, Rumiko; Nakachi, Saori; Kinjo, Nagisa; Kinjo, Fukunori; Yamaoka, Yoshio
2013-02-01
Helicobacter pylori dupA can be divided into two types according to the presence or absence of the mutation. In addition, full-sequence data revealed that dupA has two types of different lengths, depending on the presence of approximately 600 bp in the putative 5' region (presence, long-type; absence, short-type), which has not been taken into account in previous studies. A total of 319 strains isolated from Okinawa, the southern islands of Japan, were included. The status of dupA and cagA was determined by polymerase chain reaction. The presence of mutations in long-type dupA was determined by DNA sequencing. The prevalence of long-type dupA was 26.3% (84/319). Sequence analysis showed that only six (7.1%) of the 84 long-type dupA strains studied had point mutations leading to a stop codon. Interestingly, intact long-type dupA without frameshift mutation, but not short-type dupA, was significantly more often associated with gastric ulcer and gastric cancer than with gastritis (p = .001 and p = .019, respectively). After adjustment for age, gender, and cagA, the presence of intact long-type dupA was significantly associated with gastric ulcer and gastric cancer compared with gastritis (odds ratio [OR] = 3.35, 95% confidence interval [CI] = 1.55-7.24 and OR = 4.14, 95% CI = 1.23-13.94, respectively). Intact long-type dupA is a real virulence marker for severe outcomes in Okinawa, Japan. The previous information gained from PCR-based methods without taking long-type dupA into account must be interpreted with caution. © 2012 Blackwell Publishing Ltd.
Statistical Methods for Identifying Sequence Motifs Affecting Point Mutations
Zhu, Yicheng; Neeman, Teresa; Yap, Von Bing; Huttley, Gavin A.
2017-01-01
Mutation processes differ between types of point mutation, genomic locations, cells, and biological species. For some point mutations, specific neighboring bases are known to be mechanistically influential. Beyond these cases, numerous questions remain unresolved, including: what are the sequence motifs that affect point mutations? How large are the motifs? Are they strand symmetric? And, do they vary between samples? We present new log-linear models that allow explicit examination of these questions, along with sequence logo style visualization to enable identifying specific motifs. We demonstrate the performance of these methods by analyzing mutation processes in human germline and malignant melanoma. We recapitulate the known CpG effect, and identify novel motifs, including a highly significant motif associated with A→G mutations. We show that major effects of neighbors on germline mutation lie within ±2 of the mutating base. Models are also presented for contrasting the entire mutation spectra (the distribution of the different point mutations). We show the spectra vary significantly between autosomes and X-chromosome, with a difference in T→C transition dominating. Analyses of malignant melanoma confirmed reported characteristic features of this cancer, including statistically significant strand asymmetry, and markedly different neighboring influences. The methods we present are made freely available as a Python library https://bitbucket.org/pycogent3/mutationmotif. PMID:27974498
Chen, Hui; Luthra, Rajyalakshmi; Goswami, Rashmi S; Singh, Rajesh R; Roy-Chowdhuri, Sinchita
2015-08-28
Application of next-generation sequencing (NGS) technology to routine clinical practice has enabled characterization of personalized cancer genomes to identify patients likely to have a response to targeted therapy. The proper selection of the tumor sample for downstream NGS-based mutational analysis is critical to generate accurate results and to guide therapeutic intervention. However, multiple pre-analytic factors come into play in determining the success of NGS testing. In this review, we discuss pre-analytic requirements for AmpliSeq PCR-based sequencing using Ion Torrent Personal Genome Machine (PGM) (Life Technologies), an NGS platform that is often used by clinical laboratories for sequencing solid tumors because of its low input DNA requirement from formalin-fixed, paraffin-embedded tissue. The success of NGS mutational analysis is affected not only by the input DNA quantity but also by several other factors, including the specimen type, the DNA quality, and the tumor cellularity. Here, we review tissue requirements for solid tumor NGS-based mutational analysis, including procedure types, tissue types, tumor volume and fraction, decalcification, and treatment effects.
Nath, B Surendra; Gupta, S K; Bajpai, A K
2012-12-01
The life cycle, spore morphology, pathogenicity, tissue specificity, mode of transmission and small subunit rRNA (SSU-rRNA) gene sequences of five new microsporidian isolates, viz. NIWB-11bp, NIWB-12n, NIWB-13md, NIWB-14b and NIWB-15mb, identified from the silkworm, Bombyx mori, were studied along with the type species, NIK-1s_mys. The life cycles of the isolates followed sequential developmental cycles similar to the general developmental cycle of the genus Nosema. The spores showed considerable variation in shape, length and width. Pathogenicity was dose-dependent and differed among the isolates; NIWB-15mb was more virulent than the other isolates. All of the microsporidians infected most of the tissues examined and showed gonadal infection and transovarial transmission in the infected silkworms. The SSU-rRNA sequence-based phylogenetic tree placed NIWB-14b, NIWB-12n and NIWB-11bp in a separate branch along with other Nosema species and Nosema bombycis, while NIWB-15mb and NIWB-13md together formed another cluster along with other Nosema species. NIK-1s_mys revealed a signature sequence similar to that of the standard type species, N. bombycis, indicating that NIK-1s_mys is similar to N. bombycis. Based on phylogenetic relationships, branch length information based on genetic distance, and nucleotide differences, we conclude that the microsporidian isolates identified are distinctly different from the other known species and belong to the genus Nosema. SSU-rRNA gene sequence analysis was found to be a useful approach for detecting different and closely related microsporidians of this economically important domestic insect.
Pandey, Ravi S; Azad, Rajeev K
2016-03-01
Sex chromosomes have evolved from a pair of homologous autosomes which differentiated into sex determination systems, such as XY or ZW system, as a consequence of successive recombination suppression between the gametologous chromosomes. Identifying the regions of recombination suppression, namely, the "evolutionary strata", is central to understanding the history and dynamics of sex chromosome evolution. Evolution of sex chromosomes as a consequence of serial recombination suppressions is well-studied for mammals and birds, but not for plants, although 48 dioecious plants have already been reported. Only two plants, Silene latifolia and papaya, have been studied until now for the presence of evolutionary strata on their X chromosomes, made possible by the sequencing of sex-linked genes on both the X and Y chromosomes, which is a requirement of all current methods that determine stratum structure based on the comparison of gametologous sex chromosomes. To circumvent this limitation and detect strata even if only the sequence of the sex chromosome in the homogametic sex (i.e. X or Z chromosome) is available, we have developed an integrated segmentation and clustering method. In application to gene sequences on the papaya X chromosome and protein-coding sequences on the S. latifolia X chromosome, our method could decipher all known evolutionary strata, as reported by previous studies. Our method, after validating on known strata on the papaya and S. latifolia X chromosome, was applied to the chromosome 19 of Populus trichocarpa, an incipient sex chromosome, deciphering two, yet unknown, evolutionary strata. In addition, we applied this approach to the recently sequenced sex chromosome V of the brown alga Ectocarpus sp. that has a haploid sex determination system (UV system), recovering the sex determining and pseudoautosomal regions, and then to the mating-type chromosomes of an anther-smut fungus Microbotryum lychnidis-dioicae, predicting five strata in the non-recombining region of both the chromosomes.
CRISPR Typing and Subtyping for Improved Laboratory Surveillance of Salmonella Infections
Fabre, Laëtitia; Zhang, Jian; Guigon, Ghislaine; Le Hello, Simon; Guibert, Véronique; Accou-Demartin, Marie; de Romans, Saïana; Lim, Catherine; Roux, Chrystelle; Passet, Virginie; Diancourt, Laure; Guibourdenche, Martine; Issenhuth-Jeanjean, Sylvie; Achtman, Mark; Brisse, Sylvain; Sola, Christophe; Weill, François-Xavier
2012-01-01
Laboratory surveillance systems for salmonellosis should ideally be based on the rapid serotyping and subtyping of isolates. However, current typing methods are limited in both speed and precision. Using 783 strains and isolates belonging to 130 serotypes, we show here that a new family of DNA repeats named CRISPR (clustered regularly interspaced short palindromic repeats) is highly polymorphic in Salmonella. We found that CRISPR polymorphism was strongly correlated with both serotype and multilocus sequence type. Furthermore, spacer microevolution discriminated between subtypes within prevalent serotypes, making it possible to carry out typing and subtyping in a single step. We developed a high-throughput subtyping assay for the most prevalent serotype, Typhimurium. An open web-accessible database was set up, providing a serotype/spacer dictionary and an international tool for strain tracking based on this innovative, powerful typing and subtyping tool. PMID:22623967
VarBin, a novel method for classifying true and false positive variants in NGS data
2013-01-01
Background Variant discovery for rare genetic diseases using Illumina genome or exome sequencing involves screening of up to millions of variants to find only the one or few causative variant(s). Sequencing or alignment errors create "false positive" variants, which are often retained in the variant screening process. Methods to remove false positive variants often retain many false positive variants. This report presents VarBin, a method to prioritize variants based on a false positive variant likelihood prediction. Methods VarBin uses the Genome Analysis Toolkit variant calling software to calculate the variant-to-wild type genotype likelihood ratio at each variant change and position divided by read depth. The resulting Phred-scaled, likelihood-ratio by depth (PLRD) was used to segregate variants into 4 Bins with Bin 1 variants most likely true and Bin 4 most likely false positive. PLRD values were calculated for a proband of interest and 41 additional Illumina HiSeq, exome and whole genome samples (proband's family or unrelated samples). At variant sites without apparent sequencing or alignment error, wild type/non-variant calls cluster near -3 PLRD and variant calls typically cluster above 10 PLRD. Sites with systematic variant calling problems (evident by variant quality scores and biases as well as displayed on the iGV viewer) tend to have higher and more variable wild type/non-variant PLRD values. Depending on the separation of a proband's variant PLRD value from the cluster of wild type/non-variant PLRD values for background samples at the same variant change and position, the VarBin method's classification is assigned to each proband variant (Bin 1 to Bin 4). Results To assess VarBin performance, Sanger sequencing was performed on 98 variants in the proband and background samples. True variants were confirmed in 97% of Bin 1 variants, 30% of Bin 2, and 0% of Bin 3/Bin 4. 
Conclusions These data indicate that VarBin correctly classifies the majority of true variants as Bin 1 and Bin 3/4 contained only false positive variants. The "uncertain" Bin 2 contained both true and false positive variants. Future work will further differentiate the variants in Bin 2. PMID:24266885
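The PLRD statistic described in the Methods can be sketched roughly as follows. This is a minimal illustration, assuming the likelihood ratio is taken from GATK-style Phred-scaled genotype likelihoods (PL) and that depth is the per-sample read depth; the function name and the example numbers are hypothetical, and the Bin 1 to Bin 4 thresholds are not reproduced because the abstract does not state them.

```python
def plrd(pl_wild_type, pl_variant, depth):
    """Phred-scaled variant-to-wild-type likelihood ratio divided by read depth.

    Assumption: PL = -10 * log10(likelihood), so the Phred-scaled ratio
    10 * log10(L_variant / L_wild_type) equals pl_wild_type - pl_variant.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return (pl_wild_type - pl_variant) / depth


# Hypothetical heterozygous call: wild-type PL 300, variant PL 0, depth 20.
print(plrd(300, 0, 20))   # 15.0
# Hypothetical wild-type call: wild-type PL 0, variant PL 60, depth 20.
print(plrd(0, 60, 20))    # -3.0
```

Under this reading, a proband variant would be compared against the spread of wild-type/non-variant PLRD values from background samples at the same site before a bin is assigned.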
NASA Astrophysics Data System (ADS)
Boyajian, Tabetha S.; von Braun, Kaspar; van Belle, Gerard; Farrington, Chris; Schaefer, Gail; Jones, Jeremy; White, Russel; McAlister, Harold A.; ten Brummelaar, Theo A.; Ridgway, Stephen; Gies, Douglas; Sturmann, Laszlo; Sturmann, Judit; Turner, Nils H.; Goldfinger, P. J.; Vargas, Norm
2013-07-01
Based on CHARA Array measurements, we present the angular diameters of 23 nearby, main-sequence stars, ranging from spectral types A7 to K0, 5 of which are exoplanet host stars. We derive linear radii, effective temperatures, and absolute luminosities of the stars using Hipparcos parallaxes and measured bolometric fluxes. The new data are combined with previously published values to create an Angular Diameter Anthology of measured angular diameters to main-sequence stars (luminosity classes V and IV). This compilation consists of 125 stars with diameter uncertainties of less than 5%, ranging in spectral types from A to M. The large quantity of empirical data is used to derive color-temperature relations to an assortment of color indices in the Johnson (B V R_J I_J J H K), Cousins (R_C I_C), Kron (R_K I_K), Sloan (griz), and WISE (W3 W4) photometric systems. These relations have an average standard deviation of ~3% and are valid for stars with spectral types A0-M4. To derive even more accurate relations for Sun-like stars, we also determined these temperature relations omitting early-type stars (T_eff > 6750 K) that may have biased luminosity estimates because of rapid rotation; for this subset the dispersion is only ~2.5%. We find effective temperatures in agreement within a couple of percent for the interferometrically characterized sample of main-sequence stars compared to those derived via the infrared flux method and spectroscopic analysis.
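The step from a measured angular diameter to a linear radius and an effective temperature follows from standard relations: R = (θ/2)·d with d from the parallax, and F_bol = (θ²/4)·σ·T_eff⁴. The sketch below is only an illustration of those two relations; the function names are hypothetical and no limb-darkening or uncertainty handling is included.

```python
import math

SIGMA_SB = 5.670374419e-8                         # Stefan-Boltzmann constant, W m^-2 K^-4
MAS_TO_RAD = math.pi / (180.0 * 3600.0 * 1000.0)  # milliarcseconds to radians
PARSEC_M = 3.0856775814913673e16                  # metres per parsec
R_SUN_M = 6.957e8                                 # nominal solar radius, m


def linear_radius_rsun(theta_mas, parallax_mas):
    """Linear radius in solar radii from angular diameter and parallax (both in mas)."""
    distance_m = (1000.0 / parallax_mas) * PARSEC_M
    return 0.5 * theta_mas * MAS_TO_RAD * distance_m / R_SUN_M


def effective_temperature(theta_mas, fbol_w_m2):
    """Effective temperature in K from angular diameter (mas) and bolometric flux (W m^-2)."""
    theta_rad = theta_mas * MAS_TO_RAD
    return (4.0 * fbol_w_m2 / (SIGMA_SB * theta_rad ** 2)) ** 0.25


# A 1 mas star at 10 pc (parallax 100 mas) has a radius of about 1.075 solar radii.
print(round(linear_radius_rsun(1.0, 100.0), 3))
```

As a sanity check, feeding in the Sun's angular diameter as seen from 1 au together with a flux of about 1361 W m^-2 returns a temperature close to 5772 K.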
AMS 4.0: consensus prediction of post-translational modifications in protein sequences.
Plewczynski, Dariusz; Basu, Subhadip; Saha, Indrajit
2012-08-01
We present here the 2011 update of the AutoMotif Service (AMS 4.0) that predicts a wide selection of 88 different types of single amino acid post-translational modifications (PTM) in protein sequences. The selection of experimentally confirmed modifications is acquired from the latest UniProt and Phospho.ELM databases for training. The sequence vicinity of each modified residue is represented using amino acid physico-chemical features encoded using high quality indices (HQI) obtained by automatic clustering of known indices extracted from the AAindex database. For each type of the numerical representation, the method builds an ensemble of Multi-Layer Perceptron (MLP) pattern classifiers, each optimising different objectives during the training (for example the recall, precision or area under the ROC curve (AUC)). The consensus is built using brainstorming technology, which combines multi-objective instances of machine learning algorithm, and the data fusion of different training objects representations, in order to boost the overall prediction accuracy of conserved short sequence motifs. The performance of AMS 4.0 is compared with the accuracy of previous versions, which were constructed using single machine learning methods (artificial neural networks, support vector machine). Our software improves the average AUC score of the earlier version by close to 7 % as calculated on the test datasets of all 88 PTM types. Moreover, for the selected most-difficult sequence motifs types it is able to improve the prediction performance by almost 32 %, when compared with previously used single machine learning methods. Summarising, the brainstorming consensus meta-learning methodology on the average boosts the AUC score up to around 89 %, averaged over all 88 PTM types. Detailed results for single machine learning methods and the consensus methodology are also provided, together with the comparison to previously published methods and state-of-the-art software tools.
The source code and precompiled binaries of brainstorming tool are available at http://code.google.com/p/automotifserver/ under Apache 2.0 licensing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oohama, N.; Okamura, S.; Fukugita, M.
A bulge-disk decomposition is made for 737 spiral and lenticular galaxies drawn from a Sloan Digital Sky Survey galaxy sample for which morphological types are estimated. We carry out the bulge-disk decomposition using the growth curve fitting method. It is found that bulge properties, effective radius, effective surface brightness, and also absolute magnitude, change systematically with the morphological sequence; from early to late types, the size becomes somewhat larger, and surface brightness and luminosity fainter. In contrast, disks are nearly universal, their properties remaining similar among disk galaxies irrespective of detailed morphologies from S0 to Sc. While these tendencies were often discussed in previous studies, the present study confirms them based on a large homogeneous magnitude-limited field galaxy sample with morphological types estimated. The systematic change of bulge-to-total luminosity ratio, B/T, along the morphological sequence is therefore not caused by disks but mostly by bulges. It is also shown that elliptical galaxies and bulges of spiral galaxies are unlikely to be in a single sequence. We infer the stellar mass density (in units of the critical mass density) to be OMEGA = 0.0021 for spheroids, i.e., elliptical galaxies plus bulges of spiral galaxies, and OMEGA = 0.00081 for disks.
Precise genotyping and recombination detection of Enterovirus
2015-01-01
Enteroviruses (EV) with different genotypes cause diverse infectious diseases in humans and mammals. A correct EV typing result is crucial for effective medical treatment and disease control; however, the emergence of novel viral strains has impaired the performance of available diagnostic tools. Here, we present a web-based tool, named EVIDENCE (EnteroVirus In DEep conception, http://symbiont.iis.sinica.edu.tw/evidence), for EV genotyping and recombination detection. We introduce the idea of using mixed-ranking scores to evaluate the fitness of prototypes based on relatedness and on the genome regions of interest. Using phylogenetic methods, the most possible genotype is determined based on the closest neighbor among the selected references. To detect possible recombination events, EVIDENCE calculates the sequence distance and phylogenetic relationship among sequences of all sliding windows scanning over the whole genome. Detected recombination events are plotted in an interactive figure for viewing of fine details. In addition, all EV sequences available in GenBank were collected and revised using the latest classification and nomenclature of EV in EVIDENCE. These sequences are built into the database and are retrieved in an indexed catalog, or can be searched for by keywords or by sequence similarity. EVIDENCE is the first web-based tool containing pipelines for genotyping and recombination detection, with updated, built-in, and complete reference sequences to improve sensitivity and specificity. The use of EVIDENCE can accelerate genotype identification, aiding clinical diagnosis and enhancing our understanding of EV evolution. PMID:26678286
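The sliding-window scan described above can be illustrated with a much simpler stand-in: for each window of a pre-aligned query, find the reference with the smallest proportion of differing sites, and look for windows where the nearest reference changes. This is a hedged sketch only; EVIDENCE itself uses mixed-ranking scores and phylogenetic methods, and the function names, window size, and step below are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites, skipping positions with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no comparable sites: treat the window as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest_reference_by_window(query, references, window=200, step=50):
    """Return (window start, nearest reference name, distance) for each window.

    Assumes the query and all references come from one multiple alignment,
    so that equal indices refer to the same alignment column.
    """
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        dists = {name: p_distance(query[start:end], seq[start:end])
                 for name, seq in references.items()}
        best = min(dists, key=dists.get)
        results.append((start, best, dists[best]))
    return results


# Toy alignment: the query matches reference "A" first and reference "C" afterwards.
refs = {"A": "A" * 16, "C": "C" * 16}
print(nearest_reference_by_window("A" * 8 + "C" * 8, refs, window=8, step=8))
```

A switch in the nearest reference between adjacent windows, as in the toy output, is the kind of signal that would then be examined as a candidate recombination breakpoint.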
Local alignment of two-base encoded DNA sequence
Homer, Nils; Merriman, Barry; Nelson, Stanley F
2009-01-01
Background DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732
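To make the decoding problem concrete, the sketch below assumes the di-base colour code used by SOLiD-style instruments, in which each colour is the XOR of the 2-bit codes of two adjacent bases; the abstract does not spell out the code table, so this mapping and the function names are assumptions. It shows only plain encoding and decoding, not the authors' combined decode-and-align dynamic programme.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """One colour per adjacent base pair: XOR of the two 2-bit base codes."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Deterministic decoding: each colour maps the previous base to the next one."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors)                               # [3, 1, 0, 3]
print(decode_colors("A", colors))           # ATGGC
# A single mis-measured colour changes every base downstream of it.
print(decode_colors("A", [3, 1, 1, 3]))     # ATGTA
```

The last line illustrates why decoding before alignment is fragile: one colour error corrupts the rest of the read, whereas a true single-base variant changes two adjacent colours and leaves the downstream bases intact.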
Borozan, Ivan; Watt, Stuart; Ferretti, Vincent
2015-05-01
Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Borozan, Ivan; Watt, Stuart; Ferretti, Vincent
2015-01-01
Motivation: Alignment-based sequence similarity searches, while accurate for some type of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913
Earth field NMR with chemical shift spectral resolution: theory and proof of concept.
Katz, Itai; Shtirberg, Lazar; Shakour, Gubrail; Blank, Aharon
2012-06-01
A new method for obtaining an NMR signal in the Earth's magnetic field (EF) is presented. The method makes use of a simple pulse sequence with only DC fields which is much less demanding than previous approaches in terms of the pulses' rise and fall times. Furthermore, it offers the possibility of obtaining NMR data with enough spectral resolution to allow retrieving high resolution molecular chemical shift (CS) information - a capability that was not considered possible in EF NMR until now. Details of the pulse sequence, the experimental system, and our specially tailored EF NMR probe are provided. The experimental results demonstrate the capability to differentiate between three types of samples made of common fluorine compounds, based on their CS data. Copyright © 2012 Elsevier Inc. All rights reserved.
Human action recognition based on spatial-temporal descriptors using key poses
NASA Astrophysics Data System (ADS)
Hu, Shuo; Chen, Yuxin; Wang, Huaibao; Zuo, Yaqing
2014-11-01
Human action recognition is an important area of pattern recognition because of its direct application in settings such as surveillance and virtual reality. In this paper, a simple and effective human action recognition method is presented based on the key poses of the human silhouette and a spatio-temporal feature. First, the contour points of the human silhouette are extracted, the key poses are learned by K-means clustering based on the Euclidean distance between each contour point and the centre point of the silhouette, and the type of each action is labeled for later matching. Second, we obtain the trajectory of the centre point across frames and create a spatio-temporal feature value, denoted W, to describe the motion direction and speed of each action. The value W contains the location and temporal order of each point on the trajectory. Finally, the matching stage compares the key poses and W between training sequences and test sequences; the nearest-neighbor training sequence is found and its label gives the final result. Experiments on the publicly available Weizmann dataset show that the proposed method can improve accuracy by distinguishing ambiguous poses and increase suitability for real-time applications by reducing the computational cost.
Rector, Annabel; Tachezy, Ruth; Van Ranst, Marc
2004-01-01
The discovery of novel viruses has often been accomplished by using hybridization-based methods that necessitate the availability of a previously characterized virus genome probe or knowledge of the viral nucleotide sequence to construct consensus or degenerate PCR primers. In their natural replication cycle, certain viruses employ a rolling-circle mechanism to propagate their circular genomes, and multiply primed rolling-circle amplification (RCA) with φ29 DNA polymerase has recently been applied in the amplification of circular plasmid vectors used in cloning. We employed an isothermal RCA protocol that uses random hexamer primers to amplify the complete genomes of papillomaviruses without the need for prior knowledge of their DNA sequences. We optimized this RCA technique with extracted human papillomavirus type 16 (HPV-16) DNA from W12 cells, using a real-time quantitative PCR assay to determine amplification efficiency, and obtained a 2.4 × 10^4-fold increase in HPV-16 DNA concentration. We were able to clone the complete HPV-16 genome from this multiply primed RCA product. The optimized protocol was subsequently applied to a bovine fibropapillomatous wart tissue sample. Whereas no papillomavirus DNA could be detected by restriction enzyme digestion of the original sample, multiply primed RCA enabled us to obtain a sufficient amount of papillomavirus DNA for restriction enzyme analysis, cloning, and subsequent sequencing of a novel variant of bovine papillomavirus type 1. The multiply primed RCA method allows the discovery of previously unknown papillomaviruses, and possibly also other circular DNA viruses, without a priori sequence information. PMID:15113879
Qiu, Jian-Ding; Luo, San-Hua; Huang, Jian-Hua; Sun, Xing-Yu; Liang, Ru-Ping
2010-04-01
Apoptosis proteins have a central role in the development and homeostasis of an organism. These proteins are very important for understanding the mechanism of programmed cell death. As a result of genome and other sequencing projects, the gap between the number of known apoptosis protein sequences and the number of known apoptosis protein structures is widening rapidly. Because of this extremely unbalanced state, it would be worthwhile to develop a fast and reliable method to identify their subcellular locations so as to gain better insight into their biological functions. In view of this, a new method, in which the support vector machine combines with discrete wavelet transform, has been developed to predict the subcellular location of apoptosis proteins. The results obtained by the jackknife test were quite promising, and indicated that the proposed method can remarkably improve the prediction accuracy of subcellular locations, and might also become a useful high-throughput tool in characterizing other attributes of proteins, such as enzyme class, membrane protein type, and nuclear receptor subfamily according to their sequences.
SVM-dependent pairwise HMM: an application to protein pairwise alignments.
Orlando, Gabriele; Raimondi, Daniele; Khan, Taushif; Lenaerts, Tom; Vranken, Wim F
2017-12-15
Methods able to provide reliable protein alignments are crucial for many bioinformatics applications. In recent years many different algorithms have been developed and various kinds of information, from sequence conservation to secondary structure, have been used to improve alignment performance. This is especially relevant for proteins with highly divergent sequences. However, recent works suggest that different features may have different importance in diverse protein classes and it would be an advantage to have more customizable approaches, capable of dealing with different alignment definitions. Here we present Rigapollo, a highly flexible pairwise alignment method based on a pairwise HMM-SVM that can use any type of information to build alignments. Rigapollo lets the user decide the optimal features to align their protein class of interest. It outperforms current state of the art methods on two well-known benchmark datasets when aligning highly divergent sequences. A Python implementation of the algorithm is available at http://ibsquare.be/rigapollo. Contact: wim.vranken@vub.be. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
USDA-ARS's Scientific Manuscript database
The PCR-based Escherichia coli O157 (O157) strain typing system, Polymorphic Amplified Typing Sequences (PATS), targets insertions-deletions (Indels) and single nucleotide polymorphisms (SNPs) at the XbaI and AvrII(BlnI) restriction enzyme sites, respectively, besides amplifying four known virulenc...
Tamburro, M; Ripabelli, G
2017-01-01
Rapid, reliable and accurate molecular typing methods are essential for outbreak detection and infectious disease control, for monitoring the evolution and dynamics of microbial populations, and for effective epidemiological surveillance. The introduction of a novel method based on the analysis of melting temperature of amplified products, known as High Resolution Melting (HRM) since 2002, has found applications in epidemiological studies, either for identification of bacterial species or molecular typing, as well as an extensive and increasing use in many research fields. The HRM method is based on the use of saturating third generation dyes, advanced real-time PCR platforms, and bioinformatics tools. The aim was to describe, through a comprehensive review of the literature, the use, application and usefulness of HRM for the genotyping of bacterial pathogens in the context of epidemiological surveillance and public health. A literature search was carried out during July-August 2016, by consulting the biomedical databases PubMed/Medline, Scopus, EMBASE, and ISI Web of Science without limits. The search strategy was performed according to the following keywords: high resolution melting analysis and bacteria and genotyping or molecular typing. All the articles evaluating the application of HRM for bacterial pathogen genotyping were selected and reviewed, taking into account the objective of each study, the rationale explaining the use of this technology, and the main results obtained in comparison with gold standards and/or alternative methods, when available.
The HRM method was extensively used for molecular typing of both Gram-positive and Gram-negative bacterial pathogens, representing a versatile genetic tool: a) to evaluate genetic diversity and subtype at species/subspecies level, based also on allele discrimination/identification and mutation screening; b) to recognize phylogenetic groupings (lineage, sublineage, subgroups); c) to identify antimicrobial resistance; d) to detect and screen for mutations related to drug-resistance; e) to discriminate gene isoforms. The HRM method showed, in almost all instances, excellent typeability and discriminatory power, with high concordance of typing results obtained with gold standards or comparable methods. Conversely, for the evaluation of genetic determinants associated with antibiotic-resistance or for screening of associated mutations in key gene fragments, the sensitivity and specificity were not optimal, because the targeted amplicons did not encompass all the crucial mutations. Despite the recent introduction of sequencing-based methods, the HRM method deserves consideration in research fields of infectious diseases, being characterized by low cost, rapidity, flexibility and versatility. However, there are some limitations to HRM assay development, which should be carefully considered. The most common application of HRM for bacterial typing is related to Single Nucleotide Polymorphism (SNP)-based genotyping with the analysis of gene fragments within the multilocus sequence typing (MLST) loci, following an approach termed mini-MLST or Minim typing. Although the resolving power does not fully match that of MLST, the Simpson's Index of Diversity provided by the HRM method is typically >0.95. Furthermore, the cost of this approach is less than MLST, enabling low cost surveillance and rapid response for outbreak control.
Hence, the potential of HRM technology can strongly facilitate routine research and diagnostics in the epidemiological studies, as well as advance and streamline the genetic characterization of bacterial pathogens.
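The Simpson's Index of Diversity quoted for HRM typing can be computed from the typing results alone. The sketch below assumes the Hunter-Gaston form commonly used for typing methods, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of type j and N is the total; the review does not state which variant the underlying studies used, and the function name and example labels are hypothetical.

```python
from collections import Counter


def simpsons_index_of_diversity(type_labels):
    """Hunter-Gaston discriminatory index for a list of per-isolate type labels."""
    counts = Counter(type_labels).values()
    n_total = sum(counts)
    if n_total < 2:
        raise ValueError("at least two isolates are required")
    same_type_pairs = sum(c * (c - 1) for c in counts)
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


# Four isolates, two of which share a melting-curve type.
print(round(simpsons_index_of_diversity(["a", "a", "b", "c"]), 4))   # 0.8333
```

The value is the probability that two isolates drawn at random without replacement fall into different types, so a method that separates every isolate scores 1.0 and one that lumps all isolates together scores 0.0.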
Parson, Walther; Strobl, Christina; Huber, Gabriela; Zimmermann, Bettina; Gomes, Sibylle M.; Souto, Luis; Fendt, Liane; Delport, Rhena; Langit, Reina; Wootton, Sharon; Lagacé, Robert; Irwin, Jodi
2013-01-01
Insights into the human mitochondrial phylogeny have been primarily achieved by sequencing full mitochondrial genomes (mtGenomes). In forensic genetics (partial) mtGenome information can be used to assign haplotypes to their phylogenetic backgrounds, which may, in turn, have characteristic geographic distributions that would offer useful information in a forensic case. In addition and perhaps even more relevant in the forensic context, haplogroup-specific patterns of mutations form the basis for quality control of mtDNA sequences. The current method for establishing (partial) mtDNA haplotypes is Sanger-type sequencing (STS), which is laborious, time-consuming, and expensive. With the emergence of Next Generation Sequencing (NGS) technologies, the body of available mtDNA data can potentially be extended much more quickly and cost-efficiently. Customized chemistries, laboratory workflows and data analysis packages could support the community and increase the utility of mtDNA analysis in forensics. We have evaluated the performance of mtGenome sequencing using the Personal Genome Machine (PGM) and compared the resulting haplotypes directly with conventional Sanger-type sequencing. A total of 64 mtGenomes (>1 million bases) were established that yielded high concordance with the corresponding STS haplotypes (<0.02% differences). About two-thirds of the differences were observed in or around homopolymeric sequence stretches. In addition, the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of alignment software would be desirable to facilitate the application of NGS in mtDNA forensic genetics. PMID:23948325
Xiao, Chao-Ting; Halbur, Patrick G; Opriessnig, Tanja
2015-07-01
The oldest porcine circovirus type 2 (PCV2) sequence dates back to 1962 and is among several hundreds of publicly available PCV2 sequences. Despite this resource, few studies have investigated the global genetic diversity of PCV2. To evaluate the phylogenetic relationship of PCV2 strains, 1680 PCV2 open reading frame 2 (ORF2) sequences were compared and analysed by methods of neighbour-joining, maximum-likelihood, Bayesian inference and network analysis. Four distinct clades were consistently identified and included PCV2a, PCV2b, PCV2c and PCV2d; the p-distance between PCV2d and PCV2b was 0.055±0.008, larger than the PCV2 genotype-definition cut-off of 0.035, supporting PCV2d as an independent genotype. Among the 1680 sequences, 278-285 (16.5-17 %) were classified as PCV2a, 1007-1058 (59.9-63 %) as PCV2b, three (0.2 %) as PCV2c and 322-323 (19.2 %) as PCV2d, with the remaining 12-78 sequences (0.7-4.6 %) classified as intermediate clades or strains by the various methods. Classification of strains to genotypes differed based on the number of sequences used for the analysis, indicating that sample size is important when determining classification and assessing PCV2 trends and shifts. PCV2d was initially identified in 1999 in samples collected in Switzerland, now appears to be widespread in China and has been present in North America since 2012. During 2012-2013, 37 % of all investigated PCV2 sequences from US pigs were classified as PCV2d and overall data analysis suggests an ongoing genotype shift from PCV2b towards PCV2d. The present analyses indicate that PCV2d emerged approximately 20 years ago.
Xu, X G; He, J; He, Y M; Tao, S D; Ying, Y L; Zhu, F M; Lv, H J; Yan, L X
2011-04-01
The Diego blood group system plays an important role in transfusion medicine. Genotyping of DI1 and DI2 alleles is helpful for the investigation into haemolytic disease of the newborn (HDN) and for the development of rare blood group databases. Here, we set up a polymerase chain reaction sequence-based typing (PCR-SBT) method for genotyping of Diego blood group alleles. Specific primers for exon 19 of the solute carrier family 4, anion exchanger, member1 (SLC4A1) gene were designed, and our PCR-SBT method was established and optimized for Diego genotyping. A total of 1053 samples from the Chinese Han population and the family members of a rare proband with DI1/DI1 genotype were investigated by the PCR-SBT method. An allele-specific primer PCR (PCR-ASP) was used to verify the reliability of the PCR-SBT method. The frequencies of DI1 and DI2 alleles in the Chinese Han population were 0.0247 and 0.9753, respectively. Six new single nucleotide polymorphisms (SNPs) were found in the sequenced regions of the SLC4A1 gene, and four novel SNPs located in the exon 19, in which one SNP could cause an amino acid alteration of Ala858Ser on erythrocyte anion exchanger protein 1. The genotypes for Diego blood group were identical among 41 selected samples with PCR-ASP and PCR-SBT. The PCR-SBT method can be used in Diego genotyping as a substitute of serological technique when the antisera is lacking and was suitable for screening large numbers of donors in rare blood group databases. © 2010 The Author(s). Vox Sanguinis © 2010 International Society of Blood Transfusion.
Radtke, Robert P; Stokes, Robert H; Glowka, David A
2014-12-02
A method for operating an impulsive type seismic energy source in a firing sequence having at least two actuations for each seismic impulse to be generated by the source. The actuations have a time delay between them related to a selected energy frequency peak of the source output. One example of the method is used for generating seismic signals in a wellbore and includes discharging electric current through a spark gap disposed in the wellbore in at least one firing sequence. The sequence includes at least two actuations of the spark gap separated by an amount of time selected to cause acoustic energy resulting from the actuations to have peak amplitude at a selected frequency.
Jiang, Haojun; Xie, Yifan; Li, Xuchao; Ge, Huijuan; Deng, Yongqiang; Mu, Haofang; Feng, Xiaoli; Yin, Lu; Du, Zhou; Chen, Fang; He, Nongyue
2016-01-01
Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) have been already used to perform noninvasive prenatal paternity testing from maternal plasma DNA. The frequently used technologies were PCR followed by capillary electrophoresis and SNP typing array, respectively. Here, we developed a noninvasive prenatal paternity testing (NIPAT) based on SNP typing with maternal plasma DNA sequencing. We evaluated the influence factors (minor allele frequency (MAF), the number of total SNP, fetal fraction and effective sequencing depth) and designed three different selective SNP panels in order to verify the performance in clinical cases. Combining targeted deep sequencing of selective SNP and informative bioinformatics pipeline, we calculated the combined paternity index (CPI) of 17 cases to determine paternity. Sequencing-based NIPAT results fully agreed with invasive prenatal paternity test using STR multiplex system. Our study here proved that the maternal plasma DNA sequencing-based technology is feasible and accurate in determining paternity, which may provide an alternative in forensic application in the future.
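The combined paternity index mentioned above is conventionally the product of the per-locus paternity indices, and it converts to a probability of paternity through Bayes' rule with a chosen prior. The sketch below shows only that final combination step; how each per-SNP index is derived from maternal plasma sequencing reads is not described in the abstract, so the input values and function names here are hypothetical.

```python
import math


def combined_paternity_index(locus_indices):
    """Product of independent per-locus paternity indices."""
    return math.prod(locus_indices)


def probability_of_paternity(cpi, prior=0.5):
    """Posterior probability of paternity for a given CPI and prior probability."""
    return cpi * prior / (cpi * prior + (1.0 - prior))


cpi = combined_paternity_index([2.0, 5.0, 10.0])   # hypothetical per-locus values
print(cpi)                                          # 100.0
print(round(probability_of_paternity(cpi), 4))      # 0.9901
```

Multiplying indices assumes the loci are independent, which is one reason panel design (number of SNPs and their minor allele frequencies) matters for the approach described.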
Shin, Jeong Hong; Jung, Soobin; Ramakrishna, Suresh; Kim, Hyongbum Henry; Lee, Junwon
2018-07-07
Genome editing technology using programmable nucleases has rapidly evolved in recent years. The primary mechanism to achieve precise integration of a transgene is mainly based on homology-directed repair (HDR). However, an HDR-based genome-editing approach is less efficient than non-homologous end-joining (NHEJ). Recently, a microhomology-mediated end-joining (MMEJ)-based transgene integration approach was developed, showing feasibility both in vitro and in vivo. We expanded this method to achieve targeted sequence substitution (TSS) of mutated sequences with normal sequences using double-guide RNAs (gRNAs), and a donor template flanking the microhomologies and target sequence of the gRNAs in vitro and in vivo. Our method could realize more efficient sequence substitution than the HDR-based method in vitro using a reporter cell line, and led to the survival of a hereditary tyrosinemia mouse model in vivo. The proposed MMEJ-based TSS approach could provide a novel therapeutic strategy, in addition to HDR, to achieve gene correction from a mutated sequence to a normal sequence. Copyright © 2018 Elsevier Inc. All rights reserved.
Li, Dongmei; Le Pape, Marc A; Parikh, Nisha I; Chen, Will X; Dye, Timothy D
2013-01-01
Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim at controlling both Type I and Type II error rates; however, real microarray data do not always fit their distribution assumptions. Smyth's ubiquitous parametric method, for example, inadequately accommodates violations of normality assumptions, resulting in inflated Type I error rates. The Significance Analysis of Microarrays, another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, the Significance Analysis of Microarrays method fold change criteria are problematic, and can critically alter the conclusion of a study, as a result of compositional changes of the control data set in the analysis. We propose a novel approach, combining resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data, but it is also impervious to fold change threshold since no control data set selection is needed. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across the Smyth's parametric method, the Significance Analysis of Microarrays, and the Resampling-based empirical Bayes Methods. Differences in false discovery rates controls between each approach are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates compared to Smyth's parametric method when data are not normally distributed. 
The Resampling-based empirical Bayes Methods also offer higher statistical power than the Significance Analysis of Microarrays method when the proportion of significantly differentially expressed genes is large for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next generation sequencing RNA-seq data analysis.
Emerman, Amy B; Bowman, Sarah K; Barry, Andrew; Henig, Noa; Patel, Kruti M; Gardner, Andrew F; Hendrickson, Cynthia L
2017-07-05
Next-generation sequencing (NGS) is a powerful tool for genomic studies, translational research, and clinical diagnostics that enables the detection of single nucleotide polymorphisms, insertions and deletions, copy number variations, and other genetic variations. Target enrichment technologies improve the efficiency of NGS by only sequencing regions of interest, which reduces sequencing costs while increasing coverage of the selected targets. Here we present NEBNext Direct®, a hybridization-based, target-enrichment approach that addresses many of the shortcomings of traditional target-enrichment methods. This approach features a simple, 7-hr workflow that uses enzymatic removal of off-target sequences to achieve a high specificity for regions of interest. Additionally, unique molecular identifiers are incorporated for the identification and filtering of PCR duplicates. The same protocol can be used across a wide range of input amounts, input types, and panel sizes, enabling NEBNext Direct to be broadly applicable across a wide variety of research and diagnostic needs. © 2017 by John Wiley & Sons, Inc.
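The abstract mentions unique molecular identifiers (UMIs) for identifying and filtering PCR duplicates. The general idea can be sketched as follows in Python: reads that share the same mapping position, strand, and UMI are treated as copies of one original molecule, and only one is kept. The `Read` record and the `deduplicate_by_umi` function are hypothetical names, UMIs are compared by exact match (no sequencing-error tolerance), and this is not the software that accompanies the described kit.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record used only for this sketch."""
    name: str
    chrom: str
    start: int
    strand: str
    umi: str
    mean_quality: float


def deduplicate_by_umi(reads):
    """Keep one read per (chrom, start, strand, UMI): the highest-quality one."""
    best = {}
    for read in reads:
        key = (read.chrom, read.start, read.strand, read.umi)
        # Replace the stored read only if this duplicate has better quality.
        if key not in best or read.mean_quality > best[key].mean_quality:
            best[key] = read
    return list(best.values())


if __name__ == "__main__":
    reads = [
        Read("r1", "chr1", 100, "+", "ACGT", 30.0),
        Read("r2", "chr1", 100, "+", "ACGT", 35.0),  # PCR duplicate of r1
        Read("r3", "chr1", 100, "+", "TTGA", 28.0),  # same position, other molecule
    ]
    for kept in deduplicate_by_umi(reads):
        print(kept.name)
```

Grouping on the UMI as well as the position is what lets two distinct molecules that happen to map to the same coordinates both survive, which position-only duplicate marking would not allow.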
Burns, Cara C; Kilpatrick, David R; Iber, Jane C; Chen, Qi; Kew, Olen M
2016-01-01
Virologic surveillance is essential to the success of the World Health Organization initiative to eradicate poliomyelitis. Molecular methods have been used to detect polioviruses in tissue culture isolates derived from stool samples obtained through surveillance for acute flaccid paralysis. This chapter describes the use of realtime PCR assays to identify and serotype polioviruses. In particular, a degenerate, inosine-containing, panpoliovirus (panPV) PCR primer set is used to distinguish polioviruses from NPEVs. The high degree of nucleotide sequence diversity among polioviruses presents a challenge to the systematic design of nucleic acid-based reagents. To accommodate the wide variability and rapid evolution of poliovirus genomes, degenerate codon positions on the template were matched to mixed-base or deoxyinosine residues on both the primers and the TaqMan™ probes. Additional assays distinguish between Sabin vaccine strains and non-Sabin strains. This chapter also describes the use of generic poliovirus specific primers, along with degenerate and inosine-containing primers, for routine VP1 sequencing of poliovirus isolates. These primers, along with nondegenerate serotype-specific Sabin primers, can also be used to sequence individual polioviruses in mixtures.
Comparison of Methods of Detection of Exceptional Sequences in Prokaryotic Genomes.
Rusinov, I S; Ershova, A S; Karyagina, A S; Spirin, S A; Alexeevski, A V
2018-02-01
Many proteins need recognition of specific DNA sequences for functioning. The number of recognition sites and their distribution along the DNA might be of biological importance. For example, the number of restriction sites is often reduced in prokaryotic and phage genomes to decrease the probability of DNA cleavage by restriction endonucleases. We call a sequence an exceptional one if its frequency in a genome significantly differs from the one predicted by some mathematical model. An exceptional sequence could be either under- or over-represented, depending on its frequency in comparison with the predicted one. Exceptional sequences could be considered biologically meaningful, for example, as targets of DNA-binding proteins or as parts of abundant repetitive elements. Several methods are used to predict the frequency of a short sequence in a genome based on the actual frequencies of certain of its subsequences. The most popular are methods based on Markov chain models. However, no rigorous comparison of the methods has previously been performed. We compared three methods for the prediction of short sequence frequencies: the maximum-order Markov chain model-based method, the method that uses the geometric mean of extended Markovian estimates, and the method that utilizes frequencies of all subsequences including discontiguous ones. We applied them to restriction sites in complete genomes of 2500 prokaryotic species and demonstrated that the results depend greatly on the method used: lists of 5% of the most under-represented sites differed by up to 50%. The method designed by Burge and coauthors in 1992, which utilizes all subsequences of the sequence, showed a higher precision than the other two methods both on prokaryotic genomes and randomly generated sequences after computational imitation of selective pressure. We propose this method as the first choice for detection of exceptional sequences in prokaryotic genomes.
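Of the three approaches compared in this abstract, the maximum-order Markov chain estimate is the simplest to illustrate. For a word w1..wn it predicts the count as N(w1..w(n-1)) x N(w2..wn) / N(w2..w(n-1)), where N is the observed count of a subword. The Python sketch below assumes a single-strand scan with overlapping occurrences. The function names are hypothetical, and the Burge et al. method that the authors recommend is not implemented here.

```python
def count_overlapping(genome, word):
    """Count occurrences of word in genome, allowing overlaps."""
    count = 0
    start = genome.find(word)
    while start != -1:
        count += 1
        start = genome.find(word, start + 1)
    return count


def markov_expected_count(genome, word):
    """Maximum-order Markov estimate of the expected count of word."""
    if len(word) < 3:
        raise ValueError("word must have at least 3 letters")
    prefix, suffix, core = word[:-1], word[1:], word[1:-1]
    core_count = count_overlapping(genome, core)
    if core_count == 0:
        return 0.0
    return (count_overlapping(genome, prefix)
            * count_overlapping(genome, suffix) / core_count)


def observed_to_expected(genome, word):
    """Ratio below 1 suggests under-representation; None if undefined."""
    expected = markov_expected_count(genome, word)
    if expected == 0:
        return None
    return count_overlapping(genome, word) / expected


if __name__ == "__main__":
    toy_genome = "GAATTCGAATTC"  # toy sequence, not real data
    print(observed_to_expected(toy_genome, "GAATTC"))
```

A real analysis would also need to count both strands and to attach a significance measure to the ratio, which this sketch leaves out.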
Patch-based frame interpolation for old films via the guidance of motion paths
NASA Astrophysics Data System (ADS)
Xia, Tianran; Ding, Youdong; Yu, Bing; Huang, Xi
2018-04-01
Due to improper preservation, traditional films often suffer frame loss after digitization. To deal with this problem, this paper presents a new adaptive patch-based method of frame interpolation guided by motion paths. Our method is divided into three steps. First, we compute motion paths between two reference frames using optical flow estimation. Then, adaptive bidirectional interpolation with hole filling is applied to generate pre-intermediate frames. Finally, patch matching is used to interpolate intermediate frames from the most similar patches. Since the patch matching is based on the pre-intermediate frames, which contain the motion path constraints, the resulting interpolation looks natural. We test different types of old film sequences and compare with other methods; the results show that our method achieves the desired performance without hole or ghost effects.
Prediction of protein secondary structure content for the twilight zone sequences.
Homaeian, Leila; Kurgan, Lukasz A; Ruan, Jishou; Cios, Krzysztof J; Chen, Ke
2007-11-15
Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. A significant majority of successful methods for prediction of the secondary structure are based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict the amount of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. The PSSC-core is shown to provide statistically significantly better results when compared with the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups like exchange groups, chemical groups of the side chains and hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain results of other structure prediction methods.
At the same time, it also provides useful insight into design of successful protein sequence representations that can be used in developing new methods related to prediction of different aspects of the secondary protein structure. (c) 2007 Wiley-Liss, Inc.
Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M G; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes
2015-08-19
Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material.
Liang, Yunyun; Liu, Sanyang; Zhang, Shengli
2015-01-01
Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and that it mainly uses the protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.
On-line resources for bacterial micro-evolution studies using MLVA or CRISPR typing.
Grissa, Ibtissem; Bouchon, Patrick; Pourcel, Christine; Vergnaud, Gilles
2008-04-01
The control of bacterial pathogens requires the development of tools allowing the precise identification of strains at the subspecies level. It is now widely accepted that these tools will need to be DNA-based assays (in contrast to identification at the species level, where biochemical-based assays are still widely used, even though very powerful 16S DNA sequence databases exist). Typing assays need to be cheap and amenable to the design of international databases. The success of such subspecies typing tools will eventually be measured by the size of the associated reference databases accessible over the internet. Three methods have shown some potential in this direction: the so-called spoligotyping assay (Mycobacterium tuberculosis, 40,000-entry database), Multiple Loci Sequence Typing (MLST; up to a few thousand entries for more than 20 bacterial species), and more recently Multiple Loci VNTR Analysis (MLVA; up to a few hundred entries, assays available for more than 20 pathogens). In the present report we review the current status of the tools and resources we have developed over the past seven years to help in setting up or using MLVA assays, or more recently for analysing Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs), which are the basis for spoligotyping assays.
Yap, Kien-Pong; Ho, Wing S; Gan, Han M; Chai, Lay C; Thong, Kwai L
2016-01-01
Typhoid fever, caused by Salmonella enterica serovar Typhi, remains an important public health burden in Southeast Asia and other endemic countries. Various genotyping methods have been applied to study the genetic variations of this human-restricted pathogen. Multilocus sequence typing (MLST) is one of the widely accepted methods, and recently, there is a growing interest in the re-application of MLST in the post-genomic era. In this study, we provide the global MLST distribution of S. Typhi utilizing both publicly available 1,826 S. Typhi genome sequences in addition to performing conventional MLST on S. Typhi strains isolated from various endemic regions spanning over a century. Our global MLST analysis confirms the predominance of two sequence types (ST1 and ST2) co-existing in the endemic regions. Interestingly, S. Typhi strains with ST8 are currently confined within the African continent. Comparative genomic analyses of ST8 and other rare STs with genomes of ST1/ST2 revealed unique mutations in important virulence genes such as flhB, sipC, and tviD that may explain the variations that differentiate between seemingly successful (widespread) and unsuccessful (poor dissemination) S. Typhi populations. Large scale whole-genome phylogeny demonstrated evidence of phylogeographical structuring and showed that ST8 may have diverged from the earlier ancestral population of ST1 and ST2, which later lost some of its fitness advantages, leading to poor worldwide dissemination. In response to the unprecedented increase in genomic data, this study demonstrates and highlights the utility of large-scale genome-based MLST as a quick and effective approach to narrow the scope of in-depth comparative genomic analysis and consequently provide new insights into the fine scale of pathogen evolution and population structure.
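Genome-based MLST, as used in this study, reduces to a lookup: the allele number found at each scheme locus forms a profile, and the profile maps to a sequence type. The Python sketch below shows that step. The seven locus names are those commonly used in the Salmonella enterica scheme, but the allele numbers and the labels "ST-A" and "ST-B" are invented placeholders rather than real database entries, and the function name is hypothetical.

```python
# Loci of the seven-gene scheme (assumed here for illustration).
LOCI = ("aroC", "dnaN", "hemD", "hisD", "purE", "sucA", "thrA")

# Placeholder profile table: allele numbers are NOT real database values.
EXAMPLE_PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 1, 2, 1, 1, 1, 1): "ST-B",
}


def assign_sequence_type(allele_calls, profiles=EXAMPLE_PROFILES):
    """Return the ST label for a dict of locus -> allele number.

    Returns None if any locus is missing and "novel" if the complete
    profile is not present in the profile table.
    """
    profile = tuple(allele_calls.get(locus) for locus in LOCI)
    if None in profile:
        return None
    return profiles.get(profile, "novel")


if __name__ == "__main__":
    calls = dict(zip(LOCI, (1, 1, 2, 1, 1, 1, 1)))
    print(assign_sequence_type(calls))
```

In practice the allele calls would come from matching genome sequence against the scheme's allele database, and an incomplete or unseen profile would be flagged for manual review.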
Whistler, Cheryl A; Hall, Jeffrey A; Xu, Feng; Ilyas, Saba; Siwakoti, Puskar; Cooper, Vaughn S; Jones, Stephen H
2015-06-01
Vibrio parahaemolyticus sequence type 36 (ST36) strains that are native to the Pacific Ocean have recently caused multistate outbreaks of gastroenteritis linked to shellfish harvested from the Atlantic Ocean. Whole-genome comparisons of 295 genomes of V. parahaemolyticus, including several traced to northeastern U.S. sources, were used to identify diagnostic loci, one putatively encoding an endonuclease (prp), and two others potentially conferring O-antigenic properties (cps and flp). The combination of all three loci was present in only one clade of closely related strains of ST36, ST59, and one additional unknown sequence type. However, each locus was also identified outside this clade, with prp and flp occurring in only two nonclade isolates and cps in four. Based on the distribution of these loci in sequenced genomes, prp identified clade strains with >99% accuracy, but the addition of one more locus increased accuracy to 100%. Oligonucleotide primers targeting prp and cps were combined in a multiplex PCR method that defines species using the tlh locus and determines the presence of both the tdh and trh hemolysin-encoding genes, which are also present in ST36. Application of the method in vitro to a collection of 94 clinical isolates collected over a 4-year period in three northeastern U.S. states and 87 environmental isolates revealed that the prp and cps amplicons were detected only in clinical isolates identified as belonging to the ST36 clade and in no environmental isolates from the region. The assay should improve detection and surveillance, thereby reducing infections. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Pietzka, Ariane T.; Stöger, Anna; Huhulescu, Steliana; Allerberger, Franz; Ruppitsch, Werner
2011-01-01
The ability to accurately track Listeria monocytogenes strains involved in outbreaks is essential for control and prevention of listeriosis. Because current typing techniques are time-consuming, cost-intensive, technically demanding, and difficult to standardize, we developed a rapid and cost-effective method for typing of L. monocytogenes. In all, 172 clinical L. monocytogenes isolates and 20 isolates from culture collections were typed by high-resolution melting (HRM) curve analysis of a specific locus of the internalin B gene (inlB). All obtained HRM curve profiles were verified by sequence analysis. The 192 tested L. monocytogenes isolates yielded 15 specific HRM curve profiles. Sequence analysis revealed that these 15 HRM curve profiles correspond to 18 distinct inlB sequence types. The HRM curve profiles obtained correlated with the five phylogenetic groups I.1, I.2, II.1, II.2, and III. Thus, HRM curve analysis constitutes an inexpensive assay and represents an improvement in typing relative to classical serotyping or multiplex PCR typing protocols. This method provides a rapid and powerful screening tool for simultaneous preliminary typing of up to 384 samples in approximately 2 hours. PMID:21227395
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.
Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf
2015-08-01
RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of [Formula: see text]. Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity ([Formula: see text] quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.
Diversity of Cronobacter spp. isolates from the vegetables in the middle-east coastline of China.
Chen, Wanyi; Yang, Jielin; You, Chunping; Liu, Zhenmin
2016-06-01
Cronobacter spp. have caused life-threatening neonatal infections, mainly resulting from consumption of contaminated powdered infant formula. A total of 102 vegetable samples from retail markets were evaluated for the presence of Cronobacter spp. Thirty-five presumptive Cronobacter isolates were isolated and identified using API 20E and 16S rDNA sequencing analyses. All isolates and type strains were characterized using enterobacterial repetitive intergenic consensus sequence PCR (ERIC-PCR), and cluster analysis of the genetic profiles from this molecular typing test clearly showed differences among isolates from different vegetables. A new polymerase chain reaction restriction fragment length polymorphism (PCR-RFLP) assay, based on amplification of the gyrB gene (1258 bp) and digestion with the Alu I and Hinf I endonuclease combination, was developed to differentiate among Cronobacter species, and it was confirmed to be an accurate and rapid subtyping method. Sequence analysis of the gyrB gene proved suitable for phylogenetic analysis of the Cronobacter strains and, being based on SNPs, gave much better resolution in identifying Cronobacter species than PCR-RFLP and ERIC-PCR. Our study further confirmed that vegetables are one of the most common habitats or sources of Cronobacter spp. contamination in the middle-east coastline of China.
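The PCR-RFLP step described in this abstract can be mimicked in silico: locate the recognition sites of the two enzymes in an amplicon and report the resulting fragment lengths, which is what a gel would display. The Python sketch below assumes the standard recognition sites AG^CT for AluI and G^ANTC for HinfI and a linear amplicon. The function names and the toy sequence are hypothetical, and no real gyrB sequence is used.

```python
import re

# Recognition pattern and cut offset within the site (top strand).
ENZYMES = {
    "AluI": ("AGCT", 2),         # AG^CT, blunt cut
    "HinfI": ("GA[ACGT]TC", 1),  # G^ANTC
}


def cut_positions(seq, enzymes):
    """Return sorted top-strand cut positions for the given enzymes."""
    positions = set()
    for name in enzymes:
        pattern, offset = ENZYMES[name]
        # The lookahead lets overlapping sites be found as well.
        for match in re.finditer(f"(?=({pattern}))", seq):
            positions.add(match.start() + offset)
    return sorted(positions)


def fragment_lengths(seq, enzymes=("AluI", "HinfI")):
    """Fragment lengths after a complete double digest of a linear amplicon."""
    seq = seq.upper()
    cuts = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [end - start for start, end in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    toy_amplicon = "TTTTAGCTTTTTGAATCTTTT"  # toy sequence, not gyrB
    print(fragment_lengths(toy_amplicon))
```

Because both recognition sites are palindromic, scanning one strand is enough to find every site. Comparing the predicted fragment patterns between species is the in-silico counterpart of reading the band patterns on a gel.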
2013-01-01
Background: An unusually high incidence of aseptic meningitis caused by enteroviruses was noted in Alberta, Canada between March and October 2010. Sequence based typing was performed on the enterovirus positive samples to gain a better understanding of the molecular characteristics of the Coxsackie A9 (CVA-9) strain responsible for most cases in this outbreak. Methods: Molecular typing was performed by amplification and sequencing of the VP2 region. The genomic sequence of one of the 2010 outbreak isolates was compared to a CVA-9 isolate from 2003 and the prototype sequence to study genetic drift and recombination. Results: Of the 4323 samples tested, 213 were positive for enteroviruses (4.93%). The majority of the positives were detected in CSF samples (n = 157, 73.71%) and 81.94% of the sequenced isolates were typed as CVA-9. The sequenced CVA-9 positives were predominantly (94.16%) detected in patients ranging in age from 15 to 29 years and the peak months for detection were between March and October. Full genome sequence comparisons revealed that the CVA-9 viruses isolated in Alberta in 2003 and 2010 were highly homologous to the prototype CVA-9 in the structural VP1, VP2 and VP3 regions but divergent in the VP4, non-structural and non-coding regions. Conclusion: The increase in cases of aseptic meningitis was associated with enterovirus CVA-9. Sequence divergence between the prototype strain of CVA-9 and the Alberta isolates suggests genetic drifting and/or recombination events, however the sequence was conserved in the antigenic regions determined by the VP1, VP2 and VP3 genes. These results suggest that the increase in CVA-9 cases likely did not result from the emergence of a radically different immune escape mutant. PMID:23521862
On-Line Detection and Segmentation of Sports Motions Using a Wearable Sensor.
Kim, Woosuk; Kim, Myunggyu
2018-03-19
In sports motion analysis, observation is a prerequisite for understanding the quality of motions. This paper introduces a novel approach to detect and segment sports motions using a wearable sensor for supporting systematic observation. The main goal is, for convenient analysis, to automatically provide motion data, which are temporally classified according to the phase definition. For explicit segmentation, a motion model is defined as a sequence of sub-motions with boundary states. A sequence classifier based on deep neural networks is designed to detect sports motions from continuous sensor inputs. The evaluation on two types of motions (soccer kicking and two-handed ball throwing) verifies that the proposed method is successful for the accurate detection and segmentation of sports motions. By developing a sports motion analysis system using the motion model and the sequence classifier, we show that the proposed method is useful for observation of sports motions by automatically providing relevant motion data for analysis.
Enantiospecific recognition of DNA sequences by a proflavine Tröger base.
Bailly, C; Laine, W; Demeunynck, M; Lhomme, J
2000-07-05
The DNA interaction of a chiral Tröger base derived from proflavine was investigated by DNA melting temperature measurements and complementary biochemical assays. DNase I footprinting experiments demonstrate that the binding of the proflavine-based Tröger base is both enantio- and sequence-specific. The (+)-isomer poorly interacts with DNA in a non-sequence-selective fashion. In sharp contrast, the corresponding (-)-isomer recognizes preferentially certain DNA sequences containing both A·T and G·C base pairs, such as the motifs 5'-GTT·AAC and 5'-ATGA·TCAT. This is the first experimental demonstration that acridine-type Tröger bases can be used for enantiospecific recognition of DNA sequences. Copyright 2000 Academic Press.
Experimental design and quantitative analysis of microbial community multiomics.
Mallick, Himel; Ma, Siyuan; Franzosa, Eric A; Vatanen, Tommi; Morgan, Xochitl C; Huttenhower, Curtis
2017-11-30
Studies of the microbiome have become increasingly sophisticated, and multiple sequence-based, molecular methods as well as culture-based methods exist for population-scale microbiome profiles. To link the resulting host and microbial data types to human health, several experimental design considerations, data analysis challenges, and statistical epidemiological approaches must be addressed. Here, we survey current best practices for experimental design in microbiome molecular epidemiology, including technologies for generating, analyzing, and integrating microbiome multiomics data. We highlight studies that have identified molecular bioactives that influence human health, and we suggest steps for scaling translational microbiome research to high-throughput target discovery across large populations.
Shewmaker, P L; Whitney, A M; Humrighouse, B W
2016-03-01
Phenotypic, genotypic, and antimicrobial characteristics of six phenotypically distinct human clinical isolates that most closely resembled the type strain of Streptococcus halichoeri isolated from a seal are presented. Sequencing of the 16S rRNA, rpoB, sodA, and recN genes; comparative whole-genome analysis; conventional biochemical and Rapid ID 32 Strep identification methods; and antimicrobial susceptibility testing were performed on the human isolates, the type strain of S. halichoeri, and type strains of closely related species. The six human clinical isolates were biochemically indistinguishable from each other and showed 100% 16S rRNA, rpoB, sodA, and recN gene sequence similarity. Comparative 16S rRNA gene sequencing analysis revealed 98.6% similarity to S. halichoeri CCUG 48324(T), 97.9% similarity to S. canis ATCC 43496(T), and 97.8% similarity to S. ictaluri ATCC BAA-1300(T). A 3,530-bp fragment of the rpoB gene was 98.8% similar to the S. halichoeri type strain, 84.6% to the S. canis type strain, and 83.8% to the S. ictaluri type strain. The S. halichoeri type strain and the human clinical isolates were susceptible to the antimicrobials tested based on CLSI guidelines for Streptococcus species viridans group with the exception of tetracycline and erythromycin. The human isolates were phenotypically distinct from the type strain isolated from a seal; comparative whole-genome sequence analysis confirmed that the human isolates were S. halichoeri. On the basis of these results, a novel subspecies, Streptococcus halichoeri subsp. hominis, is proposed for the human isolates and Streptococcus halichoeri subsp. halichoeri is proposed for the gray seal isolates. The type strain of the novel subspecies is SS1844(T) = CCUG 67100(T) = LMG 28801(T). Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Guérin, Frédéric; Arnaiz, Olivier; Boggetto, Nicole; Denby Wilkes, Cyril; Meyer, Eric; Sperling, Linda; Duharcourt, Sandra
2017-04-26
DNA elimination is developmentally programmed in a wide variety of eukaryotes, including unicellular ciliates, and leads to the generation of distinct germline and somatic genomes. The ciliate Paramecium tetraurelia harbors two types of nuclei with different functions and genome structures. The transcriptionally inactive micronucleus contains the complete germline genome, while the somatic macronucleus contains a reduced genome streamlined for gene expression. During development of the somatic macronucleus, the germline genome undergoes massive and reproducible DNA elimination events. Availability of both the somatic and germline genomes is essential to examine the genome changes that occur during programmed DNA elimination and ultimately decipher the mechanisms underlying the specific removal of germline-limited sequences. We developed a novel experimental approach that uses flow cell imaging and flow cytometry to sort subpopulations of nuclei to high purity. We sorted vegetative micronuclei and macronuclei during development of P. tetraurelia. We validated the method by flow cell imaging and by high throughput DNA sequencing. Our work establishes the proof of principle that developing somatic macronuclei can be sorted from a complex biological sample to high purity based on their size, shape and DNA content. This method enabled us to sequence, for the first time, the germline DNA from pure micronuclei and to identify novel transposable elements. Sequencing the germline DNA confirms that the Pgm domesticated transposase is required for the excision of all ~45,000 Internal Eliminated Sequences. Comparison of the germline DNA and unrearranged DNA obtained from PGM-silenced cells reveals that the latter does not provide a faithful representation of the germline genome. We developed a flow cytometry-based method to purify P. tetraurelia nuclei to high purity and provided quality control with flow cell imaging and high throughput DNA sequencing. 
We identified 61 germline transposable elements including the first Paramecium retrotransposons. This approach paves the way to sequence the germline genomes of P. aurelia sibling species for future comparative genomic studies.
Peng, Cheng; Wang, Hua; Xu, Xiaoli; Wang, Xiaofu; Chen, Xiaoyun; Wei, Wei; Lai, Yongmin; Liu, Guoquan; Godwin, Ian Douglas; Li, Jieqin; Zhang, Ling; Xu, Junfeng
2018-05-15
Gene editing techniques are becoming powerful tools for modifying target genes in organisms. Although several methods have been developed to detect gene-edited organisms, these techniques are time and labour intensive. Meanwhile, few studies have investigated high-throughput detection and screening strategies for plants modified by gene editing. In this study, we developed a simple, sensitive and high-throughput quantitative real-time PCR (qPCR)-based method. The qPCR-based method exploits two differently labelled probes that are placed within one amplicon at the gene editing target site to simultaneously detect the wild-type and a gene-edited mutant. We showed that the qPCR-based method can accurately distinguish CRISPR/Cas9-induced mutants from the wild-type in several different plant species, such as Oryza sativa, Arabidopsis thaliana, Sorghum bicolor, and Zea mays. Moreover, the method can subsequently determine the mutation type by direct sequencing of the qPCR products of mutations due to gene editing. The qPCR-based method is also sufficiently sensitive to distinguish between heterozygous and homozygous mutations in T0 transgenic plants. In a 384-well plate format, the method enabled the simultaneous analysis of up to 128 samples in three replicates without handling the post-polymerase chain reaction (PCR) products. Thus, we propose that our method is an ideal choice for screening plants modified by gene editing from many candidates in T0 transgenic plants, which will be widely used in the area of plant gene editing. © 2018 The Authors The Plant Journal © 2018 John Wiley & Sons Ltd.
Mars, Mokhtar; Bouaziz, Mouna; Tbini, Zeineb; Ladeb, Fethi; Gharbi, Souha
2018-06-12
This study aims to determine how Magnetic Resonance Imaging (MRI) acquisition techniques and calculation methods affect T2 values of knee cartilage at 1.5 Tesla and to identify sequences that can be used for high-resolution T2 mapping in short scanning times. This study was performed on a phantom and on twenty-nine patients who underwent MRI of the knee joint at 1.5 Tesla. The protocol includes T2 mapping sequences based on Single Echo Spin Echo (SESE), Multi-Echo Spin Echo (MESE), Fast Spin Echo (FSE) and Turbo Gradient Spin Echo (TGSE). The T2 relaxation times were quantified and evaluated using three calculation methods (MapIt, Syngo Offline and monoexponential fit). Signal to Noise Ratios (SNR) were measured in all sequences. All statistical analyses were performed using the t-test. The average T2 values in the phantom were 41.7 ± 13.8 ms for SESE, 43.2 ± 14.4 ms for MESE, 42.4 ± 14.1 ms for FSE and 44 ± 14.5 ms for TGSE. In the patient study, the mean differences were 6.5 ± 8.2 ms, 7.8 ± 7.6 ms and 8.4 ± 14.2 ms for MESE, FSE and TGSE compared to SESE respectively; these differences were not statistically significant (p > 0.05). The comparison between the three calculation methods showed no significant difference (p > 0.05). The t-test showed no significant difference between SNR values for all sequences. T2 values depend not only on the sequence type but also on the calculation method. None of the sequences revealed significant differences compared to the SESE reference sequence. TGSE with its short scanning time can be used for high-resolution T2 mapping. © 2018 The Author(s). Published by S. Karger AG, Basel.
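Note on the monoexponential fit named above: the abstract does not say how the fit was carried out. A minimal sketch, assuming the usual signal model S(TE) = S0 * exp(-TE / T2) and a log-linear least-squares fit, is given below. The echo times and signal values are synthetic and purely illustrative, not data from the study.

```python
import math

def fit_t2_monoexponential(echo_times_ms, signals):
    """Estimate (S0, T2 in ms) from S(TE) = S0 * exp(-TE / T2).

    Assumption: a straight-line least-squares fit of log(signal) against TE,
    which is one common way to do a monoexponential fit.
    """
    if len(echo_times_ms) != len(signals) or len(signals) < 2:
        raise ValueError("need at least two (TE, signal) pairs of equal length")
    if any(s <= 0 for s in signals):
        raise ValueError("signals must be positive to take the logarithm")
    logs = [math.log(s) for s in signals]
    n = len(signals)
    mean_te = sum(echo_times_ms) / n
    mean_log = sum(logs) / n
    sxx = sum((te - mean_te) ** 2 for te in echo_times_ms)
    if sxx == 0:
        raise ValueError("echo times must not all be identical")
    sxy = sum((te - mean_te) * (y - mean_log)
              for te, y in zip(echo_times_ms, logs))
    slope = sxy / sxx  # slope of log(S) versus TE equals -1 / T2
    if slope >= 0:
        raise ValueError("signal does not decay with echo time")
    intercept = mean_log - slope * mean_te  # equals log(S0)
    return math.exp(intercept), -1.0 / slope

# Synthetic, noiseless decay with a hypothetical T2 of 42 ms.
echo_times = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
signals = [1000.0 * math.exp(-te / 42.0) for te in echo_times]
s0_est, t2_est = fit_t2_monoexponential(echo_times, signals)
```

With noisy data, a log-linear fit over-weights late echoes; a nonlinear fit would be an alternative choice.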
How to design a single-cell RNA-sequencing experiment: pitfalls, challenges and perspectives.
Dal Molin, Alessandra; Di Camillo, Barbara
2018-01-31
The sequencing of the transcriptome of single cells, or single-cell RNA-sequencing, has now become the dominant technology for the identification of novel cell types in heterogeneous cell populations or for the study of stochastic gene expression. In recent years, various experimental methods and computational tools for analysing single-cell RNA-sequencing data have been proposed. However, most of them are tailored to different experimental designs or biological questions, and in many cases, their performance has not been benchmarked yet, thus increasing the difficulty for a researcher to choose the optimal single-cell transcriptome sequencing (scRNA-seq) experiment and analysis workflow. In this review, we aim to provide an overview of the current available experimental and computational methods developed to handle single-cell RNA-sequencing data and, based on their peculiarities, we suggest possible analysis frameworks depending on specific experimental designs. Together, we propose an evaluation of challenges and open questions and future perspectives in the field. In particular, we go through the different steps of scRNA-seq experimental protocols such as cell isolation, messenger RNA capture, reverse transcription, amplification and use of quantitative standards such as spike-ins and Unique Molecular Identifiers (UMIs). We then analyse the current methodological challenges related to preprocessing, alignment, quantification, normalization, batch effect correction and methods to control for confounding effects. © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
[Sequence-based typing of environmental Legionella pneumophila isolates in Guangzhou].
Zhang, Ying; Qu, Pinghua; Zhang, Jian; Chen, Shouyi
2011-03-01
To characterize the genes of Legionella pneumophila isolated from different water sources in Guangzhou from 2006 to 2009, and to genotype the strains using the sequence-based typing (SBT) scheme. In total, 44 L. pneumophila strains were typed by SBT with the 7 diversifying genes flaA, asd, mip, pilE, mompS, proA and neuA. The amplicon sequences were analyzed against the European Working Group for Legionella Infections (EWGLI) international SBT database to obtain the allelic profiles and sequence types (STs). Serogroups were typed by latex agglutination test. Data from SBT revealed a high diversity among the strains, with ST01 accounting for 30% (13/44). Of the 20 STs found, fifteen were new, and 2 of them were newly assigned (ST887 and ST888) by EWGLI. An SBT phylogenetic tree was generated with the SplitsTree and BURST programs. High diversity and specificity were observed among the L. pneumophila strains in Guangzhou. SBT is useful for L. pneumophila genomic study and epidemiological surveillance.
Blind multirigid retrospective motion correction of MR images.
Loktyushin, Alexander; Nickisch, Hannes; Pohmann, Rolf; Schölkopf, Bernhard
2015-04-01
Physiological nonrigid motion is inevitable when imaging, e.g., abdominal viscera, and can lead to serious deterioration of the image quality. Prospective techniques for motion correction can handle only special types of nonrigid motion, as they only allow global correction. Retrospective methods developed so far need guidance from navigator sequences or external sensors. We propose a fully retrospective nonrigid motion correction scheme that only needs raw data as an input. Our method is based on a forward model that describes the effects of nonrigid motion by partitioning the image into patches with locally rigid motion. Using this forward model, we construct an objective function that we can optimize with respect to both unknown motion parameters per patch and the underlying sharp image. We evaluate our method on both synthetic and real data in 2D and 3D. In vivo data was acquired using standard imaging sequences. The correction algorithm significantly improves the image quality. Our compute unified device architecture (CUDA)-enabled graphic processing unit implementation ensures feasible computation times. The presented technique is the first computationally feasible retrospective method that uses the raw data of standard imaging sequences and corrects for nonrigid motion without guidance from external motion sensors. © 2014 Wiley Periodicals, Inc.
Neugebauer, Tomasz; Bordeleau, Eric; Burrus, Vincent; Brzezinski, Ryszard
2015-01-01
Data visualization methods are necessary during the exploration and analysis activities of an increasingly data-intensive scientific process. There are few existing visualization methods for raw nucleotide sequences of a whole genome or chromosome. Software for data visualization should allow the researchers to create accessible data visualization interfaces that can be exported and shared with others on the web. Herein, novel software developed for generating DNA data visualization interfaces is described. The software converts DNA data sets into images that are further processed as multi-scale images to be accessed through a web-based interface that supports zooming, panning and sequence fragment selection. Nucleotide composition frequencies and GC skew of a selected sequence segment can be obtained through the interface. The software was used to generate DNA data visualization of human and bacterial chromosomes. Examples of visually detectable features such as short and long direct repeats, long terminal repeats, mobile genetic elements, heterochromatic segments in microbial and human chromosomes, are presented. The software and its source code are available for download and further development. The visualization interfaces generated with the software allow for the immediate identification and observation of several types of sequence patterns in genomes of various sizes and origins. The visualization interfaces generated with the software are readily accessible through a web browser. This software is a useful research and teaching tool for genetics and structural genomics.
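Note on the composition statistics mentioned above: the abstract does not give the formulas used by the software. A minimal sketch, assuming the conventional definition GC skew = (G - C) / (G + C) and simple per-base frequencies over a selected segment, could look like this. The function names are illustrative and are not taken from the software described.

```python
from collections import Counter

def nucleotide_frequencies(segment):
    """Return the relative frequency of A, C, G and T in a sequence segment.

    Characters other than A/C/G/T (for example N) are ignored.
    """
    counts = Counter(base for base in segment.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in "ACGT"}
    return {base: counts[base] / total for base in "ACGT"}

def gc_skew(segment):
    """Return (G - C) / (G + C) for the segment, or 0.0 if it has no G or C."""
    seq = segment.upper()
    g = seq.count("G")
    c = seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)

# Toy segment: three G and one C.
example_skew = gc_skew("GGGC")
example_freqs = nucleotide_frequencies("AACG")
```

In practice GC skew is usually computed in sliding windows along a chromosome; the functions above handle one window at a time.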
An improved and validated RNA HLA class I SBT approach for obtaining full length coding sequences.
Gerritsen, K E H; Olieslagers, T I; Groeneweg, M; Voorter, C E M; Tilanus, M G J
2014-11-01
The functional relevance of human leukocyte antigen (HLA) class I allele polymorphism beyond exons 2 and 3 is difficult to address because more than 70% of the HLA class I alleles are defined by exons 2 and 3 sequences only. For routine application on clinical samples we improved and validated the HLA sequence-based typing (SBT) approach based on RNA templates, using either a single locus-specific or two overlapping group-specific polymerase chain reaction (PCR) amplifications, with three forward and three reverse sequencing reactions for full length sequencing. Locus-specific HLA typing with RNA SBT of a reference panel, representing the major antigen groups, showed identical results compared to DNA SBT typing. Alleles encountered with unknown exons in the IMGT/HLA database and three samples, two with Null and one with a Low expressed allele, have been addressed by the group-specific RNA SBT approach to obtain full length coding sequences. This RNA SBT approach has proven its value in our routine full length definition of alleles. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.
2016-06-01
Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.
Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.
2016-01-01
Mass spectrometry–based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications. PMID:27049631
Church, George M.; Kieffer-Higgins, Stephen
1992-01-01
This invention features vectors and a method for sequencing DNA. The method includes the steps of: a) ligating the DNA into a vector comprising a tag sequence, the tag sequence includes at least 15 bases, wherein the tag sequence will not hybridize to the DNA under stringent hybridization conditions and is unique in the vector, to form a hybrid vector, b) treating the hybrid vector in a plurality of vessels to produce fragments comprising the tag sequence, wherein the fragments differ in length and terminate at a fixed known base or bases, wherein the fixed known base or bases differs in each vessel, c) separating the fragments from each vessel according to their size, d) hybridizing the fragments with an oligonucleotide able to hybridize specifically with the tag sequence, and e) detecting the pattern of hybridization of the tag sequence, wherein the pattern reflects the nucleotide sequence of the DNA.
Application of 2D graphic representation of protein sequence based on Huffman tree method.
Qi, Zhao-Hui; Feng, Jun; Qi, Xiao-Qin; Li, Ling
2012-05-01
Based on Huffman tree method, we propose a new 2D graphic representation of protein sequence. This representation can completely avoid loss of information in the transfer of data from a protein sequence to its graphic representation. The method consists of two parts. One is about the 0-1 codes of 20 amino acids by Huffman tree with amino acid frequency. The amino acid frequency is defined as the statistical number of an amino acid in the analyzed protein sequences. The other is about the 2D graphic representation of protein sequence based on the 0-1 codes. Then the applications of the method on ten ND5 genes and seven Escherichia coli strains are presented in detail. The results show that the proposed model may provide us with some new sights to understand the evolution patterns determined from protein sequences and complete genomes. Copyright © 2012 Elsevier Ltd. All rights reserved.
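Note on the first part of the method above (0-1 codes for amino acids from a Huffman tree built on amino acid frequencies): a minimal sketch of that step, using the standard Huffman construction, is given below. The tie-breaking rule and the toy sequence are assumptions made for illustration, and the 2D graphical mapping of the codes is not reproduced because the abstract does not specify it.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(sequences):
    """Build 0-1 codes for residues from their counts in the given sequences."""
    freq = Counter(res for seq in sequences for res in seq)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in sorted(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, left = heapq.heappop(heap)   # least frequent subtree
        n1, _, right = heapq.heappop(heap)  # second least frequent subtree
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (n0 + n1, next(tiebreak), merged))
    return heap[0][2]

def encode(sequence, codes):
    """Concatenate the 0-1 codes of each residue in the sequence."""
    return "".join(codes[res] for res in sequence)

# Toy input: A occurs 4 times, K twice, L once.
codes = huffman_codes(["AAAAKKL"])
bits = encode("AKL", codes)
```

Because Huffman codes are prefix-free, the bit string can be decoded back to the residue sequence without loss, which is consistent with the abstract's claim that no information is lost.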
Takeo, Toshinori; Tanaka, Tetsuya; Matsubayashi, Makoto; Maeda, Hiroki; Kusakisako, Kodai; Matsui, Toshihiro; Mochizuki, Masami; Matsuo, Tomohide
2014-08-01
Previously, we characterized an undocumented strain of Eimeria krijgsmanni by morphological and biological features. Here, we present a detailed molecular phylogenetic analysis of this organism. Namely, 18S ribosomal RNA gene (rDNA) sequences of E. krijgsmanni were analyzed to incorporate this species into a comprehensive Eimeria phylogeny. As a result, partial 18S rDNA sequence from E. krijgsmanni was successfully determined, and two different types, Type A and Type B, that differed by 1 base pair were identified. E. krijgsmanni was originally isolated from a single oocyst, and thus the results show that the two types might have allelic sequence heterogeneity in the 18S rDNA. Based on phylogenetic analyses, the two types of E. krijgsmanni 18S rDNA formed one of two clades among murine Eimeria spp.; these Eimeria clades reflected morphological similarity among the Eimeria spp. This is the third molecular phylogenetic characterization of a murine Eimeria spp. in addition to E. falciformis and E. papillata. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Kinnevey, Peter M.; Shore, Anna C.; Brennan, Grainne I.; Sullivan, Derek J.; Ehricht, Ralf; Monecke, Stefan; Slickers, Peter
2013-01-01
Methicillin-resistant Staphylococcus aureus (MRSA) has been a major cause of nosocomial infection in Irish hospitals for 4 decades, and replacement of predominant MRSA clones has occurred several times. An MRSA isolate recovered in 2006 as part of a larger study of sporadic MRSA exhibited a rare spa (t878) and multilocus sequence (ST779) type and was nontypeable by PCR- and DNA microarray-based staphylococcal cassette chromosome mec (SCCmec) element typing. Whole-genome sequencing revealed the presence of a novel 51-kb composite island (CI) element with three distinct domains, each flanked by direct repeat and inverted repeat sequences, including (i) a pseudo SCCmec element (16.3 kb) carrying mecA with a novel mec class region, a fusidic acid resistance gene (fusC), and two copper resistance genes (copB and copC) but lacking ccr genes; (ii) an SCC element (17.5 kb) carrying a novel ccrAB4 allele; and (iii) an SCC element (17.4 kb) carrying a novel ccrC allele and a clustered regularly interspaced short palindromic repeat (CRISPR) region. The novel CI was subsequently identified by PCR in an additional 13 t878/ST779 MRSA isolates, six from bloodstream infections, recovered between 2006 and 2011 in 11 hospitals. Analysis of open reading frames (ORFs) carried by the CI showed amino acid sequence similarity of 44 to 100% to ORFs from S. aureus and coagulase-negative staphylococci (CoNS). These findings provide further evidence of genetic transfer between S. aureus and CoNS and show how this contributes to the emergence of novel SCCmec elements and MRSA strains. Ongoing surveillance of this MRSA strain is warranted and will require updating of currently used SCCmec typing methods. PMID:23147725
Peeters, Charlotte; Meier-Kolthoff, Jan P.; Verheyde, Bart; De Brandt, Evie; Cooper, Vaughn S.; Vandamme, Peter
2016-01-01
Partial gyrB gene sequence analysis of 17 isolates from human and environmental sources revealed 13 clusters of strains and identified them as Burkholderia glathei clade (BGC) bacteria. The taxonomic status of these clusters was examined by whole-genome sequence analysis, determination of the G+C content, whole-cell fatty acid analysis and biochemical characterization. The whole-genome sequence-based phylogeny was assessed using the Genome Blast Distance Phylogeny (GBDP) method and an extended multilocus sequence analysis (MLSA) approach. The results demonstrated that these 17 BGC isolates represented 13 novel Burkholderia species that could be distinguished by both genotypic and phenotypic characteristics. BGC strains exhibited a broad metabolic versatility and developed beneficial, symbiotic, and pathogenic interactions with different hosts. Our data also confirmed that there is no phylogenetic subdivision in the genus Burkholderia that distinguishes beneficial from pathogenic strains. We therefore propose to formally classify the 13 novel BGC Burkholderia species as Burkholderia arvi sp. nov. (type strain LMG 29317T = CCUG 68412T), Burkholderia hypogeia sp. nov. (type strain LMG 29322T = CCUG 68407T), Burkholderia ptereochthonis sp. nov. (type strain LMG 29326T = CCUG 68403T), Burkholderia glebae sp. nov. (type strain LMG 29325T = CCUG 68404T), Burkholderia pedi sp. nov. (type strain LMG 29323T = CCUG 68406T), Burkholderia arationis sp. nov. (type strain LMG 29324T = CCUG 68405T), Burkholderia fortuita sp. nov. (type strain LMG 29320T = CCUG 68409T), Burkholderia temeraria sp. nov. (type strain LMG 29319T = CCUG 68410T), Burkholderia calidae sp. nov. (type strain LMG 29321T = CCUG 68408T), Burkholderia concitans sp. nov. (type strain LMG 29315T = CCUG 68414T), Burkholderia turbans sp. nov. (type strain LMG 29316T = CCUG 68413T), Burkholderia catudaia sp. nov. (type strain LMG 29318T = CCUG 68411T) and Burkholderia peredens sp. nov. (type strain LMG 29314T = CCUG 68415T). Furthermore, we present emended descriptions of the species Burkholderia sordidicola, Burkholderia zhejiangensis and Burkholderia grimmiae. The GenBank/EMBL/DDBJ accession numbers for the 16S rRNA and gyrB gene sequences determined in this study are LT158612-LT158624 and LT158625-LT158641, respectively. PMID:27375597
Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan
2008-12-01
Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machine (SVM) are the most effective and accurate methods for solving these problems. A key step to improve the performance of the SVM-based methods is to find a suitable representation of protein sequences. In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from the protein sequence frequency profiles. The protein sequence frequency profiles are calculated from the multiple sequence alignments outputted by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by the occurrence times of each Top-n-gram. The training vectors are evaluated by SVM to train classifiers which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), which is an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results compared to related methods. The method based on Top-n-grams significantly outperforms the methods based on many other building blocks including N-grams, patterns, motifs and binary profiles. Therefore, Top-n-gram is a good building block of the protein sequences and can be widely used in many tasks of the computational biology, such as the sequence alignment, the prediction of domain boundary, the designation of knowledge-based potentials and the prediction of protein binding sites.
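Note on the Top-n-gram representation described above: a minimal sketch follows, assuming that each position of a frequency profile is reduced to its n most frequent amino acids (in descending order of frequency) and that a sequence is then represented by how often each such Top-n-gram occurs. The profile format (one dict per position) and the alphabetical tie-breaking are illustrative assumptions, and the PSI-BLAST, LSA and SVM steps are not shown.

```python
from collections import Counter

def top_n_grams(profile, n=2):
    """Convert a frequency profile into one Top-n-gram per position.

    profile: list of dicts mapping amino acid letter -> frequency at that
    position (hypothetical input format). Ties are broken alphabetically.
    """
    grams = []
    for column in profile:
        ranked = sorted(column, key=lambda aa: (-column[aa], aa))
        grams.append("".join(ranked[:n]))
    return grams

def top_n_gram_counts(profile, n=2):
    """Count occurrences of each Top-n-gram (a sparse feature vector)."""
    return Counter(top_n_grams(profile, n))

# Toy three-position profile over a reduced alphabet.
toy_profile = [
    {"A": 0.6, "G": 0.3, "L": 0.1},
    {"A": 0.5, "G": 0.4, "L": 0.1},
    {"L": 0.7, "A": 0.2, "G": 0.1},
]
toy_grams = top_n_grams(toy_profile, n=2)
toy_features = top_n_gram_counts(toy_profile, n=2)
```

A dense fixed-dimension vector can be obtained by indexing the counts over all possible Top-n-grams of the chosen alphabet.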
Szatkiewicz, Jin P; Wang, WeiBo; Sullivan, Patrick F; Wang, Wei; Sun, Wei
2013-02-01
Structural variation is an important class of genetic variation in mammals. High-throughput sequencing (HTS) technologies promise to revolutionize copy-number variation (CNV) detection but present substantial analytic challenges. Converging evidence suggests that multiple types of CNV-informative data (e.g. read-depth, read-pair, split-read) need to be considered, and that sophisticated methods are needed for more accurate CNV detection. We observed that various sources of experimental biases in HTS confound read-depth estimation, and note that bias correction has not been adequately addressed by existing methods. We present a novel read-depth-based method, GENSENG, which uses a hidden Markov model and negative binomial regression framework to identify regions of discrete copy-number changes while simultaneously accounting for the effects of multiple confounders. Based on extensive calibration using multiple HTS data sets, we conclude that our method outperforms existing read-depth-based CNV detection algorithms. The concept of simultaneous bias correction and CNV detection can serve as a basis for combining read-depth with other types of information such as read-pair or split-read in a single analysis. A user-friendly and computationally efficient implementation of our method is freely available.
Brooks, Matthew J.; Rajasimha, Harsha K.; Roger, Jerome E.
2011-01-01
Purpose Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis. Methods Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA (ANOVA) and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays. Results Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with BWA workflow and 34,115 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R2) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling. 
Conclusions Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions. PMID:22162623
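Note on the differential-expression criterion quoted in this abstract (fold change ≥1.5 and p value <0.05): a minimal sketch of applying such a filter is shown below. It assumes the fold change is taken symmetrically (up or down), which the abstract does not state, and the expression values are invented for illustration.

```python
def is_differentially_expressed(expr_a, expr_b, p_value,
                                min_fold=1.5, alpha=0.05):
    """Apply a fold-change and p-value cutoff to one transcript.

    Assumption: fold change is symmetric, i.e. the larger expression value
    divided by the smaller one, so up- and down-regulation are treated alike.
    """
    if expr_a <= 0 or expr_b <= 0:
        raise ValueError("expression values must be positive")
    fold = max(expr_a, expr_b) / min(expr_a, expr_b)
    return fold >= min_fold and p_value < alpha

# Hypothetical transcripts: (name, expression in A, expression in B, p value).
toy_transcripts = [
    ("t1", 10.0, 15.0, 0.01),  # fold 1.5, significant
    ("t2", 10.0, 14.0, 0.01),  # fold 1.4, below the fold cutoff
    ("t3", 30.0, 10.0, 0.20),  # fold 3.0, but p value too high
]
selected = [name for name, a, b, p in toy_transcripts
            if is_differentially_expressed(a, b, p)]
```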
Ligozzi, Marco; Fontana, Roberta; Aldegheri, Marco; Scalet, Giovanna; Lo Cascio, Giuliana
2010-05-01
A semiautomated, repetitive-sequence-based PCR (rep-PCR) instrument (DiversiLab system) was evaluated in comparison with pulsed-field gel electrophoresis (PFGE) to investigate an outbreak of Serratia marcescens infections in a neonatal intensive care unit (NICU). A selection of 36 epidemiologically related and 8 epidemiologically unrelated isolates was analyzed. Among the epidemiologically related isolates, PFGE identified five genetically unrelated patterns. Thirty-two isolates from patients and wet nurses showed the same PFGE profile (pattern A). Genetically unrelated PFGE patterns were found in one patient (pattern B), in two wet nurses (patterns C and D), and in an environmental isolate from the NICU (pattern G). Rep-PCR identified seven different patterns, three of which included the 32 isolates of PFGE type A. One or two band differences in isolates of these three types allowed isolates to be categorized as similar and included in a unique cluster. Isolates of different PFGE types were also of unrelated rep-PCR types. All of the epidemiologically unrelated isolates were of different PFGE and rep-PCR types. The level of discrimination exhibited by rep-PCR with the DiversiLab system allowed us to conclude that this method was able to identify genetic similarity in a spatio-temporal cluster of S. marcescens isolates.
Yourshaw, Michael; Solorzano-Vargas, R. Sergio; Pickett, Lindsay A.; Lindberg, Iris; Wang, Jiafang; Cortina, Galen; Pawlikowska-Haddal, Anna; Baron, Howard; Venick, Robert S.; Nelson, Stanley F.; Martín, Martín G.
2014-01-01
Objectives Congenital diarrhea disorders are a group of genetically diverse and typically autosomal recessive disorders that have yet to be well characterized phenotypically or molecularly. Diagnostic assessments are generally limited to nutritional challenges and histologic evaluation, and many subjects eventually require a prolonged course of intravenous nutrition. Here we describe next-generation sequencing techniques to investigate a child with perplexing congenital malabsorptive diarrhea and other presumably unrelated clinical problems; this method provides an alternative approach to molecular diagnosis. Methods We screened the diploid genome of an affected individual, using exome sequencing, for uncommon variants that have observed protein-coding consequences. We assessed the functional activity of the mutant protein, as well as its lack of expression using immunohistochemistry. Results Among several rare variants detected was a homozygous nonsense mutation in the catalytic domain of the proprotein convertase subtilisin/kexin type 1 gene. The mutation abolishes prohormone convertase 1/3 endoprotease activity as well as expression in the intestine. These primary genetic findings prompted a careful endocrine reevaluation of the child at 4.5 years of age, and multiple significant problems were subsequently identified consistent with the known phenotypic consequences of proprotein convertase subtilisin/kexin type 1 (PCSK1) gene mutations. Based on the molecular diagnosis, alternate medical and dietary management was implemented for diabetes insipidus, polyphagia, and micropenis. Conclusions Whole-exome sequencing provides a powerful diagnostic tool to clinicians managing rare genetic disorders with multiple perplexing clinical manifestations. PMID:24280991
DOE Office of Scientific and Technical Information (OSTI.GOV)
Passarge, M; Fix, M K; Manser, P
Purpose: To create and test an accurate EPID-frame-based VMAT QA metric to detect gross dose errors in real-time and to provide information about the source of error. Methods: A Swiss cheese model was created for an EPID-based real-time QA process. The system compares a treatment-plan-based reference set of EPID images with images acquired over each 2° gantry angle interval. The metric utilizes a sequence of independent consecutively executed error detection methods: a masking technique that verifies infield radiation delivery and ensures no out-of-field radiation; output normalization checks at two different stages; global image alignment to quantify rotation, scaling and translation; standard gamma evaluation (3%, 3 mm) and pixel intensity deviation checks including and excluding high dose gradient regions. Tolerances for each test were determined. For algorithm testing, twelve different types of errors were selected to modify the original plan. Corresponding predictions for each test case were generated, which included measurement-based noise. Each test case was run multiple times (with different noise per run) to assess the ability to detect introduced errors. Results: Averaged over five test runs, 99.1% of all plan variations that resulted in patient dose errors were detected within 2° and 100% within 4° (∼1% of patient dose delivery). Including cases that led to slightly modified but clinically equivalent plans, 91.5% were detected by the system within 2°. Based on the type of method that detected the error, determination of error sources was achieved. Conclusion: An EPID-based during-treatment error detection system for VMAT deliveries was successfully designed and tested. The system utilizes a sequence of methods to identify and prevent gross treatment delivery errors. The system was inspected for robustness with realistic noise variations, demonstrating that it has the potential to detect a large majority of errors in real-time and indicate the error source. J. V. Siebers receives funding support from Varian Medical Systems.
Linguistic Analysis of the Human Heartbeat Using Frequency and Rank Order Statistics
NASA Astrophysics Data System (ADS)
Yang, Albert C.-C.; Hseu, Shu-Shya; Yien, Huey-Wen; Goldberger, Ary L.; Peng, C.-K.
2003-03-01
Complex physiologic signals may carry unique dynamical signatures that are related to their underlying mechanisms. We present a method based on rank order statistics of symbolic sequences to investigate the profile of different types of physiologic dynamics. We apply this method to heart rate fluctuations, the output of a central physiologic control system. The method robustly discriminates patterns generated from healthy and pathologic states, as well as aging. Furthermore, we observe increased randomness in the heartbeat time series with physiologic aging and pathologic states and also uncover nonrandom patterns in the ventricular response to atrial fibrillation.
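The rank-order idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the symbols are formed, so the binary encoding (1 for an increase, 0 otherwise), the word length `m`, and the function name `rank_words` are all assumptions made for illustration only, not the authors' procedure.

```python
from collections import Counter


def rank_words(series, m=3):
    """Rank m-symbol words of a series by how often they occur (1 = most frequent).

    Assumed encoding: 1 if a value is larger than the previous one, else 0.
    """
    # Turn the series into a binary symbol sequence of increases / non-increases.
    symbols = [1 if b > a else 0 for a, b in zip(series, series[1:])]
    # Slide a window of length m over the symbols to form overlapping words.
    words = [tuple(symbols[i:i + m]) for i in range(len(symbols) - m + 1)]
    counts = Counter(words)
    # Sort by descending count; break ties by the word itself for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: rank for rank, (word, _) in enumerate(ordered, start=1)}


if __name__ == "__main__":
    intervals = [1, 2, 3, 2, 3, 4, 3, 4, 5]  # toy values, not real heartbeat data
    print(rank_words(intervals, m=2))
```

Comparing the rank of each word between two recordings is one way such rankings could be used to contrast signals; the distance measure itself is left out here because the abstract does not describe it.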
You, Ronghui; Huang, Xiaodi; Zhu, Shanfeng
2018-06-06
As of April 2018, UniProtKB has collected more than 115 million protein sequences. Less than 0.15% of these proteins, however, have been associated with experimental GO annotations. As such, the use of automatic protein function prediction (AFP) to reduce this huge gap becomes increasingly important. The previous studies conclude that sequence homology based methods are highly effective in AFP. In addition, mining motif, domain, and functional information from protein sequences has been found very helpful for AFP. Other than sequences, alternative information sources such as text, however, may be useful for AFP as well. Instead of using BOW (bag of words) representation in traditional text-based AFP, we propose a new method called DeepText2GO that relies on deep semantic text representation, together with different kinds of available protein information such as sequence homology, families, domains, and motifs, to improve large-scale AFP. Furthermore, DeepText2GO integrates text-based methods with sequence-based ones by means of a consensus approach. Extensive experiments on the benchmark dataset extracted from UniProt/SwissProt have demonstrated that DeepText2GO significantly outperformed both text-based and sequence-based methods, validating its superiority. Copyright © 2018 Elsevier Inc. All rights reserved.
Scheutz, Flemming; Lund, Ole; Hasman, Henrik; Kaas, Rolf S.; Nielsen, Eva M.; Aarestrup, Frank M.
2014-01-01
Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-producing Escherichia coli (VTEC). In Denmark, the Statens Serum Institut (SSI) routinely receives all suspected VTEC isolates. During a 7-week period in the fall of 2012, all incoming isolates were concurrently subjected to WGS using IonTorrent PGM. Real-time bioinformatics analysis was performed using web-tools (www.genomicepidemiology.org) for species determination, multilocus sequence type (MLST) typing, and determination of phylogenetic relationship, and a specific VirulenceFinder for detection of E. coli virulence genes was developed as part of this study. In total, 46 suspected VTEC isolates were characterized in parallel during the study. VirulenceFinder proved successful in detecting virulence genes included in routine typing, explicitly verocytotoxin 1 (vtx1), verocytotoxin 2 (vtx2), and intimin (eae), and also detected additional virulence genes. VirulenceFinder is also a robust method for assigning verocytotoxin (vtx) subtypes. A real-time clustering of isolates in agreement with the epidemiology was established from WGS, enabling discrimination between sporadic and outbreak isolates. Overall, WGS typing produced results faster and at a lower cost than the current routine. Therefore, WGS typing is a superior alternative to conventional typing strategies. This approach may also be applied to typing and surveillance of other pathogens. PMID:24574290
Method for rapid base sequencing in DNA and RNA with two base labeling
Jett, J.H.; Keller, R.A.; Martin, J.C.; Posner, R.G.; Marrone, B.L.; Hammond, M.L.; Simpson, D.J.
1995-04-11
A method is described for rapid-base sequencing in DNA and RNA with two-base labeling and employing fluorescent detection of single molecules at two wavelengths. Bases modified to accept fluorescent labels are used to replicate a single DNA or RNA strand to be sequenced. The bases are then sequentially cleaved from the replicated strand, excited with a chosen spectrum of electromagnetic radiation, and the fluorescence from individual, tagged bases detected in the order of cleavage from the strand. 4 figures.
Method for rapid base sequencing in DNA and RNA with two base labeling
Jett, James H.; Keller, Richard A.; Martin, John C.; Posner, Richard G.; Marrone, Babetta L.; Hammond, Mark L.; Simpson, Daniel J.
1995-01-01
Method for rapid-base sequencing in DNA and RNA with two-base labeling and employing fluorescent detection of single molecules at two wavelengths. Bases modified to accept fluorescent labels are used to replicate a single DNA or RNA strand to be sequenced. The bases are then sequentially cleaved from the replicated strand, excited with a chosen spectrum of electromagnetic radiation, and the fluorescence from individual, tagged bases detected in the order of cleavage from the strand.
Genomic Repeat Abundances Contain Phylogenetic Signal
Dodsworth, Steven; Chase, Mark W.; Kelly, Laura J.; Leitch, Ilia J.; Macas, Jiří; Novák, Petr; Piednoël, Mathieu; Weiss-Schneeweiss, Hanna; Leitch, Andrew R.
2015-01-01
A large proportion of genomic information, particularly repetitive elements, is usually ignored when researchers are using next-generation sequencing. Here we demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based clustering of next-generation sequence reads, which results in abundance estimates of different classes of genomic repeats. Phylogenetic trees are then inferred based on the genome-wide abundance of different repeat types treated as continuously varying characters; such repeats are scattered across chromosomes and in angiosperms can constitute a majority of nuclear genomic DNA. In six diverse examples, five angiosperms and one insect, this method provides generally well-supported relationships at interspecific and intergeneric levels that agree with results from more standard phylogenetic analyses of commonly used markers. We propose that this methodology may prove especially useful in groups where there is little genetic differentiation in standard phylogenetic markers. At the same time as providing data for phylogenetic inference, this method additionally yields a wealth of data for comparative studies of genome evolution. PMID:25261464
USDA-ARS's Scientific Manuscript database
Rice seeds of the temperate japonica cultivar Kitaake were mutagenized with sodium azide alone and in combination with methyl nitrosourea. Using the reduced representation sequencing method Restriction Enzyme Sequence Comparative Analysis (RESCAN), the mutation densities, types and local sequence co...
Prediction of redox-sensitive cysteines using sequential distance and other sequence-based features.
Sun, Ming-An; Zhang, Qing; Wang, Yejun; Ge, Wei; Guo, Dianjing
2016-08-24
Reactive oxygen species can modify the structure and function of proteins and may also act as important signaling molecules in various cellular processes. Cysteine thiol groups of proteins are particularly susceptible to oxidation. Meanwhile, their reversible oxidation plays critical roles in redox regulation and signaling. Recently, several computational tools have been developed for predicting redox-sensitive cysteines; however, those methods either focus only on catalytic redox-sensitive cysteines in thiol oxidoreductases or depend heavily on protein structural data, and thus cannot be widely used. In this study, we analyzed various sequence-based features potentially related to cysteine redox-sensitivity, and identified three types of features for efficient computational prediction of redox-sensitive cysteines. These features are: sequential distance to the nearby cysteines, PSSM profile and predicted secondary structure of flanking residues. After further feature selection using SVM-RFE, we developed Redox-Sensitive Cysteine Predictor (RSCP), an SVM-based classifier for redox-sensitive cysteine prediction using primary sequence only. Using 10-fold cross-validation on the RSC758 dataset, the accuracy, sensitivity, specificity, MCC and AUC were estimated as 0.679, 0.602, 0.756, 0.362 and 0.727, respectively. When evaluated using 10-fold cross-validation with the BALOSCTdb dataset, which has structure information, the model achieved performance comparable to that of the current structure-based method. Further validation using an independent dataset indicates it is robust and of relatively better accuracy for predicting redox-sensitive cysteines from non-enzyme proteins. In this study, we developed a sequence-based classifier for predicting redox-sensitive cysteines. The major advantage of this method is that it does not rely on protein structure data, which ensures more extensive application compared to other current implementations.
Accurate prediction of redox-sensitive cysteines not only enhances our understanding about the redox sensitivity of cysteine, it may also complement the proteomics approach and facilitate further experimental investigation of important redox-sensitive cysteines.
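Of the three feature types named in the abstract above, the sequential distance to nearby cysteines is the simplest to illustrate. The sketch below is only one plausible reading of that feature (the distance, in residues, to the closest other cysteine in the primary sequence); the function name and the choice to return `None` for a lone cysteine are assumptions, and the abstract does not say how RSCP actually encodes it.

```python
def nearest_cysteine_distances(sequence):
    """Map each cysteine position (0-based) to the distance to its closest other cysteine.

    Returns None for a cysteine that has no other cysteine in the sequence.
    """
    positions = [i for i, residue in enumerate(sequence) if residue == "C"]
    distances = {}
    for idx, pos in enumerate(positions):
        gaps = []
        if idx > 0:
            gaps.append(pos - positions[idx - 1])  # distance to the previous cysteine
        if idx < len(positions) - 1:
            gaps.append(positions[idx + 1] - pos)  # distance to the next cysteine
        distances[pos] = min(gaps) if gaps else None
    return distances


if __name__ == "__main__":
    # Toy peptide, not taken from any dataset mentioned in the abstract.
    print(nearest_cysteine_distances("ACDCEEEC"))
```

In a classifier pipeline, a value like this would sit alongside the profile and secondary-structure features as one column of the feature vector for each cysteine.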
Single-Molecule Counting of Point Mutations by Transient DNA Binding
NASA Astrophysics Data System (ADS)
Su, Xin; Li, Lidan; Wang, Shanshan; Hao, Dandan; Wang, Lei; Yu, Changyuan
2017-03-01
High-confidence detection of point mutations is important for disease diagnosis and clinical practice. Hybridization probes are extensively used, but are hindered by their poor single-nucleotide selectivity. Shortening the length of DNA hybridization probes weakens the stability of the probe-target duplex, leading to transient binding between complementary sequences. The kinetics of probe-target binding events are highly dependent on the number of complementary base pairs. Here, we present a single-molecule assay for point mutation detection based on transient DNA binding and use of total internal reflection fluorescence microscopy. Statistical analysis of single-molecule kinetics enabled us to effectively discriminate between wild type DNA sequences and single-nucleotide variants at the single-molecule level. A higher single-nucleotide discrimination is achieved than in our previous work by optimizing the assay conditions, which is guided by statistical modeling of kinetics with a gamma distribution. The KRAS c.34 A mutation can be clearly differentiated from the wild type sequence (KRAS c.34 G) at a relative abundance as low as 0.01% mutant to WT. To demonstrate the feasibility of this method for analysis of clinically relevant biological samples, we used this technology to detect mutations in single-stranded DNA generated from asymmetric RT-PCR of mRNA from two cancer cell lines.
Van Kreijl, C F; Bos, J L
1977-01-01
The repeating nucleotide sequence of 68 base pairs in the mtDNA from an ethidium-induced cytoplasmic petite mutant of yeast has been determined. For sequence analysis, specifically primed and terminated RNA copies, obtained by in vitro transcription of the separated strands, were used. The sequence consists of 66 consecutive AT base pairs flanked by two GC pairs and comprises nearly all of the mutant mitochondrial genome. The sequence, moreover, also represents the first part of wild-type mtDNA sequenced so far. PMID:198740
Googling DNA sequences on the World Wide Web.
Hajibabaei, Mehrdad; Singer, Gregory A C
2009-11-10
New web-based technologies provide an excellent opportunity for sharing and accessing information and using the web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bioinformatics applications. We have developed a novel algorithm and implemented it for searching species-specific genomic sequences, DNA barcodes, by using popular web-based methods such as Google. We developed an alignment-independent, character-based algorithm based on dividing a sequence library (DNA barcodes) and query sequence into words. The actual search is conducted by conventional search tools such as the freely available Google Desktop Search. We implemented our algorithm in two exemplar packages. We developed pre- and post-processing software to provide customized input and output services, respectively. Our analysis of all publicly available DNA barcode sequences shows a high accuracy as well as rapid results. Our method makes use of conventional web-based technologies for specialized genetic data. It provides a robust and efficient solution for sequence search on the web. The integration of our search method for large-scale sequence libraries such as DNA barcodes provides an excellent web-based tool for accessing this information and linking it to other available categories of information on the web.
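The word-splitting step described in the abstract above can be sketched as follows. The abstract does not give the word length or say whether words overlap, so the overlapping fixed-length words, the default `k=4`, and both function names are assumptions for illustration; a real deployment would hand the words to a text search engine rather than compare sets directly.

```python
def to_words(sequence, k=4):
    """Split a DNA sequence into overlapping words of length k (assumed scheme)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def shared_word_fraction(query, reference, k=4):
    """Fraction of the query's distinct words that also occur in the reference."""
    query_words = set(to_words(query, k))
    if not query_words:
        return 0.0  # query shorter than k: nothing to match
    return len(query_words & set(to_words(reference, k))) / len(query_words)


if __name__ == "__main__":
    # Toy sequences, not real barcodes.
    print(" ".join(to_words("ACGTAC")))           # text a search engine could index
    print(shared_word_fraction("ACGTAC", "TTACGTACGG"))
```

Joining the words with spaces turns a sequence into ordinary text, which is what makes a general-purpose search tool usable without any alignment step.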
SvABA: genome-wide detection of structural variants and indels by local assembly.
Wala, Jeremiah A; Bandopadhayay, Pratiti; Greenwald, Noah F; O'Rourke, Ryan; Sharpe, Ted; Stewart, Chip; Schumacher, Steve; Li, Yilong; Weischenfeldt, Joachim; Yao, Xiaotong; Nusbaum, Chad; Campbell, Peter; Getz, Gad; Meyerson, Matthew; Zhang, Cheng-Zhong; Imielinski, Marcin; Beroukhim, Rameen
2018-04-01
Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA's performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs and substantially improves detection performance for variants in the 20-300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (<1000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types and found that short templated-sequence insertions occur in ∼4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized (50-300 bp) SVs. © 2018 Wala et al.; Published by Cold Spring Harbor Laboratory Press.
Wu, Kai; Liu, Jing; Wang, Shuai
2016-01-01
Evolutionary games (EG) model a common type of interaction in various complex, networked, natural and social systems. Given such a system with only profit sequences being available, reconstructing the interacting structure of EG networks is fundamental to understanding and controlling its collective dynamics. Existing approaches used to handle this problem, such as the lasso, a convex optimization method, need a user-defined constant to control the tradeoff between the natural sparsity of networks and measurement error (the difference between observed data and simulated data). However, a shortcoming of these approaches is that it is not easy to determine these key parameters which can maximize the performance. In contrast to these approaches, we first model the EG network reconstruction problem as a multiobjective optimization problem (MOP), and then develop a framework which involves a multiobjective evolutionary algorithm (MOEA), followed by solution selection based on knee regions, termed MOEANet, to solve this MOP. We also design an effective initialization operator based on the lasso for MOEA. We apply the proposed method to reconstruct various types of synthetic and real-world networks, and the results show that our approach effectively avoids the above parameter-selection problem and can reconstruct EG networks with high accuracy. PMID:27886244
NASA Astrophysics Data System (ADS)
Wu, Kai; Liu, Jing; Wang, Shuai
2016-11-01
Evolutionary games (EG) model a common type of interaction in various complex, networked, natural and social systems. Given such a system with only profit sequences being available, reconstructing the interacting structure of EG networks is fundamental to understanding and controlling its collective dynamics. Existing approaches used to handle this problem, such as the lasso, a convex optimization method, need a user-defined constant to control the tradeoff between the natural sparsity of networks and measurement error (the difference between observed data and simulated data). However, a shortcoming of these approaches is that it is not easy to determine these key parameters which can maximize the performance. In contrast to these approaches, we first model the EG network reconstruction problem as a multiobjective optimization problem (MOP), and then develop a framework which involves a multiobjective evolutionary algorithm (MOEA), followed by solution selection based on knee regions, termed MOEANet, to solve this MOP. We also design an effective initialization operator based on the lasso for MOEA. We apply the proposed method to reconstruct various types of synthetic and real-world networks, and the results show that our approach effectively avoids the above parameter-selection problem and can reconstruct EG networks with high accuracy.
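The tradeoff mentioned in this abstract can be written out using the standard lasso objective. The symbols are assumptions introduced here for illustration: $y$ stands for the observed data, $A x$ for the simulated data produced by a candidate connection vector $x$, and $\lambda$ for the user-defined constant.

```latex
\hat{x} \;=\; \arg\min_{x} \; \lVert y - A x \rVert_2^2 \;+\; \lambda \, \lVert x \rVert_1
```

The first term measures the measurement error and the second rewards sparsity, so $\lambda$ fixes their relative weight in advance. A multiobjective formulation would instead keep the two terms as separate objectives and choose among the resulting tradeoff solutions afterwards; the exact objectives used by the authors are not given in the abstract.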
DNA Base-Calling from a Nanopore Using a Viterbi Algorithm
Timp, Winston; Comer, Jeffrey; Aksimentiev, Aleksei
2012-01-01
Nanopore-based DNA sequencing is the most promising third-generation sequencing method. It has superior read length, speed, and sample requirements compared with state-of-the-art second-generation methods. However, base-calling still presents substantial difficulty because the resolution of the technique is limited compared with the measured signal/noise ratio. Here we demonstrate a method to decode 3-bp-resolution nanopore electrical measurements into a DNA sequence using a Hidden Markov model. This method shows tremendous potential for accuracy (∼98%), even with a poor signal/noise ratio. PMID:22677395
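For readers unfamiliar with the decoding step named in the title above, here is a minimal generic Viterbi decoder for a hidden Markov model. It is a textbook sketch, not the authors' base-caller: the two-state model, the "low"/"high" observation labels, and all probabilities in the usage example are invented toy values, and every probability is assumed to be nonzero so that logarithms are defined.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for a sequence of observations."""
    if not observations:
        return []
    # Log-probability of the best path ending in each state after the first observation.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s] = best predecessor of state s at time t
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    states = ["A", "T"]                                  # toy hidden states
    start = {"A": 0.5, "T": 0.5}
    trans = {"A": {"A": 0.5, "T": 0.5}, "T": {"A": 0.5, "T": 0.5}}
    emit = {"A": {"low": 0.9, "high": 0.1}, "T": {"low": 0.1, "high": 0.9}}
    print(viterbi(["low", "high", "high", "low"], states, start, trans, emit))
```

Working in log space avoids numerical underflow on long reads, which is why the sketch sums logarithms instead of multiplying probabilities.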
Hwang, Hwan-Su; Lee, Hyoshin; Choi, Yong Eui
2015-03-14
Eleutherococcus senticosus, Siberian ginseng, is a highly valued woody medicinal plant belonging to the family Araliaceae. E. senticosus produces a rich variety of saponins such as oleanane-type, noroleanane-type, 29-hydroxyoleanan-type, and lupane-type saponins. Genomic or transcriptomic approaches have not been used to investigate the saponin biosynthetic pathway in this plant. In this study, de novo sequencing was performed to select candidate genes involved in the saponin biosynthetic pathway. A half-plate 454 pyrosequencing run produced 627,923 high-quality reads with an average sequence length of 422 bases. De novo assembly generated 72,811 unique sequences, including 15,217 contigs and 57,594 singletons. Approximately 48,300 (66.3%) unique sequences were annotated using BLAST similarity searches. All of the mevalonate pathway genes for saponin biosynthesis starting from acetyl-CoA were isolated. Moreover, 206 reads of cytochrome P450 (CYP) and 145 reads of uridine diphosphate glycosyltransferase (UGT) sequences were isolated. Based on methyl jasmonate (MeJA) treatment and real-time PCR (qPCR) analysis, 3 CYPs and 3 UGTs were finally selected as candidate genes involved in the saponin biosynthetic pathway. The identified sequences associated with saponin biosynthesis will facilitate the study of the functional genomics of saponin biosynthesis and genetic engineering of E. senticosus.
Chakrapani, Sunil Kishore; Barnard, Daniel J; Dayal, Vinay
2016-05-01
This paper presents a study of the influence of laminate sequence and fabric type on the baseline acoustic nonlinearity of fiber-reinforced composites. Nonlinear elastic wave techniques are becoming increasingly popular for detecting damage in composite materials. It was earlier observed by the authors that the non-classical nonlinear response of fiber-reinforced composite is influenced by the fiber orientation [Chakrapani, Barnard, and Dayal, J. Acoust. Soc. Am. 137(2), 617-624 (2015)]. The current study expands this effort to investigate the effect of laminate sequence and fabric type on the non-classical nonlinear response. Two hypotheses were developed using the previous results and the theory of interlaminar stresses to investigate the influence of laminate sequence and fabric type. Each hypothesis was tested by capturing the nonlinear response by performing nonlinear resonance spectroscopy and measuring frequency shifts, loss factors, and higher harmonics. It was observed that the laminate sequence can either increase or decrease the nonlinear response based on the stacking sequence. Similarly, tests were performed to compare unidirectional fabric and woven fabric and it was observed that woven fabric exhibited a lower nonlinear response compared to the unidirectional fabric. Conjectures based on the matrix properties and interlaminar stresses were used in an attempt to explain the observed nonlinear responses for different configurations.
Harrison, Thomas; Ruiz, Jaime; Sloan, Daniel B.; Ben-Hur, Asa; Boucher, Christina
2016-01-01
Pentatricopeptide repeat containing proteins (PPRs) bind to RNA transcripts originating from mitochondria and plastids. There are two classes of PPR proteins. The P class contains tandem P-type motif sequences, and the PLS class contains alternating P, L and S type sequences. In this paper, we describe a novel tool that predicts PPR-RNA interaction; specifically, our method, which we call aPPRove, determines where and how a PLS-class PPR protein will bind to RNA when given a PPR and one or more RNA transcripts by using a combinatorial binding code for site specificity proposed by Barkan et al. Our results demonstrate that aPPRove successfully locates how and where a PPR protein belonging to the PLS class can bind to RNA. For each binding event it outputs the binding site, the amino-acid-nucleotide interaction, and its statistical significance. Furthermore, we show that our method can be used to predict binding events for PLS-class proteins using a known edit site and the statistical significance of aligning the PPR protein to that site. In particular, we use our method to make a conjecture regarding an interaction between CLB19 and the second intronic region of ycf3. The aPPRove web server can be found at www.cs.colostate.edu/~approve. PMID:27560805