Sample records for dna sequence classification

  1. A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network.

    PubMed

    Fiannaca, Antonino; La Rosa, Massimo; Rizzo, Riccardo; Urso, Alfonso

    2015-07-01

    In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed. In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, which also consider short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database". The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification task is performed with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method can reach an accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches an accuracy of 20.9%. Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments. Copyright © 2015 Elsevier B.V. All rights reserved.
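
    Illustrative sketch (not the authors' code): a k-mer "spectral representation" can be read as a fixed-length vector of k-mer counts. The function name kmer_spectrum and the default k=3 are assumptions made for this example; the neural gas clustering step is not shown.

```python
from collections import Counter
from itertools import product


def kmer_spectrum(seq, k=3):
    """Return the counts of all 4**k DNA k-mers, in lexicographic order.

    Hypothetical helper: k-mers containing symbols other than A, C, G, T
    are simply left out of the returned vector.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


# Toy sequence, k = 2: the vector has 4**2 = 16 entries.
spectrum = kmer_spectrum("ACGTACGT", k=2)
```

    Every sequence maps to a vector of the same length regardless of its own length, which is what allows alignment-free comparison of barcodes.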

  2. Supervised DNA Barcodes species classification: analysis, comparisons and results

    PubMed Central

    2014-01-01

    Background Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcode and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcode: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem assigns an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. Methods In this work the efficacy of supervised machine learning methods to classify species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with respect to ad-hoc and well-established DNA Barcode classification methods. Results A software that converts the DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow the execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes outperform on average the other considered classifiers, although they do not provide a human interpretable classification model. Rule-based methods have slightly inferior classification performances, but deliver the species specific positions and nucleotide assignments. On synthetic data the supervised machine learning methods obtain superior classification performances with respect to the traditional DNA Barcode classification methods. On empirical data their classification performances are at a comparable level to the other methods. Conclusions The classification analysis shows that supervised machine learning methods are promising candidates for handling with success the DNA Barcoding species classification problem, obtaining excellent performances. To conclude, a powerful tool to perform species identification is now available to the DNA Barcoding community. PMID:24721333

  3. Studying long 16S rDNA sequences with ultrafast-metagenomic sequence classification using exact alignments (Kraken).

    PubMed

    Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco

    2016-03-01

    Ultrafast-metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classify 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning and then Sanger sequenced. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Deeper classification performance was achieved by Kraken than by the RDP: 73% of the contigs were classified up to the species or variety levels, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or inclusively deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken cumulates the information of both sequence senses, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for use in the taxonomic assignment of sequences obtained by Sanger sequencing or based on third generation sequencing, of which the main goal is to generate larger sequences. Copyright © 2016 Elsevier B.V. All rights reserved.
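
    Illustrative sketch (not Kraken's implementation): the "lowest ancestor" idea can be shown on a toy parent-pointer taxonomy. The PARENT table and the function names are assumptions made for this example only.

```python
# Toy taxonomy as child -> parent pointers; purely illustrative.
PARENT = {
    "Bacteria": None,
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Vibrio": "Proteobacteria",
    "Bacillus": "Firmicutes",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all given taxa."""
    taxa = list(taxa)
    shared = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon))
    # Walk the first lineage from leaf to root; the first shared node is the LCA.
    for node in lineage(taxa[0]):
        if node in shared:
            return node
    return None
```

    A short sequence found in both Escherichia and Vibrio would, in this toy tree, be assigned to Proteobacteria rather than to either genus.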

  4. An Evolutionary Classification of Genomic Function

    PubMed Central

    Graur, Dan; Zheng, Yichen; Azevedo, Ricardo B.R.

    2015-01-01

    The pronouncements of the ENCODE Project Consortium regarding “junk DNA” exposed the need for an evolutionary classification of genomic elements according to their selected-effect function. In the classification scheme presented here, we divide the genome into “functional DNA,” that is, DNA sequences that have a selected-effect function, and “rubbish DNA,” that is, sequences that do not. Functional DNA is further subdivided into “literal DNA” and “indifferent DNA.” In literal DNA, the order of nucleotides is under selection; in indifferent DNA, only the presence or absence of the sequence is under selection. Rubbish DNA is further subdivided into “junk DNA” and “garbage DNA.” Junk DNA neither contributes to nor detracts from the fitness of the organism and, hence, evolves under selective neutrality. Garbage DNA, on the other hand, decreases the fitness of its carriers. Garbage DNA exists in the genome only because natural selection is neither omnipotent nor instantaneous. Each of these four functional categories can be 1) transcribed and translated, 2) transcribed but not translated, or 3) not transcribed. The affiliation of a DNA segment to a particular functional category may change during evolution: Functional DNA may become junk DNA, junk DNA may become garbage DNA, rubbish DNA may become functional DNA, and so on; however, determining the functionality or nonfunctionality of a genomic sequence must be based on its present status rather than on its potential to change (or not to change) in the future. Changes in functional affiliation are divided into pseudogenes, Lazarus DNA, zombie DNA, and Jekyll-to-Hyde DNA. PMID:25635041

  5. Probabilistic topic modeling for the analysis and classification of genomic sequences

    PubMed Central

    2015-01-01

    Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies are focusing on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques are gaining more importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequences clustering and classification is proposed. The method is based on k-mers representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied on DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased. PMID:25916734

  6. Three 3D graphical representations of DNA primary sequences based on the classifications of DNA bases and their applications.

    PubMed

    Xie, Guosen; Mo, Zhongxi

    2011-01-21

    In this article, we introduce three 3D graphical representations of DNA primary sequences, which we call RY-curve, MK-curve and SW-curve, based on three classifications of the DNA bases. The advantages of our representations are that (i) these 3D curves are strictly non-degenerate and there is no loss of information when transferring a DNA sequence to its mathematical representation and (ii) the coordinates of every node on these 3D curves have clear biological implication. Two applications of these 3D curves are presented: (a) a simple formula is derived to calculate the content of the four bases (A, G, C and T) from the coordinates of nodes on the curves; and (b) a 12-component characteristic vector is constructed to compare similarity among DNA sequences from different species based on the geometrical centers of the 3D curves. As examples, we examine similarity among the coding sequences of the first exon of beta-globin gene from eleven species and validate similarity of cDNA sequences of beta-globin gene from eight species. Copyright © 2010 Elsevier Ltd. All rights reserved.
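
    Illustrative sketch (not the authors' curve construction): the three base classifications behind the RY-, MK- and SW-curves are the standard purine/pyrimidine, amino/keto and strong/weak groupings. The helper names below are assumptions. The last function shows one simple way the three groupings jointly fix the base content; it is not the coordinate formula derived in the paper.

```python
# Standard two-class groupings of the four DNA bases.
CLASSES = {
    "RY": {"A": "R", "G": "R", "C": "Y", "T": "Y"},  # purine / pyrimidine
    "MK": {"A": "M", "C": "M", "G": "K", "T": "K"},  # amino / keto
    "SW": {"G": "S", "C": "S", "A": "W", "T": "W"},  # strong / weak H-bonding
}


def recode(seq, scheme):
    """Rewrite a DNA sequence in the two-letter alphabet of one scheme."""
    table = CLASSES[scheme]
    return "".join(table[base] for base in seq.upper())


def count_a_from_classes(seq):
    """Recover the number of A's from class counts alone.

    With R = A+G, M = A+C, W = A+T and n = A+C+G+T,
    R + M + W = 2A + n, so A = (R + M + W - n) / 2.
    """
    r = recode(seq, "RY").count("R")
    m = recode(seq, "MK").count("M")
    w = recode(seq, "SW").count("W")
    return (r + m + w - len(seq)) // 2
```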

  7. Genomics dataset on unclassified published organism (patent US 7547531).

    PubMed

    Khan Shawan, Mohammad Mahfuz Ali; Hasan, Md Ashraful; Hossain, Md Mozammel; Hasan, Md Mahmudul; Parvin, Afroza; Akter, Salina; Uddin, Kazi Rasel; Banik, Subrata; Morshed, Mahbubul; Rahman, Md Nazibur; Rahman, S M Badier

    2016-12-01

    Nucleotide (DNA) sequence analysis provides important clues regarding the characteristics and taxonomic position of an organism. DNA sequence analysis is therefore crucial for learning about the hierarchical classification of that organism. This dataset (patent US 7547531) was chosen to simplify the complex raw data buried in undisclosed DNA sequences, which may help to open doors for new collaborations. A total of 48 unidentified DNA sequences from patent US 7547531 were selected, and their complete sequences were retrieved from the NCBI BioSample database. Quick response (QR) codes of those DNA sequences were constructed with the DNA BarID tool. The QR code is useful for the identification and comparison of isolates with other organisms. The AT/GC content of the DNA sequences, which indicates their stability at different temperatures, was determined using the ENDMEMO GC Content Calculator. The highest GC content was observed in GP445188 (62.5%), followed by GP445198 (61.8%) and GP445189 (59.44%), while the lowest was in GP445178 (24.39%). In addition, the New England BioLabs (NEB) database was used to identify the cleavage code, indicating the 5', 3' and blunt ends, and the enzyme code, indicating the methylation sites of the DNA sequences. These data will be helpful for the construction of the organisms' hierarchical classification, determination of their phylogenetic and taxonomic position and revelation of their molecular characteristics.
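
    Illustrative sketch (not the ENDMEMO tool): GC content is the percentage of G and C among the bases of a sequence. The function name gc_content and the choice to ignore non-ACGT symbols are assumptions made for this example.

```python
def gc_content(seq):
    """Return GC content as a percentage of the A/C/G/T bases in seq.

    Ambiguity codes such as N are ignored; an empty sequence gives 0.0.
    """
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return 100.0 * gc / acgt if acgt else 0.0
```

    For example, gc_content("ATGC") evaluates to 50.0, since two of the four bases are G or C.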

  8. Identification of food and beverage spoilage yeasts from DNA sequence analyses

    USDA-ARS's Scientific Manuscript database

    Detection, identification, and classification of yeasts has undergone a major transformation in the last decade and a half following application of gene sequence analyses and genome comparisons. Development of a database (barcode) of easily determined DNA sequences from domains 1 and 2 (D1/D2) of th...

  9. Factors That Affect Large Subunit Ribosomal DNA Amplicon Sequencing Studies of Fungal Communities: Classification Method, Primer Choice, and Error

    PubMed Central

    Porter, Teresita M.; Golding, G. Brian

    2012-01-01

    Nuclear large subunit ribosomal DNA is widely used in fungal phylogenetics and, to an increasing extent, in amplicon-based environmental sequencing. The relatively short reads produced by next-generation sequencing, however, make primer choice and sequence error important variables for obtaining accurate taxonomic classifications. In this simulation study we tested the performance of three classification methods: 1) a similarity-based method (BLAST + Metagenomic Analyzer, MEGAN); 2) a composition-based method (Ribosomal Database Project naïve Bayesian classifier, NBC); and, 3) a phylogeny-based method (Statistical Assignment Package, SAP). We also tested the effects of sequence length, primer choice, and sequence error on classification accuracy and perceived community composition. Using a leave-one-out cross validation approach, results for classifications to the genus rank were as follows: BLAST + MEGAN had the lowest error rate and was particularly robust to sequence error; SAP accuracy was highest when long LSU query sequences were classified; and, NBC runs significantly faster than the other tested methods. All methods performed poorly with the shortest 50–100 bp sequences. Increasing simulated sequence error reduced classification accuracy. Community shifts were detected due to sequence error and primer selection even though there was no change in the underlying community composition. Short read datasets from individual primers, as well as pooled datasets, appear to only approximate the true community composition. We hope this work informs investigators of some of the factors that affect the quality and interpretation of their environmental gene surveys. PMID:22558215

  10. Promoter Sequences Prediction Using Relational Association Rule Mining

    PubMed Central

    Czibula, Gabriela; Bocicor, Maria-Iuliana; Czibula, Istvan Gergely

    2012-01-01

    In this paper we approach, from a computational perspective, the problem of promoter sequence prediction, an important problem within the field of bioinformatics. As the conditions for a DNA sequence to function as a promoter are not known, machine learning based classification models are still being developed to approach the problem of promoter identification in DNA. We propose a classification model based on relational association rule mining. Relational association rules are a particular type of association rules and describe numerical orderings between attributes that commonly occur over a data set. Our classifier is based on the discovery of relational association rules for predicting whether or not a DNA sequence contains a promoter region. An experimental evaluation of the proposed model and a comparison with similar existing approaches are provided. The obtained results show that our classifier outperforms the existing techniques for identifying promoter sequences, confirming the potential of our proposal. PMID:22563233

  11. Classification and phylogeny of sika deer (Cervus nippon) subspecies based on the mitochondrial control region DNA sequence using an extended sample set.

    PubMed

    Ba, Hengxing; Yang, Fuhe; Xing, Xiumei; Li, Chunyi

    2015-06-01

    To further refine the classification and phylogeny of sika deer subspecies, the well-annotated sequences of the complete mitochondrial DNA (mtDNA) control region of 13 sika deer subspecies from GenBank were downloaded, aligned and analyzed in this study. By reconstructing the phylogenetic tree with an extended sample set, the results revealed a split between Northern and Southern Mainland Asia/Taiwan lineages, and moreover, two subspecies, C.n.mantchuricus and C.n.hortulorum, existed in Northern Mainland Asia. Unexpectedly, Dybowskii's sika deer, which was thought to originate from Northern Mainland Asia, joins the Southern Mainland Asia/Taiwan lineage. The genetic divergences ranged from 2.1% to 4.7% between Dybowskii's sika deer and all the other established subspecies at the mtDNA sequence level, which suggests that the maternal lineage of uncertain sika subspecies in Europe had been maintained until today. This study also provides a better understanding of the classification, phylogeny and phylogeographic history of sika deer subspecies.

  12. Recognising promoter sequences using an artificial immune system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cooke, D.E.; Hunt, J.E.

    1995-12-31

    We have developed an artificial immune system (AIS) which is based on the human immune system. The AIS possesses an adaptive learning mechanism which enables antibodies to emerge which can be used for classification tasks. In this paper, we describe how the AIS has been used to evolve antibodies which can classify promoter containing and promoter negative DNA sequences. The DNA sequences used for teaching were 57 nucleotides in length and contained procaryotic promoters. The system classified previously unseen DNA sequences with an accuracy of approximately 90%.

  13. Re-visiting protein-centric two-tier classification of existing DNA-protein complexes

    PubMed Central

    2012-01-01

    Background Precise DNA-protein interactions play a most important and vital role in maintaining the normal physiological functioning of the cell, as they control many high-fidelity cellular processes. Detailed study of the nature of these interactions has paved the way for understanding the mechanisms behind the biological processes in which they are involved. Earlier, in 2000, a systematic classification of DNA-protein complexes based on the structural analysis of the proteins was proposed at two tiers, namely groups and families. With the advancement in the number and resolution of structures of DNA-protein complexes deposited in the Protein Data Bank, it is important to revisit the existing classification. Results On the basis of the sequence analysis of DNA binding proteins, we have built upon the protein centric, two-tier classification of DNA-protein complexes by adding new members to existing families and making new families and groups. While classifying the new complexes, we also realised the emergence of new groups and families. The new group observed was one in which a β-propeller was seen to interact with DNA. There were 34 SCOP folds which were observed to be present in the complexes of both old and new classifications, whereas 28 folds are present exclusively in the new complexes. Some new families noticed were NarL transcription factor, Z-α DNA binding proteins, Forkhead transcription factor, AP2 protein, Methyl CpG binding protein etc. Conclusions Our results suggest that with the increasing availability of DNA-protein complexes in the Protein Data Bank, the number of families in the classification increased approximately threefold. The folds present exclusively in newly classified complexes suggest the inclusion of proteins with new functions in the new classification, the most populated of which are the folds responsible for DNA damage repair. The proposed re-visited classification can be used to perform genome-wide surveys in the genomes of interest for the presence of DNA-binding proteins. Further analysis of these complexes can aid in developing algorithms for identifying DNA-binding proteins and their family members from mere sequence information. PMID:22800292

  14. Re-visiting protein-centric two-tier classification of existing DNA-protein complexes.

    PubMed

    Malhotra, Sony; Sowdhamini, Ramanathan

    2012-07-16

    Precise DNA-protein interactions play a most important and vital role in maintaining the normal physiological functioning of the cell, as they control many high-fidelity cellular processes. Detailed study of the nature of these interactions has paved the way for understanding the mechanisms behind the biological processes in which they are involved. Earlier, in 2000, a systematic classification of DNA-protein complexes based on the structural analysis of the proteins was proposed at two tiers, namely groups and families. With the advancement in the number and resolution of structures of DNA-protein complexes deposited in the Protein Data Bank, it is important to revisit the existing classification. On the basis of the sequence analysis of DNA binding proteins, we have built upon the protein centric, two-tier classification of DNA-protein complexes by adding new members to existing families and making new families and groups. While classifying the new complexes, we also realised the emergence of new groups and families. The new group observed was one in which a β-propeller was seen to interact with DNA. There were 34 SCOP folds which were observed to be present in the complexes of both old and new classifications, whereas 28 folds are present exclusively in the new complexes. Some new families noticed were NarL transcription factor, Z-α DNA binding proteins, Forkhead transcription factor, AP2 protein, Methyl CpG binding protein etc. Our results suggest that with the increasing availability of DNA-protein complexes in the Protein Data Bank, the number of families in the classification increased approximately threefold. The folds present exclusively in newly classified complexes suggest the inclusion of proteins with new functions in the new classification, the most populated of which are the folds responsible for DNA damage repair. The proposed re-visited classification can be used to perform genome-wide surveys in the genomes of interest for the presence of DNA-binding proteins. Further analysis of these complexes can aid in developing algorithms for identifying DNA-binding proteins and their family members from mere sequence information.

  15. Classification of European Mtdnas from an Analysis of Three European Populations

    PubMed Central

    Torroni, A.; Huoponen, K.; Francalacci, P.; Petrozzi, M.; Morelli, L.; Scozzari, R.; Obinu, D.; Savontaus, M. L.; Wallace, D. C.

    1996-01-01

    Mitochondrial DNA (mtDNA) sequence variation was examined in Finns, Swedes and Tuscans by PCR amplification and restriction analysis. About 99% of the mtDNAs were subsumed within 10 mtDNA haplogroups (H, I, J, K, M, T, U, V, W, and X), suggesting that the identified haplogroups could encompass virtually all European mtDNAs. Because both hypervariable segments of the mtDNA control region were previously sequenced in the Tuscan samples, the mtDNA haplogroups and control region sequences could be compared. Using a combination of haplogroup-specific restriction site changes and control region nucleotide substitutions, the distribution of the haplogroups was surveyed through the published restriction site polymorphism and control region sequence data of Caucasoids. This supported the conclusion that most haplogroups observed in Europe are Caucasoid-specific, and that at least some of them occur at varying frequencies in different Caucasoid populations. The classification of almost all European mtDNA variation in a number of well defined haplogroups could provide additional insights about the origin and relationships of Caucasoid populations and the process of human colonization of Europe, and is valuable for the definition of the role played by mtDNA backgrounds in the expression of pathological mtDNA mutations. PMID:8978068

  16. Genomics dataset of unidentified disclosed isolates.

    PubMed

    Rekadwad, Bhagwan N

    2016-09-01

    Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was analyzed. The QR code is helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on the cleavage code and enzyme code, obtained in the restriction digestion study, is reported; it is helpful for performing studies using short DNA sequences. The dataset disclosed here is new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis.

  17. Classification of Pelteobagrus fish in Poyang Lake based on mitochondrial COI gene sequence.

    PubMed

    Zhong, Bin; Chen, Ting-Ting; Gong, Rui-Yue; Zhao, Zhe-Xia; Wang, Binhua; Fang, Chunlin; Mao, Hui-Ling

    2016-11-01

    We use DNA molecular marker technology to correct the deficiencies of traditional morphological taxonomy. A total of 770 Pelteobagrus fish were collected from Poyang Lake. After preliminary morphological classification, eight samples of each species were randomly selected for DNA extraction. The mitochondrial COI gene sequence was cloned with universal primers and sequenced. The results showed that four species of Pelteobagrus live in Poyang Lake. The average intraspecific genetic distance was 0.003, while the average interspecific genetic distance was 0.128. The interspecific genetic distance is thus far greater than the intraspecific genetic distance. In addition, phylogenetic tree analysis revealed that the molecular systematics was in accord with the morphological classification. This indicates that the COI gene is an effective DNA molecular marker for Pelteobagrus classification. Surprisingly, the distance of some individuals (P. e6, P. n6, P. e5, and P. v4) from their originally assigned species exceeded the species threshold (2%), and these individuals should be reclassified as Pelteobagrus fulvidraco. However, another individual, P. v3, was very different: its genetic distance from its originally assigned species, Pelteobagrus vachelli, was over 8.4%. Its taxonomic status remains to be studied further.
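
    Illustrative sketch (not the authors' analysis): the comparison of intraspecific and interspecific distances can be reproduced on toy data. The abstract does not name the distance model, so the uncorrected p-distance (proportion of differing sites) is used here as a stand-in; the sample labels and sequences are invented.

```python
from itertools import combinations
from statistics import mean


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_distances(samples):
    """Return (mean intraspecific, mean interspecific) distance.

    samples is a list of (species_label, aligned_sequence) pairs and must
    contain at least one within-species and one between-species pair.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return mean(intra), mean(inter)


# Invented toy alignment of four individuals from two hypothetical species.
toy = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_B", "ACGAACTTAC"),
    ("sp_B", "ACGAACTTAC"),
]
intra_mean, inter_mean = mean_distances(toy)
```

    A marker behaves as a useful barcode when, as in the abstract, the interspecific mean is much larger than the intraspecific mean.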

  18. The number of reduced alignments between two DNA sequences

    PubMed Central

    2014-01-01

    Background In this study we consider DNA sequences as mathematical strings. Total and reduced alignments between two DNA sequences have been considered in the literature to measure their similarity. Results for explicit representations of some alignments have been already obtained. Results We present exact, explicit and computable formulas for the number of different possible alignments between two DNA sequences and a new formula for a class of reduced alignments. Conclusions A unified approach for a wide class of alignments between two DNA sequences has been provided. The formula is computable and, if complemented by software development, will provide a deeper insight into the theory of sequence alignment and give rise to new comparison methods. AMS Subject Classification Primary 92B05, 33C20, secondary 39A14, 65Q30 PMID:24684679
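
    Illustrative sketch (not the formulas of the paper): under the common convention that an alignment is a sequence of columns, each pairing two residues or pairing one residue with a gap, the number of total alignments of sequences of lengths m and n satisfies a three-term recurrence and equals the Delannoy number D(m, n). The function name is an assumption, and reduced alignments are not covered here.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def total_alignments(m, n):
    """Count alignments of two sequences of lengths m and n.

    The last column is either (residue, gap), (gap, residue) or
    (residue, residue), which gives the recurrence below.
    """
    if m == 0 or n == 0:
        return 1  # only one way: align the remaining residues with gaps
    return (total_alignments(m - 1, n)
            + total_alignments(m, n - 1)
            + total_alignments(m - 1, n - 1))
```

    The count grows very quickly with sequence length, which is why closed-form and computable formulas of the kind described in the abstract are of interest.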

  19. Rapid identification and classification of bacteria by 16S rDNA restriction fragment melting curve analyses (RFMCA).

    PubMed

    Rudi, Knut; Kleiberg, Gro H; Heiberg, Ragnhild; Rosnes, Jan T

    2007-08-01

    The aim of this work was to evaluate restriction fragment melting curve analyses (RFMCA) as a novel approach for rapid classification of bacteria during food production. RFMCA was evaluated for bacteria isolated from sous vide food products, and raw materials used for sous vide production. We identified four major bacterial groups in the material analysed (cluster I-Streptococcus, cluster II-Carnobacterium/Bacillus, cluster III-Staphylococcus and cluster IV-Actinomycetales). The accuracy of RFMCA was evaluated by comparison with 16S rDNA sequencing. The strains satisfying the RFMCA quality filtering criteria (73%, n=57), with both 16S rDNA sequence information and RFMCA data (n=45) gave identical group assignments with the two methods. RFMCA enabled rapid and accurate classification of bacteria that is database compatible. Potential application of RFMCA in the food or pharmaceutical industry will include development of classification models for the bacteria expected in a given product, and then to build an RFMCA database as a part of the product quality control.

  20. Mitochondrial DNA haplogroup phylogeny of the dog: Proposal for a cladistic nomenclature.

    PubMed

    Fregel, Rosa; Suárez, Nicolás M; Betancor, Eva; González, Ana M; Cabrera, Vicente M; Pestano, José

    2015-05-01

    Canis lupus familiaris mitochondrial DNA analysis has increased in recent years, not only for the purpose of deciphering dog domestication but also for forensic genetic studies or breed characterization. The resultant accumulation of data has increased the need for a normalized and phylogenetic-based nomenclature like those provided for human maternal lineages. Although a standardized classification has been proposed, haplotype names within clades have been assigned gradually without considering the evolutionary history of dog mtDNA. Moreover, this classification is based only on the D-loop region, proven to be insufficient for phylogenetic purposes due to its high number of recurrent mutations and the lack of relevant information present in the coding region. In this study, we design 1) a refined mtDNA cladistic nomenclature from a phylogenetic tree based on complete sequences, classifying dog maternal lineages into haplogroups defined by specific diagnostic mutations, and 2) a coding region SNP analysis that allows a more accurate classification into haplogroups when combined with D-loop sequencing, thus improving the phylogenetic information obtained in dog mitochondrial DNA studies. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Efficient alignment-free DNA barcode analytics.

    PubMed

    Kuksa, Pavel; Pavlovic, Vladimir

    2009-11-10

    In this work we consider barcode DNA analysis problems and address them using alternative, alignment-free methods and representations which model sequences as collections of short sequence fragments (features). The methods use fixed-length representations (spectrum) for barcode sequences to measure similarities or dissimilarities between sequences coming from the same or different species. The spectrum-based representation not only allows for accurate and computationally efficient species classification, but also opens the possibility for accurate clustering analysis of putative species barcodes and identification of critical within-barcode loci distinguishing barcodes of different sample groups. New alignment-free methods provide highly accurate and fast DNA barcode-based identification and classification of species with substantial improvements in accuracy and speed over state-of-the-art barcode analysis methods. We evaluate our methods on problems of species classification and identification using barcodes, important and relevant analytical tasks in many practical applications (adverse species movement monitoring, sampling surveys for unknown or pathogenic species identification, biodiversity assessment, etc.). On several benchmark barcode datasets, including ACG, Astraptes, Hesperiidae, Fish larvae, and Birds of North America, the proposed alignment-free methods considerably improve prediction accuracy compared to prior results. We also observe significant running time improvements over the state-of-the-art methods. Our results show that newly developed alignment-free methods for DNA barcoding can efficiently and with high accuracy identify specimens by examining only a few barcode features, resulting in increased scalability and interpretability of current computational approaches to barcoding.
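
    The spectrum representation described in this abstract can be illustrated with a minimal sketch. It assumes a plain k-mer count vector and cosine similarity; the abstract does not specify the exact features or similarity measure, so the function names, the choice of k = 3, and the toy sequences below are all hypothetical.

```python
from collections import Counter
from math import sqrt

def kmer_spectrum(seq, k=3):
    """Count every overlapping k-mer in seq (a fixed-length 'spectrum')."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy sequences (hypothetical, not real barcodes).
ref = "ACGTACGTGACCTGA"
query_close = "ACGTACGTGACCTGT"   # one substitution relative to ref
query_far = "TTTTGGGGCCCCAAAA"    # shares no 3-mer with ref

sim_close = cosine_similarity(kmer_spectrum(ref), kmer_spectrum(query_close))
sim_far = cosine_similarity(kmer_spectrum(ref), kmer_spectrum(query_far))
```

    No alignment is computed at any point: a query would simply be assigned to the reference whose spectrum it most resembles.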

  2. Phylogenetic relationships in three species of canine Demodex mite based on partial sequences of mitochondrial 16S rDNA.

    PubMed

    Sastre, Natalia; Ravera, Ivan; Villanueva, Sergio; Altet, Laura; Bardagí, Mar; Sánchez, Armand; Francino, Olga; Ferrer, Lluís

    2012-12-01

    The historical classification of Demodex mites has been based on their hosts and morphological features. Genome sequencing has proved to be a very effective taxonomic tool in phylogenetic studies and has been applied in the classification of Demodex. Mitochondrial 16S rDNA has been demonstrated to be an especially useful marker to establish phylogenetic relationships. The objective was to amplify and sequence a segment of the mitochondrial 16S rDNA from Demodex canis and Demodex injai, as well as from the short-bodied mite called, unofficially, D. cornei, and to determine their genetic proximity. Demodex mites were examined microscopically and classified as Demodex folliculorum (one sample), D. canis (four samples), D. injai (two samples) or the short-bodied species D. cornei (three samples). DNA was extracted, and a 338 bp fragment of the 16S rDNA was amplified and sequenced. The sequences of the four D. canis mites were identical and shared 99.6 and 97.3% identity with two D. canis sequences available at GenBank. The sequences of the D. cornei isolates were identical and showed 97.8, 98.2 and 99.6% identity with the D. canis isolates. The sequences of the two D. injai isolates were also identical and showed 76.6% identity with the D. canis sequence. Demodex canis and D. injai are two different species, with a genetic distance of 23.3%. It would seem that the short-bodied Demodex mite D. cornei is a morphological variant of D. canis. © 2012 The Authors. Veterinary Dermatology © 2012 ESVD and ACVD.

  3. The history and advances of reversible terminators used in new generations of sequencing technology.

    PubMed

    Chen, Fei; Dong, Mengxing; Ge, Meng; Zhu, Lingxiang; Ren, Lufeng; Liu, Guocheng; Mu, Rong

    2013-02-01

    DNA sequencing using reversible terminators, as one sequencing by synthesis strategy, has garnered a great deal of interest due to its popular application in the second-generation high-throughput DNA sequencing technology. In this review, we provided the history of development, classification, and working mechanism of this technology. We also outlined the screening strategies for DNA polymerases to accommodate the reversible terminators as substrates during polymerization; in particular, we introduced the "REAP" method developed by us. At the end of this review, we discussed current limitations of this approach and provided potential solutions to extend its application. Copyright © 2013. Production and hosting by Elsevier Ltd.

  4. Efficient alignment-free DNA barcode analytics

    PubMed Central

    Kuksa, Pavel; Pavlovic, Vladimir

    2009-01-01

    Background: In this work we consider barcode DNA analysis problems and address them using alternative, alignment-free methods and representations which model sequences as collections of short sequence fragments (features). The methods use fixed-length representations (spectrum) for barcode sequences to measure similarities or dissimilarities between sequences coming from the same or different species. The spectrum-based representation not only allows for accurate and computationally efficient species classification, but also opens the possibility for accurate clustering analysis of putative species barcodes and identification of critical within-barcode loci distinguishing barcodes of different sample groups. Results: New alignment-free methods provide highly accurate and fast DNA barcode-based identification and classification of species with substantial improvements in accuracy and speed over state-of-the-art barcode analysis methods. We evaluate our methods on problems of species classification and identification using barcodes, important and relevant analytical tasks in many practical applications (adverse species movement monitoring, sampling surveys for unknown or pathogenic species identification, biodiversity assessment, etc.). On several benchmark barcode datasets, including ACG, Astraptes, Hesperiidae, Fish larvae, and Birds of North America, the proposed alignment-free methods considerably improve prediction accuracy compared to prior results. We also observe significant running time improvements over the state-of-the-art methods. Conclusion: Our results show that newly developed alignment-free methods for DNA barcoding can efficiently and with high accuracy identify specimens by examining only a few barcode features, resulting in increased scalability and interpretability of current computational approaches to barcoding. PMID:19900305

  5. Noncoding sequence classification based on wavelet transform analysis: part I

    NASA Astrophysics Data System (ADS)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in the literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA Elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.

  6. Random whole metagenomic sequencing for forensic discrimination of soils.

    PubMed

    Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian

    2014-01-01

    Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.

  7. Phylogeny and classification of bacteria in the genera Clavibacter and Rathayibacter on the basis of 16S rRNA gene sequence analyses.

    PubMed

    Lee, I M; Bartoszyk, I M; Gundersen-Rindal, D E; Davis, R E

    1997-07-01

    A phylogenetic analysis by parsimony of 16S rRNA gene sequences (16S rDNA) revealed that species and subspecies of Clavibacter and Rathayibacter form a discrete monophyletic clade, paraphyletic to Corynebacterium species. Within the Clavibacter-Rathayibacter clade, four major phylogenetic groups (subclades) with a total of 10 distinct taxa were recognized: (I) species C. michiganensis; (II) species C. xyli; (III) species R. iranicus and R. tritici; and (IV) species R. rathayi. The first three groups form a monophyletic cluster, paraphyletic to R. rathayi. On the basis of the phylogeny inferred, reclassification of members of Clavibacter-Rathayibacter group is proposed. A system for classification of taxa in Clavibacter and Rathayibacter was developed based on restriction fragment length polymorphism (RFLP) analysis of the PCR-amplified 16S rDNA sequences. The groups delineated on the basis of RFLP patterns of 16S rDNA coincided well with the subclades delineated on the basis of phylogeny. In contrast to previous classification systems, which are based primarily on phenotypic properties and are laborious, the RFLP analyses allow for rapid differentiation among species and subspecies in the two genera.
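
    The RFLP idea in this abstract, in which taxa are distinguished by the fragment lengths a restriction enzyme produces from an amplified gene, can be sketched in silico. This is a minimal illustration under stated assumptions: the abstract does not name the enzymes used, so the EcoRI site (G^AATTC) is a stand-in, and the two toy sequences are hypothetical, not real 16S rDNA.

```python
def digest(seq, site, cut_offset):
    """Fragment lengths after cutting a linear sequence at every occurrence of site."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)    # cut position inside the site
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def rflp_pattern(seq, site="GAATTC", cut_offset=1):
    """Fragment lengths sorted longest first, as they would order on a gel."""
    return sorted(digest(seq, site, cut_offset), reverse=True)

# Hypothetical amplicons: taxon_b has a point mutation that destroys the first site.
taxon_a = "ATGCGAATTCGGCTAAGAATTCTT"
taxon_b = "ATGCGAGTTCGGCTAAGAATTCTT"

pattern_a = rflp_pattern(taxon_a)
pattern_b = rflp_pattern(taxon_b)
```

    A single substitution inside a recognition site changes the banding pattern, which is why RFLP patterns can track sequence-based subclades without full sequencing.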

  8. ANN modeling of DNA sequences: new strategies using DNA shape code.

    PubMed

    Parbhane, R V; Tambe, S S; Kulkarni, B D

    2000-09-01

    Two new encoding strategies, namely, wedge and twist codes, which are based on the DNA helical parameters, are introduced to represent DNA sequences in artificial neural network (ANN)-based modeling of biological systems. The performance of the new coding strategies has been evaluated by conducting three case studies involving mapping (modeling) and classification applications of ANNs. The proposed coding schemes have been compared rigorously and shown to outperform the existing coding strategies especially in situations wherein limited data are available for building the ANN models.

  9. New Approaches to Attenuated Hepatitis A Vaccine Development: Cloning and Sequencing of Cell-Culture Adapted Viral cDNA.

    DTIC Science & Technology

    1987-10-13

    after multiple passages in vivo and in vitro. J. Gen. Virol. 67, 1741-1744. Sabin, A.B. (1985). Oral poliovirus vaccine: history of its development... NEW APPROACHES TO ATTENUATED HEPATITIS A VACCINE DEVELOPMENT: CLONING AND SEQUENCING OF CELL-CULTURE ADAPTED VIRAL cDNA. ANNUAL REPORT... TITLE (Include Security Classification) New Approaches to Attenuated Hepatitis A Vaccine Development: Cloning and Sequencing of Cell

  10. TRAP: automated classification, quantification and annotation of tandemly repeated sequences.

    PubMed

    Sobreira, Tiago José P; Durham, Alan M; Gruber, Arthur

    2006-02-01

    TRAP, the Tandem Repeats Analysis Program, is a Perl program that provides a unified set of analyses for the selection, classification, quantification and automated annotation of tandemly repeated sequences. TRAP uses the results of the Tandem Repeats Finder program to perform a global analysis of the satellite content of DNA sequences, permitting researchers to easily assess the tandem repeat content for both individual sequences and whole genomes. The results can be generated in convenient formats such as HTML and comma-separated values. TRAP can also be used to automatically generate annotation data in the format of feature table and GFF files.

  11. A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

    PubMed

    Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

    2015-01-01

    The upstream region of coding genes is important for several reasons, for instance locating transcription factor binding sites and start site initiation in genomic DNA. Motivated by a recently conducted study, where a multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for the classification of true upstream prokaryotic sequence from background upstream sequence. The upstream sequences of conserved coding genes over genomes were considered in the analysis, where conserved coding genes were found by using the pan-genomics concept for each considered prokaryotic species. PLS uses a position specific scoring matrix (PSSM) to study the characteristics of the upstream region. Results obtained by the PLS based method were compared with Gini importance of random forest (RF) and support vector machine (SVM), which are much-used methods for sequence classification. The upstream sequence classification performance was evaluated by using cross validation, and the suggested approach identifies the prokaryotic upstream region significantly better than RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.

  12. Mitochondrial DNA Evidence Supports the Hypothesis that Triodontophorus Species Belong to Cyathostominae

    PubMed Central

    Gao, Yuan; Zhang, Yan; Yang, Xin; Qiu, Jian-Hua; Duan, Hong; Xu, Wen-Wen; Chang, Qiao-Cheng; Wang, Chun-Ren

    2017-01-01

    Equine strongyles, the significant nematode pathogens of horses, are characterized by high quantities and species abundance, but classification of this group of parasitic nematodes is debated. Mitochondrial (mt) genome DNA data are often used to address classification controversies. Thus, the objectives of this study were to determine the complete mt genomes of three Cyathostominae nematode species (Cyathostomum catinatum, Cylicostephanus minutus, and Poteriostomum imparidentatum) of horses and reconstruct the phylogenetic relationship of Strongylidae with other nematodes in Strongyloidea to test the hypothesis that Triodontophorus spp. belong to Cyathostominae using the mt genomes. The mt genomes of Cy. catinatum, Cs. minutus, and P. imparidentatum were 13,838, 13,826, and 13,817 bp in length, respectively. Complete mt nucleotide sequence comparison of all Strongylidae nematodes revealed that sequence identity ranged from 77.8 to 91.6%. The mt genome sequences of Triodontophorus species had relatively high identity with Cyathostominae nematodes, rather than Strongylus species of the same subfamily (Strongylinae). Comparative analyses of mt genome organization for Strongyloidea nematodes sequenced to date revealed that members of this superfamily possess identical gene arrangements. Phylogenetic analyses using mtDNA data indicated that the Triodontophorus species clustered with Cyathostominae species instead of Strongylus species. The present study first determined the complete mt genome sequences of Cy. catinatum, Cs. minutus, and P. imparidentatum, which will provide novel genetic markers for further studies of Strongylidae taxonomy, population genetics, and systematics. Importantly, sequence comparison and phylogenetic analyses based on mtDNA sequences supported the hypothesis that Triodontophorus belongs to Cyathostominae. PMID:28824575

  13. Assessing Diversity of DNA Structure-Related Sequence Features in Prokaryotic Genomes

    PubMed Central

    Huang, Yongjie; Mrázek, Jan

    2014-01-01

    Prokaryotic genomes are diverse in terms of their nucleotide and oligonucleotide composition as well as presence of various sequence features that can affect physical properties of the DNA molecule. We present a survey of local sequence patterns which have a potential to promote non-canonical DNA conformations (i.e. different from standard B-DNA double helix) and interpret the results in terms of relationships with organisms' habitats, phylogenetic classifications, and other characteristics. Our present work differs from earlier similar surveys not only by investigating a wider range of sequence patterns in a large number of genomes but also by using a more realistic null model to assess significant deviations. Our results show that simple sequence repeats and Z-DNA-promoting patterns are generally suppressed in prokaryotic genomes, whereas palindromes and inverted repeats are over-represented. Representation of patterns that promote Z-DNA and intrinsic DNA curvature increases with increasing optimal growth temperature (OGT), and decreases with increasing oxygen requirement. Additionally, representations of close direct repeats, palindromes and inverted repeats exhibit clear negative trends with increasing OGT. The observed relationships with environmental characteristics, particularly OGT, suggest possible evolutionary scenarios of structural adaptation of DNA to particular environmental niches. PMID:24408877

  14. Complementary DNA cloning and molecular evolution of opine dehydrogenases in some marine invertebrates.

    PubMed

    Kimura, Tomohiro; Nakano, Toshiki; Yamaguchi, Toshiyasu; Sato, Minoru; Ogawa, Tomohisa; Muramoto, Koji; Yokoyama, Takehiko; Kan-No, Nobuhiro; Nagahisa, Eizou; Janssen, Frank; Grieshaber, Manfred K

    2004-01-01

    The complete complementary DNA sequences of genes presumably coding for opine dehydrogenases from Arabella iricolor (sandworm), Haliotis discus hannai (abalone), and Patinopecten yessoensis (scallop) were determined, and partial cDNA sequences were derived for Meretrix lusoria (Japanese hard clam) and Spisula sachalinensis (Sakhalin surf clam). The primers ODH-9F and ODH-11R proved useful for amplifying the sequences for opine dehydrogenases from the 4 mollusk species investigated in this study. The sequence of the sandworm was obtained using primers constructed from the amino acid sequence of tauropine dehydrogenase, the main opine dehydrogenase in A. iricolor. The complete cDNA sequence of A. iricolor, H. discus hannai, and P. yessoensis encode 397, 400, and 405 amino acids, respectively. All sequences were aligned and compared with published databank sequences of Loligo opalescens, Loligo vulgaris (squid), Sepia officinalis (cuttlefish), and Pecten maximus (scallop). As expected, a high level of homology was observed for the cDNA from closely related species, such as for cephalopods or scallops, whereas cDNA from the other species showed lower-level homologies. A similar trend was observed when the deduced amino acid sequences were compared. Furthermore, alignment of these sequences revealed some structural motifs that are possibly related to the binding sites of the substrates. The phylogenetic trees derived from the nucleotide and amino acid sequences were consistent with the classification of species resulting from classical taxonomic analyses.

  15. Integrating a DNA barcoding project with an ecological survey: a case study on temperate intertidal polychaete communities in Qingdao, China

    NASA Astrophysics Data System (ADS)

    Zhou, Hong; Zhang, Zhinan; Chen, Haiyan; Sun, Renhua; Wang, Hui; Guo, Lei; Pan, Haijian

    2010-07-01

    In this study, we integrated a DNA barcoding project with an ecological survey on intertidal polychaete communities and investigated the utility of the CO1 gene sequence as a DNA barcode for the classification of the intertidal polychaetes. Using 16S rDNA as a complementary marker and combining morphological and ecological characterization, some of the dominant and common polychaete species from Chinese coasts were assessed for their taxonomic status. We obtained 22 haplotype gene sequences of 13 taxa, including 10 CO1 sequences and 12 16S rDNA sequences. Based on intra- and inter-specific distances, we built phylogenetic trees using the neighbor-joining method. Our study suggested that the mitochondrial CO1 gene was a valid DNA barcoding marker for species identification in polychaetes, but other genes, such as 16S rDNA, could be used as a complementary genetic marker. For more accurate species identification and effective testing of species hypothesis, DNA barcoding should be incorporated with morphological, ecological, biogeographical, and phylogenetic information. The application of DNA barcoding and molecular identification in the ecological survey on the intertidal polychaete communities demonstrated the feasibility of integrating DNA taxonomy and ecology.

  16. Relationships in subtribe Diocleinae (Leguminosae; Papilionoideae) inferred from internal transcribed spacer sequences from nuclear ribosomal DNA.

    PubMed

    Varela, Eduardo S; Lima, João P M S; Galdino, Alexsandro S; Pinto, Luciano da S; Bezerra, Walderly M; Nunes, Edson P; Alves, Maria A O; Grangeiro, Thalles B

    2004-01-01

    The complete sequences of nuclear ribosomal DNA (nrDNA) internal transcribed spacer regions (ITS/5.8S) were determined for species belonging to six genera from the subtribe Diocleinae as well as for the anomalous genera Calopogonium and Pachyrhizus. Phylogenetic trees constructed by distance matrix, maximum parsimony and maximum likelihood methods showed that Calopogonium and Pachyrhizus were outside the clade Diocleinae (Canavalia, Camptosema, Cratylia, Dioclea, Cymbosema, and Galactia). This finding supports previous morphological, phytochemical, and molecular evidence that Calopogonium and Pachyrhizus do not belong to the subtribe Diocleinae. Within the true Diocleinae clade, the clustering of genera and species was congruent with morphology-based classifications, suggesting that ITS/5.8S sequences can provide enough informative sites to allow resolution below the genus level. This is the first evidence of the phylogeny of subtribe Diocleinae based on nuclear DNA sequences.

  17. Noninvasive prenatal detection of sex chromosomal aneuploidies by sequencing circulating cell-free DNA from maternal plasma.

    PubMed

    Mazloom, Amin R; Džakula, Željko; Oeth, Paul; Wang, Huiquan; Jensen, Taylor; Tynan, John; McCullough, Ron; Saldivar, Juan-Sebastian; Ehrich, Mathias; van den Boom, Dirk; Bombard, Allan T; Maeder, Margo; McLennan, Graham; Meschino, Wendy; Palomaki, Glenn E; Canick, Jacob A; Deciu, Cosmin

    2013-06-01

    Whole-genome sequencing of circulating cell free (ccf) DNA from maternal plasma has enabled noninvasive prenatal testing for common autosomal aneuploidies. The purpose of this study was to extend the detection to include common sex chromosome aneuploidies (SCAs): [47,XXX], [45,X], [47,XXY], and [47,XYY] syndromes. Massively parallel sequencing was performed on ccf DNA isolated from the plasma of 1564 pregnant women with known fetal karyotype. A classification algorithm for SCA detection was constructed and trained on this cohort. Another study of 411 maternal samples from women with blinded-to-laboratory fetal karyotypes was then performed to determine the accuracy of the classification algorithm. In the training cohort, the new algorithm had a detection rate (DR) of 100% (95%CI: 82.3%, 100%), a false positive rate (FPR) of 0.1% (95%CI: 0%, 0.3%), and nonreportable rate of 6% (95%CI: 4.9%, 7.4%) for SCA determination. The blinded validation yielded similar results: DR of 96.2% (95%CI: 78.4%, 99.8%), FPR of 0.3% (95%CI: 0%, 1.8%), and nonreportable rate of 5% (95%CI: 3.2%, 7.7%) for SCA determination. Noninvasive prenatal identification of the most common sex chromosome aneuploidies is possible using ccf DNA and massively parallel sequencing with a high DR and a low FPR. © 2013 John Wiley & Sons, Ltd.

  18. Rapid identification and classification of Mycobacterium spp. using whole-cell protein barcodes with matrix assisted laser desorption ionization time of flight mass spectrometry in comparison with multigene phylogenetic analysis.

    PubMed

    Wang, Jun; Chen, Wen Feng; Li, Qing X

    2012-02-24

    The need of quick diagnostics and increasing number of bacterial species isolated necessitate development of a rapid and effective phenotypic identification method. Mass spectrometry (MS) profiling of whole cell proteins has potential to satisfy the requirements. The genus Mycobacterium contains more than 154 species that are taxonomically very close and require use of multiple genes including 16S rDNA for phylogenetic identification and classification. Six strains of five Mycobacterium species were selected as model bacteria in the present study because of their 16S rDNA similarity (98.4-99.8%) and the high similarity of the concatenated 16S rDNA, rpoB and hsp65 gene sequences (95.9-99.9%), requiring high identification resolution. The classification of the six strains by MALDI TOF MS protein barcodes was consistent with, but at much higher resolution than, that of the multi-locus sequence analysis of using 16S rDNA, rpoB and hsp65. The species were well differentiated using MALDI TOF MS and MALDI BioTyper™ software after quick preparation of whole-cell proteins. Several proteins were selected as diagnostic markers for species confirmation. An integration of MALDI TOF MS, MALDI BioTyper™ software and diagnostic protein fragments provides a robust phenotypic approach for bacterial identification and classification. Copyright © 2011 Elsevier B.V. All rights reserved.

  19. A new taxonomic classification for species in Gomphus sensu lato

    Treesearch

    Admir J. Giachini; Michael A. Castellano

    2011-01-01

    Taxonomy of the Gomphales has been revisited by combining morphology and molecular data (DNA sequences) to provide a natural classification for the species of Gomphus sensu lato. Results indicate Gomphus s.l. to be non-monophyletic, leading to new combinations and the placement of its species into four genera: Gomphus...

  20. HMMBinder: DNA-Binding Protein Prediction Using HMM Profile Based Features.

    PubMed

    Zaman, Rianon; Chowdhury, Shahana Yasmin; Rashid, Mahmood A; Sharma, Alok; Dehzangi, Abdollah; Shatabda, Swakkhar

    2017-01-01

    DNA-binding proteins often play an important role in various processes within the cell. Over the last decade, a wide range of classification algorithms and feature extraction techniques have been used to solve this problem. In this paper, we propose a novel DNA-binding protein prediction method called HMMBinder. HMMBinder uses monogram and bigram features extracted from the HMM profiles of the protein sequences. To the best of our knowledge, this is the first application of HMM profile based features for the DNA-binding protein prediction problem. We applied Support Vector Machines (SVM) as a classification technique in HMMBinder. Our method was tested on standard benchmark datasets. We experimentally show that our method outperforms the state-of-the-art methods found in the literature.

  1. Next-Generation Sequencing in Oncology: Genetic Diagnosis, Risk Prediction and Cancer Classification

    PubMed Central

    Kamps, Rick; Brandão, Rita D.; van den Bosch, Bianca J.; Paulussen, Aimee D. C.; Xanthoulea, Sofia; Blok, Marinus J.; Romano, Andrea

    2017-01-01

    Next-generation sequencing (NGS) technology has expanded in the last decades with significant improvements in the reliability, sequencing chemistry, pipeline analyses, data interpretation and costs. Such advances make the use of NGS feasible in clinical practice today. This review describes the recent technological developments in NGS applied to the field of oncology. A number of clinical applications are reviewed, i.e., mutation detection in inherited cancer syndromes based on DNA-sequencing, detection of spliceogenic variants based on RNA-sequencing, DNA-sequencing to identify risk modifiers and application for pre-implantation genetic diagnosis, cancer somatic mutation analysis, pharmacogenetics and liquid biopsy. Conclusive remarks, clinical limitations, implications and ethical considerations that relate to the different applications are provided. PMID:28146134

  2. Noncoding sequence classification based on wavelet transform analysis: part II

    NASA Astrophysics Data System (ADS)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez-Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in human genome can be divided into the coding and noncoding ones. We hypothesize that the characteristic periodicities of the noncoding sequences are related to their function. We describe the procedure to identify these characteristic periodicities using the wavelet analysis. Our results show that three groups of noncoding sequences, each one with different biological function, may be differentiated by their wavelet coefficients within specific frequency range.
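
    The idea of locating a characteristic periodicity in wavelet coefficients can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which wavelet family or numerical DNA mapping the authors use, so the Haar wavelet, the binary indicator mapping, and the toy sequence are all hypothetical choices.

```python
from math import sqrt

def indicator(seq, base):
    """Map a DNA string to a 0/1 signal marking where base occurs."""
    return [1.0 if ch == base else 0.0 for ch in seq.upper()]

def haar_detail_energies(signal):
    """Energy of the Haar detail coefficients at each level, finest scale first."""
    energies = []
    approx = list(signal)
    while len(approx) >= 2:
        if len(approx) % 2:
            approx = approx[:-1]           # drop an odd trailing sample
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / sqrt(2) for a, b in pairs]
        approx = [(a + b) / sqrt(2) for a, b in pairs]
        energies.append(sum(d * d for d in detail))
    return energies

# A toy sequence with a strict period-2 pattern in G.
energies = haar_detail_energies(indicator("GAGAGAGA", "G"))
```

    For this toy input all detail energy sits at the finest level, which is the kind of scale-specific signature that could separate groups of sequences with different periodic structure.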

  3. Centrifuge: rapid and sensitive classification of metagenomic sequences

    PubMed Central

    Song, Li; Breitwieser, Florian P.

    2016-01-01

    Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space. PMID:27852649
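
    The indexing scheme named in this abstract, a Burrows-Wheeler transform searched through an FM index, can be shown in miniature. The sketch below is a textbook version, not Centrifuge's implementation: it builds the suffix array by naive sorting and stores a full occurrence table, neither of which would scale to genome-sized inputs. The function names are hypothetical.

```python
from collections import Counter

def build_fm_index(text):
    """Return (bwt, c_table, occ) for text with a '$' terminator appended."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])   # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)                  # char preceding each suffix
    counts = Counter(text)
    # c_table[ch]: number of characters in text strictly smaller than ch
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in counts:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return bwt, c_table, occ

def count_matches(pattern, bwt, c_table, occ):
    """Count exact occurrences of pattern using FM-index backward search."""
    lo, hi = 0, len(bwt)
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo

bwt, c_table, occ = build_fm_index("ACGTACGTAC")
hits = count_matches("ACG", bwt, c_table, occ)
```

    Backward search touches one pattern character per step regardless of the indexed text's length, which is what makes read classification against a compressed index fast.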

  4. Quantitative analysis and prediction of G-quadruplex forming sequences in double-stranded DNA

    PubMed Central

    Kim, Minji; Kreig, Alex; Lee, Chun-Ying; Rube, H. Tomas; Calvert, Jacob; Song, Jun S.; Myong, Sua

    2016-01-01

    Abstract G-quadruplex (GQ) is a four-stranded DNA structure that can be formed in guanine-rich sequences. GQ structures have been proposed to regulate diverse biological processes including transcription, replication, translation and telomere maintenance. Recent studies have demonstrated the existence of GQ DNA in live mammalian cells and a significant number of potential GQ forming sequences in the human genome. We present a systematic and quantitative analysis of GQ folding propensity on a large set of 438 GQ forming sequences in double-stranded DNA by integrating fluorescence measurement, single-molecule imaging and computational modeling. We find that short minimum loop length and the thymine base are two main factors that lead to high GQ folding propensity. Linear and Gaussian process regression models further validate that the GQ folding potential can be predicted with high accuracy based on the loop length distribution and the nucleotide content of the loop sequences. Our study provides important new parameters that can inform the evaluation and classification of putative GQ sequences in the human genome. PMID:27095201

  5. Classification of DNA nucleotides with transverse tunneling currents

    NASA Astrophysics Data System (ADS)

    Nyvold Pedersen, Jonas; Boynton, Paul; Di Ventra, Massimiliano; Jauho, Antti-Pekka; Flyvbjerg, Henrik

    2017-01-01

    It has been theoretically suggested and experimentally demonstrated that fast and low-cost sequencing of DNA, RNA, and peptide molecules might be achieved by passing such molecules between electrodes embedded in a nanochannel. The experimental realization of this scheme faces major challenges, however. In realistic liquid environments, typical currents in tunneling devices are of the order of picoamps. This corresponds to only six electrons per microsecond, and this number affects the integration time required to do current measurements in real experiments. This limits the speed of sequencing, though current fluctuations due to Brownian motion of the molecule average out during the required integration time. Moreover, data acquisition equipment introduces noise, and electronic filters create correlations in time-series data. We discuss how these effects must be included in the analysis of, e.g., the assignment of specific nucleobases to current signals. As the signals from different molecules overlap, unambiguous classification is impossible with a single measurement. We argue that the assignment of molecules to a signal is a standard pattern classification problem and calculation of the error rates is straightforward. The ideas presented here can be extended to other sequencing approaches of current interest.
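    The "six electrons per microsecond" figure follows from dividing a 1 pA current by the elementary charge. The short Python sketch below reproduces that arithmetic. The function names are illustrative only, and the relative-fluctuation estimate assumes simple Poisson counting statistics, which is an assumption made here and not a model taken from the abstract.

```python
import math

# Elementary charge in coulombs (exact value in the 2019 SI definition).
ELEMENTARY_CHARGE_C = 1.602176634e-19


def electrons_per_microsecond(current_amps: float) -> float:
    """Mean number of electrons transferred per microsecond at a given current."""
    electrons_per_second = current_amps / ELEMENTARY_CHARGE_C
    return electrons_per_second * 1e-6


def relative_counting_noise(current_amps: float, integration_time_s: float) -> float:
    """Relative fluctuation 1/sqrt(N) of the electron count in one integration
    window, ASSUMING Poisson statistics (illustrative assumption only)."""
    mean_count = current_amps / ELEMENTARY_CHARGE_C * integration_time_s
    return 1.0 / math.sqrt(mean_count)


if __name__ == "__main__":
    n = electrons_per_microsecond(1e-12)  # 1 pA
    print(f"{n:.2f} electrons per microsecond")  # about 6.24
    # Longer integration windows average over more electrons.
    for window in (1e-6, 1e-4, 1e-2):
        print(window, relative_counting_noise(1e-12, window))
```

    Under this assumption, a hundredfold longer integration window reduces the relative counting noise by a factor of ten, which illustrates why small currents constrain sequencing speed.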

  6. Genetic variation and DNA fingerprinting of durian types in Malaysia using simple sequence repeat (SSR) markers.

    PubMed

    Siew, Ging Yang; Ng, Wei Lun; Tan, Sheau Wei; Alitheen, Noorjahan Banu; Tan, Soon Guan; Yeap, Swee Keong

    2018-01-01

    Durian (Durio zibethinus) is one of the most popular tropical fruits in Asia. To date, 126 durian types have been registered with the Department of Agriculture in Malaysia based on phenotypic characteristics. Classification based on morphology is convenient, easy, and fast but it suffers from phenotypic plasticity as a direct result of environmental factors and age. To overcome the limitation of morphological classification, there is a need to carry out genetic characterization of the various durian types. Such data is important for the evaluation and management of durian genetic resources in producing countries. In this study, simple sequence repeat (SSR) markers were used to study the genetic variation in 27 durian types from the germplasm collection of Universiti Putra Malaysia. Based on DNA sequences deposited in Genbank, seven pairs of primers were successfully designed to amplify SSR regions in the durian DNA samples. High levels of variation among the 27 durian types were observed (expected heterozygosity, H_E = 0.35). The DNA fingerprinting power of SSR markers revealed by the combined probability of identity (PI) of all loci was 2.3×10^-3. Unique DNA fingerprints were generated for 21 out of 27 durian types using five polymorphic SSR markers (the other two SSR markers were monomorphic). We further tested the utility of these markers by evaluating the clonal status of shared durian types from different germplasm collection sites, and found that some were not clones. The findings in this preliminary study not only show the feasibility of using SSR markers for DNA fingerprinting of durian types, but also challenge the current classification of durian types, e.g., on whether the different types should be called "clones", "varieties", or "cultivars". Such matters have a direct impact on the regulation and management of durian genetic resources in the region.
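    The two summary statistics quoted above have standard textbook definitions that can be sketched in a few lines of Python. For a locus with allele frequencies p_i, expected heterozygosity is 1 - sum(p_i^2), and the unbiased-sample-free probability of identity is sum(p_i^4) + sum over i<j of (2 p_i p_j)^2; the combined PI is the product over independent loci. The abstract does not state which estimator or software was used, so the functions and the example frequencies below are hypothetical illustrations, not the study's procedure or data.

```python
from math import prod


def expected_heterozygosity(freqs):
    """H_E = 1 - sum(p_i^2) for one locus (no small-sample correction)."""
    return 1.0 - sum(p * p for p in freqs)


def probability_of_identity(freqs):
    """Chance that two unrelated individuals share a genotype at one locus."""
    homozygous = sum(p ** 4 for p in freqs)
    heterozygous = sum(
        (2 * freqs[i] * freqs[j]) ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return homozygous + heterozygous


def combined_probability_of_identity(loci):
    """Product of per-locus PI values, assuming the loci are independent."""
    return prod(probability_of_identity(freqs) for freqs in loci)


if __name__ == "__main__":
    # Made-up allele frequencies: two polymorphic loci and one monomorphic locus.
    loci = [[0.5, 0.5], [0.7, 0.2, 0.1], [1.0]]
    for freqs in loci:
        print(expected_heterozygosity(freqs), probability_of_identity(freqs))
    print(combined_probability_of_identity(loci))
```

    A monomorphic locus gives H_E = 0 and PI = 1, so it contributes nothing to the combined fingerprinting power, consistent with the abstract's remark that only the five polymorphic markers produced unique fingerprints.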

  7. Genetic variation and DNA fingerprinting of durian types in Malaysia using simple sequence repeat (SSR) markers

    PubMed Central

    Siew, Ging Yang; Tan, Sheau Wei; Tan, Soon Guan; Yeap, Swee Keong

    2018-01-01

    Durian (Durio zibethinus) is one of the most popular tropical fruits in Asia. To date, 126 durian types have been registered with the Department of Agriculture in Malaysia based on phenotypic characteristics. Classification based on morphology is convenient, easy, and fast but it suffers from phenotypic plasticity as a direct result of environmental factors and age. To overcome the limitation of morphological classification, there is a need to carry out genetic characterization of the various durian types. Such data is important for the evaluation and management of durian genetic resources in producing countries. In this study, simple sequence repeat (SSR) markers were used to study the genetic variation in 27 durian types from the germplasm collection of Universiti Putra Malaysia. Based on DNA sequences deposited in Genbank, seven pairs of primers were successfully designed to amplify SSR regions in the durian DNA samples. High levels of variation among the 27 durian types were observed (expected heterozygosity, H_E = 0.35). The DNA fingerprinting power of SSR markers revealed by the combined probability of identity (PI) of all loci was 2.3×10^-3. Unique DNA fingerprints were generated for 21 out of 27 durian types using five polymorphic SSR markers (the other two SSR markers were monomorphic). We further tested the utility of these markers by evaluating the clonal status of shared durian types from different germplasm collection sites, and found that some were not clones. The findings in this preliminary study not only show the feasibility of using SSR markers for DNA fingerprinting of durian types, but also challenge the current classification of durian types, e.g., on whether the different types should be called “clones”, “varieties”, or “cultivars”. Such matters have a direct impact on the regulation and management of durian genetic resources in the region. PMID:29511604

  8. Phylogenetic Analysis of Shewanella Strains by DNA Relatedness Derived from Whole Genome Microarray DNA-DNA Hybridization and Comparison with Other Methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Liyou; Yi, T. Y.; Van Nostrand, Joy

    Phylogenetic analyses were done for the Shewanella strains isolated from the Baltic Sea (38 strains), the US DOE Hanford uranium bioremediation site [Hanford Reach of the Columbia River (HRCR), 11 strains], Pacific Ocean and Hawaiian sediments (8 strains), and strains from other resources (16 strains) with three outgroup strains, Rhodopseudomonas palustris, Clostridium cellulolyticum, and Thermoanaerobacter ethanolicus X514, using DNA relatedness derived from WCGA-based DNA-DNA hybridizations, sequence similarities of the 16S rRNA and gyrB genes, and sequence similarities of 6 loci of the Shewanella genome selected from a shared gene list of the Shewanella strains with whole genomes sequenced, based on their average nucleotide identity (ANI). The phylogenetic trees based on 16S rRNA and gyrB gene sequences, and DNA relatedness derived from WCGA hybridizations of the tested Shewanella strains share exactly the same sub-clusters with very few exceptions, in which the strains were basically grouped by species. However, the phylogenetic analysis based on DNA relatedness derived from WCGA hybridizations dramatically increased the differentiation resolution at the species and strain level within the Shewanella genus. When the tree based on DNA relatedness derived from WCGA hybridizations was compared to the tree based on the combined sequences of the selected functional genes (6 loci), we found that the resolutions of both methods are similar, but the clustering of the tree based on DNA relatedness derived from WCGA hybridizations was clearer. These results indicate that WCGA-based DNA-DNA hybridization is an ideal alternative to conventional DNA-DNA hybridization methods and it is superior to the phylogenetic methods based on sequence similarities of single genes. Detailed analysis is being performed for the re-classification of the strains examined.

  9. A subgeneric classification of Selaginella (Selaginellaceae).

    PubMed

    Weststrand, Stina; Korall, Petra

    2016-12-01

    The lycophyte family Selaginellaceae includes approximately 750 herbaceous species worldwide, with the main species richness in the tropics and subtropics. We recently presented a phylogenetic analysis of Selaginellaceae based on DNA sequence data and, with the phylogeny as a framework, the study discussed the character evolution of the group focusing on gross morphology. Here we translate these findings into a new classification. To present a robust and useful classification, we identified well-supported monophyletic groups from our previous phylogenetic analysis of 223 species, which together represent the diversity of the family with respect to morphology, taxonomy, and geographical distribution. Care was taken to choose groups with supporting morphology. In this classification, we recognize a single genus Selaginella and seven subgenera: Selaginella, Rupestrae, Lepidophyllae, Gymnogynum, Exaltatae, Ericetorum, and Stachygynandrum. The subgenera are all well supported based on analysis of DNA sequence data and morphology. A key to the subgenera is presented. Our new classification is based on a well-founded hypothesis of the evolutionary relationships of Selaginella, and each subgenus can be identified by a suite of morphological features, most of them possible to study in the field. Our intention is that the classification will be useful not only to experts in the field, but also to a broader audience. © 2016 Weststrand and Korall. Published by the Botanical Society of America. This work is licensed under a Creative Commons Attribution License (CC-BY 4.0).

  10. Mapping the Space of Genomic Signatures

    PubMed Central

    Kari, Lila; Hill, Kathleen A.; Sayem, Abu S.; Karamichalis, Rallis; Bryans, Nathaniel; Davis, Katelyn; Dattani, Nikesh S.

    2015-01-01

    We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence alignment and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. 
This method also correctly finds the mtDNA sequences most closely related to that of the anatomically modern human (the Neanderthal, the Denisovan, and the chimp), and that the sequence most different from it in this dataset belongs to a cucumber. PMID:26000734
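    The Chaos Game Representation mentioned above can be sketched in a few lines of Python. Starting from the centre of the unit square, each nucleotide moves the current point halfway toward that nucleotide's corner; binning the points on a 2^k by 2^k grid then counts k-mer occurrences. The corner assignment below is one common convention and is an assumption here, as are the function names; the DSSIM image distance and the MDS step used in the paper are not reproduced.

```python
# Corner assignment is a common convention, assumed here for illustration.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory of a DNA sequence as a list of (x, y) points."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N (a simplification)
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0  # move halfway to the corner
        points.append((x, y))
    return points


def cgr_grid(seq, k):
    """Count CGR points on a 2^k x 2^k grid; each cell corresponds to one k-mer."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    # The first k-1 points do not yet encode a full k-mer, so they are dropped.
    for x, y in cgr_points(seq)[k - 1:]:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    print(cgr_points("AC"))
    for row in cgr_grid("ACGTACGTTTGACA", 2):
        print(row)
```

    Two sequences of different lengths yield grids of the same shape, which is what allows an image distance to be computed without alignment.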

  11. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification.

    PubMed

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-05-01

    Alignment-based sequence similarity searches, while accurate for some type of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  12. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

    PubMed Central

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-01-01

    Motivation: Alignment-based sequence similarity searches, while accurate for some type of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913

  13. Chloroplast variation is incongruent with classification of the Australian bloodwood eucalypts (genus Corymbia, family Myrtaceae)

    PubMed Central

    Schuster, Tanja M.; Setaro, Sabrina D.; Tibbits, Josquin F. G.; Batty, Erin L.; Fowler, Rachael M.; McLay, Todd G. B.; Wilcox, Stephen; Ades, Peter K.

    2018-01-01

    Previous molecular phylogenetic analyses have resolved the Australian bloodwood eucalypt genus Corymbia (~100 species) as either monophyletic or paraphyletic with respect to Angophora (9–10 species). Here we assess relationships of Corymbia and Angophora using a large dataset of chloroplast DNA sequences (121,016 base pairs; from 90 accessions representing 55 Corymbia and 8 Angophora species, plus 33 accessions of related genera), skimmed from high throughput sequencing of genomic DNA, and compare results with new analyses of nuclear ITS sequences (119 accessions) from previous studies. Maximum likelihood and maximum parsimony analyses of cpDNA resolve well supported trees with most nodes having >95% bootstrap support. These trees strongly reject monophyly of Corymbia, its two subgenera (Corymbia and Blakella), most taxonomic sections (Abbreviatae, Maculatae, Naviculares, Septentrionales), and several species. ITS trees weakly indicate paraphyly of Corymbia (bootstrap support <50% for maximum likelihood, and 71% for parsimony), but are highly incongruent with the cpDNA analyses, in that they support monophyly of both subgenera and some taxonomic sections of Corymbia. The striking incongruence between cpDNA trees and both morphological taxonomy and ITS trees is attributed largely to chloroplast introgression between taxa, because of geographic sharing of chloroplast clades across taxonomic groups. Such introgression has been widely inferred in studies of the related genus Eucalyptus. This is the first report of its likely prevalence in Corymbia and Angophora, but this is consistent with previous morphological inferences of hybridisation between species. 
Our findings (based on continent-wide sampling) highlight a need for more focussed studies to assess the extent of hybridisation and introgression in the evolutionary history of these genera, and that critical testing of the classification of Corymbia and Angophora requires additional sequence data from nuclear genomes. PMID:29668710

  14. Advances in yeast systematics and phylogeny and their use as predictors of biotechnologically important metabolic pathways

    USDA-ARS's Scientific Manuscript database

    Detection, identification, and classification of yeasts have undergone a major transformation in the last decade and a half following application of gene sequence analyses and genome comparisons. Development of a database (barcode) of easily determined DNA sequences from domains 1 and 2 (D1/D2) of t...

  15. Fascioliasis transmission by Lymnaea neotropica confirmed by nuclear rDNA and mtDNA sequencing in Argentina.

    PubMed

    Mera y Sierra, Roberto; Artigas, Patricio; Cuervo, Pablo; Deis, Erika; Sidoti, Laura; Mas-Coma, Santiago; Bargues, Maria Dolores

    2009-12-03

    Fascioliasis is widespread in livestock in Argentina. Among activities included in a long-term initiative to ascertain which are the fascioliasis areas of most concern, studies were performed in a recreational farm, including liver fluke infection in different domestic animal species, classification of the lymnaeid vector and verification of natural transmission of fascioliasis by identification of the intramolluscan trematode larval stages found in naturally infected snails. The high prevalences in the domestic animals appeared related to only one lymnaeid species present. Lymnaeid and trematode classification was verified by means of nuclear ribosomal DNA and mitochondrial DNA marker sequencing. Complete sequences of 18S rRNA gene and rDNA ITS-2 and ITS-1, and a fragment of the mtDNA cox1 gene demonstrate that the Argentinian lymnaeid belongs to the species Lymnaea neotropica. Redial larval stages found in a L. neotropica specimen were ascribed to Fasciola hepatica after analysis of the complete ITS-1 sequence. The finding of L. neotropica is the first of this lymnaeid species not only in Argentina but also in Southern Cone countries. The total absence of nucleotide differences between the sequences of specimens from Argentina and the specimens from the Peruvian type locality at the levels of rDNA 18S, ITS-2 and ITS-1, and the single mutation at the mtDNA cox1 gene suggest a very recent spread. The ecological characteristics of this lymnaeid, living in small, superficial water collections frequented by livestock, suggest that it may be carried from one place to another by remaining in dried mud stuck to the feet of transported animals. The presence of L. neotropica adds pronounced complexity to the transmission and epidemiology of fascioliasis in Argentina, due to the great difficulties in distinguishing, by traditional malacological methods, between the three similar lymnaeid species of the controversial Galba/Fossaria group present in this country: L. viatrix, Galba truncatula and L. neotropica. It also poses a problem with regard to the use, for lymnaeid vector species discrimination, of several molecular techniques which do not show sufficient accuracy, such as those relying on the 18S rRNA gene or parts of it, because both L. neotropica and L. viatrix present identical 18S sequences.

  16. Numerical classification of coding sequences

    NASA Technical Reports Server (NTRS)

    Collins, D. W.; Liu, C. C.; Jukes, T. H.

    1992-01-01

    DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
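    The two numerical designations described above are straightforward to compute. The Python sketch below builds a base-count label in the A..C..G..T.. order of the abstract's example and a 64-entry codon table for a reading frame. The function names and the toy sequence are illustrative assumptions; the original authors' software is not described in the abstract.

```python
from collections import Counter
from itertools import product


def base_count_label(seq: str) -> str:
    """Abbreviate a sequence by its base counts, e.g. 'A76C158G121T74'."""
    counts = Counter(seq.upper())
    return "".join(f"{base}{counts.get(base, 0)}" for base in "ACGT")


def codon_table(seq: str) -> dict:
    """Count each of the 64 codons in frame, starting at the first base."""
    seq = seq.upper()
    table = {"".join(codon): 0 for codon in product("ACGT", repeat=3)}
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if codon in table:  # codons with ambiguous symbols are skipped
            table[codon] += 1
    return table


if __name__ == "__main__":
    orf = "ATGAAGAAGTAA"  # toy reading frame, not real data
    print(base_count_label(orf))
    table = codon_table(orf)
    print({codon: n for codon, n in table.items() if n})
```

    Because the label depends only on composition, two database entries with identical labels are candidates for being redundant copies, which is the cross-referencing use the abstract proposes; entries with equal labels would still need a direct sequence comparison to confirm.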

  17. Characterization of the Complete Mitochondrial Genome Sequence of Spirometra erinaceieuropaei (Cestoda: Diphyllobothriidae) from China

    PubMed Central

    Liu, Guo-Hua; Li, Chun; Li, Jia-Yuan; Zhou, Dong-Hui; Xiong, Rong-Chuan; Lin, Rui-Qing; Zou, Feng-Cai; Zhu, Xing-Quan

    2012-01-01

    Sparganosis, caused by the plerocercoid larvae of members of the genus Spirometra, can cause significant public health problem and considerable economic losses. In the present study, the complete mitochondrial DNA (mtDNA) sequence of Spirometra erinaceieuropaei from China was determined, characterized and compared with that of S. erinaceieuropaei from Japan. The gene arrangement in the mt genome sequences of S. erinaceieuropaei from China and Japan is identical. The identity of the mt genomes was 99.1% between S. erinaceieuropaei from China and Japan, and the complete mtDNA sequence of S. erinaceieuropaei from China is slightly shorter (2 bp) than that from Japan. Phylogenetic analysis of S. erinaceieuropaei with other representative cestodes using two different computational algorithms [Bayesian inference (BI) and maximum likelihood (ML)] based on concatenated amino acid sequences of 12 protein-coding genes, revealed that S. erinaceieuropaei is closely related to Diphyllobothrium spp., supporting classification based on morphological features. The present study determined the complete mtDNA sequences of S. erinaceieuropaei from China that provides novel genetic markers for studying the population genetics and molecular epidemiology of S. erinaceieuropaei in humans and animals. PMID:22553464

  18. Centrifuge: rapid and sensitive classification of metagenomic sequences.

    PubMed

    Kim, Daehwan; Song, Li; Breitwieser, Florian P; Salzberg, Steven L

    2016-12-01

    Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space. © 2016 Kim et al.; Published by Cold Spring Harbor Laboratory Press.
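    The indexing idea named above, a Burrows-Wheeler transform queried through FM-index backward search, can be illustrated with a deliberately naive Python sketch. This is not Centrifuge's implementation: the rotation sort and the linear-time occurrence counts below are chosen for readability, whereas a practical index uses suffix arrays and sampled rank tables. All function names are illustrative.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic, demo only).

    Assumes '$' does not occur in the text; it marks the end of the string.
    """
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def fm_count(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text by backward search."""
    # first[c] = index of the first row starting with character c.
    first = {}
    for i, c in enumerate(sorted(bwt_str)):
        first.setdefault(c, i)

    def occ(c: str, i: int) -> int:
        # Number of occurrences of c in bwt_str[:i] (naive rank query).
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current half-open range of matching rows
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("GATTACAGATTACA")
    print(index)
    print(fm_count(index, "ATTA"), fm_count(index, "CAG"), fm_count(index, "GGG"))
```

    The search touches the index once per pattern character regardless of text length, and the transformed text is the same size as the input, which gives some intuition for why such an index can be more compact than a table of all k-mers.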

  19. Visualization of genome signatures of eukaryote genomes by batch-learning self-organizing map with a special emphasis on Drosophila genomes.

    PubMed

    Abe, Takashi; Hamano, Yuta; Ikemura, Toshimichi

    2014-01-01

    A strategy of evolutionary studies that can compare vast numbers of genome sequences is becoming increasingly important with the remarkable progress of high-throughput DNA sequencing methods. We previously established a sequence alignment-free clustering method "BLSOM" for di-, tri-, and tetranucleotide compositions in genome sequences, which can characterize sequence characteristics (genome signatures) of a wide range of species. In the present study, we generated BLSOMs for tetra- and pentanucleotide compositions in approximately one million sequence fragments derived from 101 eukaryotes, for which almost complete genome sequences were available. BLSOM recognized phylotype-specific characteristics (e.g., key combinations of oligonucleotide frequencies) in the genome sequences, permitting phylotype-specific clustering of the sequences without any information regarding the species. In our detailed examination of 12 Drosophila species, the correlation between their phylogenetic classification and the classification on the BLSOMs was observed to visualize oligonucleotides diagnostic for species-specific clustering.

  20. Design and characterization of a nanopore-coupled polymerase for single-molecule DNA sequencing by synthesis on an electrode array

    PubMed Central

    Stranges, P. Benjamin; Palla, Mirkó; Kalachikov, Sergey; Nivala, Jeff; Dorwart, Michael; Trans, Andrew; Kumar, Shiv; Porel, Mintu; Chien, Minchen; Tao, Chuanjuan; Morozova, Irina; Li, Zengmin; Shi, Shundi; Aberra, Aman; Arnold, Cleoma; Yang, Alexander; Aguirre, Anne; Harada, Eric T.; Korenblum, Daniel; Pollard, James; Bhat, Ashwini; Gremyachinskiy, Dmitriy; Bibillo, Arek; Chen, Roger; Davis, Randy; Russo, James J.; Fuller, Carl W.; Roever, Stefan; Ju, Jingyue; Church, George M.

    2016-01-01

    Scalable, high-throughput DNA sequencing is a prerequisite for precision medicine and biomedical research. Recently, we presented a nanopore-based sequencing-by-synthesis (Nanopore-SBS) approach, which used a set of nucleotides with polymer tags that allow discrimination of the nucleotides in a biological nanopore. Here, we designed and covalently coupled a DNA polymerase to an α-hemolysin (αHL) heptamer using the SpyCatcher/SpyTag conjugation approach. These porin–polymerase conjugates were inserted into lipid bilayers on a complementary metal oxide semiconductor (CMOS)-based electrode array for high-throughput electrical recording of DNA synthesis. The designed nanopore construct successfully detected the capture of tagged nucleotides complementary to a DNA base on a provided template. We measured over 200 tagged-nucleotide signals for each of the four bases and developed a classification method to uniquely distinguish them from each other and background signals. The probability of falsely identifying a background event as a true capture event was less than 1.2%. In the presence of all four tagged nucleotides, we observed sequential additions in real time during polymerase-catalyzed DNA synthesis. Single-polymerase coupling to a nanopore, in combination with the Nanopore-SBS approach, can provide the foundation for a low-cost, single-molecule, electronic DNA-sequencing platform. PMID:27729524

  1. A Simple Method for the Extraction, PCR-amplification, Cloning, and Sequencing of Pasteuria 16S rDNA from Small Numbers of Endospores.

    PubMed

    Atibalentja, N; Noel, G R; Ciancio, A

    2004-03-01

    For many years the taxonomy of the genus Pasteuria has been marred with confusion because the bacterium could not be cultured in vitro and, therefore, descriptions were based solely on morphological, developmental, and pathological characteristics. The current study sought to devise a simple method for PCR-amplification, cloning, and sequencing of Pasteuria 16S rDNA from small numbers of endospores, with no need for prior DNA purification. Results show that DNA extracts from plain glass bead-beating of crude suspensions containing 10,000 endospores at 0.2 x 10 endospores ml(-1) were sufficient for PCR-amplification of Pasteuria 16S rDNA, when used in conjunction with specific primers. These results imply that for P. penetrans and P. nishizawae only one parasitized female of Meloidogyne spp. and Heterodera glycines, respectively, should be sufficient, and as few as eight cadavers of Belonolaimus longicaudatus with an average number of 1,250 endospores of "Candidatus Pasteuria usgae" are needed for PCR-amplification of Pasteuria 16S rDNA. The method described in this paper should facilitate the sequencing of the 16S rDNA of the many Pasteuria isolates that have been reported on nematodes and, consequently, expedite the classification of those isolates through comparative sequence analysis.

  2. CancerLocator: non-invasive cancer diagnosis and tissue-of-origin prediction using methylation profiles of cell-free DNA.

    PubMed

    Kang, Shuli; Li, Qingjiao; Chen, Quan; Zhou, Yonggang; Park, Stacy; Lee, Gina; Grimes, Brandon; Krysan, Kostyantyn; Yu, Min; Wang, Wei; Alber, Frank; Sun, Fengzhu; Dubinett, Steven M; Li, Wenyuan; Zhou, Xianghong Jasmine

    2017-03-24

    We propose a probabilistic method, CancerLocator, which exploits the diagnostic potential of cell-free DNA by determining not only the presence but also the location of tumors. CancerLocator simultaneously infers the proportions and the tissue-of-origin of tumor-derived cell-free DNA in a blood sample using genome-wide DNA methylation data. CancerLocator outperforms two established multi-class classification methods on simulations and real data, even with the low proportion of tumor-derived DNA in the cell-free DNA scenarios. CancerLocator also achieves promising results on patient plasma samples with low DNA methylation sequencing coverage.

  3. GENE-07. MOLECULAR NEUROPATHOLOGY 2.0 - INCREASING DIAGNOSTIC ACCURACY IN PEDIATRIC NEUROONCOLOGY

    PubMed Central

    Sturm, Dominik; Jones, David T.W.; Capper, David; Sahm, Felix; von Deimling, Andreas; Rutkoswki, Stefan; Warmuth-Metz, Monika; Bison, Brigitte; Gessi, Marco; Pietsch, Torsten; Pfister, Stefan M.

    2017-01-01

    The classification of central nervous system (CNS) tumors into clinically and biologically distinct entities and subgroups is challenging. Children and adolescents can be affected by >100 histological variants with very variable outcomes, some of which are exceedingly rare. The current WHO classification has introduced a number of novel molecular markers to aid routine neuropathological diagnostics, and DNA methylation profiling is emerging as a powerful tool to distinguish CNS tumor classes. The Molecular Neuropathology 2.0 study aims to integrate genome wide (epi-)genetic diagnostics with reference neuropathological assessment for all newly-diagnosed pediatric brain tumors in Germany. To date, >350 patients have been enrolled. A molecular diagnosis is established by epigenetic tumor classification through DNA methylation profiling and targeted panel sequencing of >130 genes to detect diagnostically and/or therapeutically useful DNA mutations, structural alterations, and fusion events. Results are aligned with the reference neuropathological diagnosis, and discrepant findings are discussed in a multi-disciplinary tumor board including reference neuroradiological evaluation. Ten FFPE sections as input material are sufficient to establish a molecular diagnosis in >95% of tumors. Alignment with reference pathology results in four broad categories: a) concordant classification (~77%), b) discrepant classification resolvable by tumor board discussion and/or additional data (~5%), c) discrepant classification without currently available options to resolve (~8%), and d) cases currently unclassifiable by molecular diagnostics (~10%). Discrepancies are enriched in certain histopathological entities, such as histological high grade gliomas with a molecularly low grade profile. Gene panel sequencing reveals predisposing germline events in ~10% of patients. Genome wide (epi-)genetic analyses add a valuable layer of information to routine neuropathological diagnostics. Our study provides insight into CNS tumors with divergent histopathological and molecular classification, opening new avenues for research discoveries and facilitating optimization of clinical management for affected patients in the future.

  4. Simulated rRNA/DNA Ratios Show Potential To Misclassify Active Populations as Dormant

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Steven, Blaire; Hesse, Cedar; Soghigian, John

    The use of rRNA/DNA ratios derived from surveys of rRNA sequences in RNA and DNA extracts is an appealing but poorly validated approach to infer the activity status of environmental microbes. To improve the interpretation of rRNA/DNA ratios, we performed simulations to investigate the effects of community structure, rRNA amplification, and sampling depth on the accuracy of rRNA/DNA ratios in classifying bacterial populations as “active” or “dormant.” Community structure was an insignificant factor. In contrast, the extent of rRNA amplification that occurs as cells transition from dormant to growing had a significant effect (P < 0.0001) on classification accuracy, with misclassification errors ranging from 16 to 28%, depending on the rRNA amplification model. The error rate increased to 47% when communities included a mixture of rRNA amplification models, but most of the inflated error was false negatives (i.e., active populations misclassified as dormant). Sampling depth also affected error rates (P < 0.001). Inadequate sampling depth produced various artifacts that are characteristic of rRNA/DNA ratios generated from real communities. These data show important constraints on the use of rRNA/DNA ratios to infer activity status. Whereas classification of populations as active based on rRNA/DNA ratios appears generally valid, classification of populations as dormant is potentially far less accurate.
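
    As an illustration of the kind of ratio-based call the abstract evaluates, the following is a minimal sketch, not the authors' simulation code. It assumes that each taxon's relative abundance in the rRNA survey is divided by its relative abundance in the DNA survey and that a ratio above 1.0 is read as "active"; the threshold, the function name, and the example counts are all assumptions made for this sketch.

```python
def classify_activity(rrna_counts, dna_counts, threshold=1.0):
    """Label each taxon 'active' or 'dormant' from its rRNA/DNA ratio.

    rrna_counts, dna_counts: dicts mapping taxon name -> read count.
    threshold: assumed cut-off; the abstract does not state one.
    """
    total_rrna = sum(rrna_counts.values())
    total_dna = sum(dna_counts.values())
    if total_rrna == 0 or total_dna == 0:
        raise ValueError("both surveys need at least one read")

    labels = {}
    for taxon, dna_reads in dna_counts.items():
        if dna_reads == 0:
            # No DNA reads: the ratio is undefined for this taxon.
            labels[taxon] = "undetermined"
            continue
        rrna_rel = rrna_counts.get(taxon, 0) / total_rrna
        dna_rel = dna_reads / total_dna
        ratio = rrna_rel / dna_rel
        labels[taxon] = "active" if ratio > threshold else "dormant"
    return labels


# Hypothetical counts for three taxa.
rrna = {"taxonA": 60, "taxonB": 25, "taxonC": 15}
dna = {"taxonA": 20, "taxonB": 30, "taxonC": 50}
labels = classify_activity(rrna, dna)
print(labels)
```

    In this sketch a taxon that happens to receive no rRNA reads at shallow sampling depth gets a ratio of zero and is labelled dormant, which is one simple way a truly active population could end up as a false negative.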

  5. Simulated rRNA/DNA Ratios Show Potential To Misclassify Active Populations as Dormant

    DOE PAGES

    Steven, Blaire; Hesse, Cedar; Soghigian, John; ...

    2017-03-31

    The use of rRNA/DNA ratios derived from surveys of rRNA sequences in RNA and DNA extracts is an appealing but poorly validated approach to infer the activity status of environmental microbes. To improve the interpretation of rRNA/DNA ratios, we performed simulations to investigate the effects of community structure, rRNA amplification, and sampling depth on the accuracy of rRNA/DNA ratios in classifying bacterial populations as “active” or “dormant.” Community structure was an insignificant factor. In contrast, the extent of rRNA amplification that occurs as cells transition from dormant to growing had a significant effect (P < 0.0001) on classification accuracy, with misclassification errors ranging from 16 to 28%, depending on the rRNA amplification model. The error rate increased to 47% when communities included a mixture of rRNA amplification models, but most of the inflated error was false negatives (i.e., active populations misclassified as dormant). Sampling depth also affected error rates (P < 0.001). Inadequate sampling depth produced various artifacts that are characteristic of rRNA/DNA ratios generated from real communities. These data show important constraints on the use of rRNA/DNA ratios to infer activity status. Whereas classification of populations as active based on rRNA/DNA ratios appears generally valid, classification of populations as dormant is potentially far less accurate.

  6. Haplogroup relationships between domestic and wild sheep resolved using a mitogenome panel.

    PubMed

    Meadows, J R S; Hiendleder, S; Kijas, J W

    2011-04-01

    Five haplogroups have been identified in domestic sheep through global surveys of mitochondrial (mt) sequence variation; however, these group classifications are often based on small fragments of the complete mtDNA sequence: the partial control region or the cytochrome B gene. This study presents the complete mitogenome from representatives of each haplogroup identified in domestic sheep, plus a sample of their wild relatives. Comparison of the sequences successfully resolved the relationships between each haplogroup and provided insight into the relationship with wild sheep. The five haplogroups were characterised as branching independently, a radiation that shared a common ancestor 920,000 ± 190,000 years ago based on protein coding sequence. The utility of various mtDNA components to inform the true relationship between sheep was also examined with Bayesian, maximum likelihood and partitioned Bremer support analyses. The control region was found to be the mtDNA component that contributed the highest amount of support to the tree generated using the complete data set. This study provides the nucleus of a mtDNA mitogenome panel, which can be used to assess additional mitogenomes and serve as a reference set to evaluate small fragments of the mtDNA.

  7. Haplogroup relationships between domestic and wild sheep resolved using a mitogenome panel

    PubMed Central

    Meadows, J R S; Hiendleder, S; Kijas, J W

    2011-01-01

    Five haplogroups have been identified in domestic sheep through global surveys of mitochondrial (mt) sequence variation; however, these group classifications are often based on small fragments of the complete mtDNA sequence: the partial control region or the cytochrome B gene. This study presents the complete mitogenome from representatives of each haplogroup identified in domestic sheep, plus a sample of their wild relatives. Comparison of the sequences successfully resolved the relationships between each haplogroup and provided insight into the relationship with wild sheep. The five haplogroups were characterised as branching independently, a radiation that shared a common ancestor 920 000±190 000 years ago based on protein coding sequence. The utility of various mtDNA components to inform the true relationship between sheep was also examined with Bayesian, maximum likelihood and partitioned Bremer support analyses. The control region was found to be the mtDNA component that contributed the highest amount of support to the tree generated using the complete data set. This study provides the nucleus of a mtDNA mitogenome panel, which can be used to assess additional mitogenomes and serve as a reference set to evaluate small fragments of the mtDNA. PMID:20940734

  8. A phylogenetic analysis of Diurideae (Orchidaceae) based on plastid DNA sequence data.

    PubMed

    Kores, P J; Molvray, M; Weston, P H; Hopper, S D; Brown, A P; Cameron, K M; Chase, M W

    2001-10-01

    DNA sequence data from plastid matK and trnL-F regions were used in phylogenetic analyses of Diurideae, which indicate that Diurideae are not monophyletic as currently delimited. However, if Chloraeinae and Pterostylidinae are excluded from Diurideae, the remaining subtribes form a well-supported, monophyletic group that is sister to a "spiranthid" clade. Chloraea, Gavilea, and Megastylis pro parte (Chloraeinae) are all placed among the spiranthid orchids and form a grade with Pterostylis leading to a monophyletic Cranichideae. Codonorchis, previously included among Chloraeinae, is sister to Orchideae. Within the more narrowly delimited Diurideae two major lineages are apparent. One includes Diuridinae, Cryptostylidinae, Thelymitrinae, and an expanded Drakaeinae; the other includes Caladeniinae s.s., Prasophyllinae, and Acianthinae. The achlorophyllous subtribe Rhizanthellinae is a member of Diurideae, but its placement is otherwise uncertain. The sequence-based trees indicate that some morphological characters used in previous classifications, such as subterranean storage organs, anther position, growth habit, fungal symbionts, and pollination syndromes have more complex evolutionary histories than previously hypothesized. Treatments based upon these characters have produced conflicting classifications, and molecular data offer a tool for reevaluating these phylogenetic hypotheses.

  9. Genomic characterisation of the feline sarcoid-associated papillomavirus and proposed classification as Bos taurus papillomavirus type 14.

    PubMed

    Munday, John S; Thomson, Neroli; Dunowska, Magda; Knight, Cameron G; Laurie, Rebecca E; Hills, Simon

    2015-06-12

    Feline sarcoids are rare mesenchymal neoplasms of domestic and exotic cats. Previous studies have consistently detected short DNA sequences from a papillomavirus (PV), designated feline sarcoid-associated papillomavirus (FeSarPV), in these neoplasms. The FeSarPV sequence has never been detected in any non-sarcoid sample from cats but has been amplified from the skin of cattle, suggesting that feline sarcoids are caused by cross-species infection by a bovine papillomavirus (BPV). The aim of the present study was to determine the genome of the PV that contains the FeSarPV sequence. Using the circular nature of PV DNA, four specifically designed 'outward facing' primers were used to amplify two approximately 4,000 bp DNA segments from a feline sarcoid. The two PCR products were sequenced using next generation sequencing and the full genome of the PV, consisting of 7,966 bp, was assembled and analysed. Phylogenetic analysis revealed the PV was closely related to the species 4 delta BPVs-1, -2, and -13, but distantly related to any carnivoran PV genus. These results are consistent with feline sarcoids being caused by a BPV type and we propose a classification of BPV-14 for this novel PV. Initial analysis suggests that, like other delta BPVs, the BPV-14 E5 protein could cause mesenchymal proliferation by binding to the platelet derived growth factor beta receptor. Interestingly, BPV-14 has not been detected in any equine sarcoid, suggesting that BPV-14 has a host range that is limited to bovids and felids. Copyright © 2015 Elsevier B.V. All rights reserved.

  10. New progress in snake mitochondrial gene rearrangement.

    PubMed

    Chen, Nian; Zhao, Shujin

    2009-08-01

    To further understand the evolution of snake mitochondrial genomes, the complete mitochondrial DNA (mtDNA) sequences were determined for representative species from two snake families: the Many-banded krait, the Banded krait, the Chinese cobra, the King cobra, the Hundred-pace viper, the Short-tailed mamushi, and the Chain viper. Thirteen protein-coding genes, 22-23 tRNA genes, 2 rRNA genes, and 2 control regions were identified in these mtDNAs. Duplication of the control region and translocation of the tRNAPro gene were two notable features of the snake mtDNAs. These results from the gene rearrangement comparisons confirm the correctness of traditional classification schemes and validate the utility of comparing complete mtDNA sequences for snake phylogeny reconstruction.

  11. Evolution of helotialean fungi (Leotiomycetes, Pezizomycotina): a nuclear rDNA phylogeny.

    PubMed

    Wang, Zheng; Binder, Manfred; Schoch, Conrad L; Johnston, Peter R; Spatafora, Joseph W; Hibbett, David S

    2006-11-01

    The highly divergent characters of morphology, ecology, and biology in the Helotiales make it one of the most problematic groups in traditional classification and molecular phylogeny. Sequences of three rDNA regions, SSU, LSU, and 5.8S rDNA, were generated for 50 helotialean fungi, representing 11 out of 13 families in the current classification. Data sets with different compositions were assembled, and parsimony and Bayesian analyses were performed. The phylogenetic distribution of lifestyle and ecological factors was assessed. Plant endophytism is distributed across multiple clades in the Leotiomycetes. Our results suggest that (1) the inclusion of LSU rDNA and a wider taxon sampling greatly improves resolution of the Helotiales phylogeny, however, the usefulness of rDNA in resolving the deep relationships within the Leotiomycetes is limited; (2) a new class Geoglossomycetes, including Geoglossum, Trichoglossum, and Sarcoleotia, is the basal lineage of the Leotiomyceta; (3) the Leotiomycetes, including the Helotiales, Erysiphales, Cyttariales, Rhytismatales, and Myxotrichaceae, is monophyletic; and (4) nine clades can be recognized within the Helotiales.

  12. Kraken: ultrafast metagenomic sequence classification using exact alignments

    PubMed Central

    2014-01-01

    Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/. PMID:24580807
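
    The idea of classifying a read by exact k-mer matches can be sketched as follows. This is a toy illustration and not Kraken's implementation: the function names, the reference sequences, and the rule for k-mers shared between taxa (they are simply ignored here) are assumptions of this sketch, since the abstract does not describe how shared k-mers are handled.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(taxon)
    return index


def classify_read(read, index, k):
    """Return the taxon with the most exact k-mer hits, or None."""
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer)
        # Simplification: only k-mers unique to one taxon cast a vote.
        if taxa and len(taxa) == 1:
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical reference sequences, far shorter than real genomes.
references = {"taxon_1": "ACGTACGTGA", "taxon_2": "TTGCATGCAA"}
index = build_index(references, k=4)
print(classify_read("CGTACG", index, k=4))
print(classify_read("GCATGC", index, k=4))
print(classify_read("GGGGGG", index, k=4))
```

    Because each lookup is an exact dictionary hit rather than an alignment, the cost per read grows with the number of k-mers in the read and not with the size of the reference set, which is the intuition behind the speed figures quoted above.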

  13. [Phylogenetic analysis of closely related Leuconostoc citreum species based on partial housekeeping genes].

    PubMed

    Lv, Qiang; Chen, Ming; Xu, Haiyan; Song, Yuqin; Sun, Zhihong; Dan, Tong; Sun, Tiansong

    2013-07-04

    Using the 16S rRNA, dnaA, murC and pyrG gene sequences, we identified the phylogenetic relationship among closely related Leuconostoc citreum species. Seven Leu. citreum strains originally isolated from sourdough were characterized by PCR amplification of the dnaA, murC and pyrG gene sequences, which were determined to assess their suitability as phylogenetic markers. We then estimated genetic distances and constructed phylogenetic trees from the 16S rRNA gene and the three housekeeping genes, combined with the corresponding published sequences. The topologies of the three housekeeping gene trees were consistent with that of the 16S rRNA gene tree. The homology among closely related Leu. citreum species differed between the dnaA, murC, pyrG and 16S rRNA gene sequences, ranging from 75.5% to 97.2%, 50.2% to 99.7%, 65.0% to 99.8% and 98.5% to 100%, respectively. The phylogenetic relationships inferred from the three housekeeping genes were highly consistent with those from the 16S rRNA gene sequence, while the genetic distances of the housekeeping genes were much higher than those of the 16S rRNA gene. Consequently, the dnaA, murC and pyrG genes are suitable for the classification and identification of closely related Leu. citreum species.

  14. DNA barcode analysis: a comparison of phylogenetic and statistical classification methods.

    PubMed

    Austerlitz, Frederic; David, Olivier; Schaeffer, Brigitte; Bleakley, Kevin; Olteanu, Madalina; Leblois, Raphael; Veuille, Michel; Laredo, Catherine

    2009-11-10

    DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential transpecific polymorphism. In this context, we examine several assignation methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species. No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was molecular diversity of the data. Addition of genetically independent loci--nuclear genes--improved the predictive performance of most methods. The study implies that taxonomists can influence the quality of their analyses either by choosing a method best-adapted to the configuration of their sample, or, given a certain method, increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
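
    The "one nearest neighbour" rule singled out in the abstract can be sketched in a few lines. This is an illustrative sketch only: it assumes the barcode sequences are already aligned to equal length and uses a plain mismatch count (Hamming distance) as the distance, which the abstract does not specify; the sequences and species labels are hypothetical.

```python
def hamming(a, b):
    """Count mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def one_nearest_neighbour(query, reference):
    """Assign the query to the species of its closest reference sequence.

    reference: list of (sequence, species) pairs.
    """
    _, species = min(reference, key=lambda item: hamming(query, item[0]))
    return species


# Hypothetical aligned barcode fragments.
reference = [
    ("ACGTACGT", "species_A"),
    ("ACGTTCGT", "species_A"),
    ("TCGAACGA", "species_B"),
]
print(one_nearest_neighbour("ACGTACGA", reference))
print(one_nearest_neighbour("TCGAACGT", reference))
```

    The rule has no parameters to tune, which may be part of why the abstract reports it as the most reliable method across varying data set configurations.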

  15. Determining the Location of DNA Modification and Mutation Caused by UVB Light in Skin Cancer

    DTIC Science & Technology

    2013-09-01

    we obtain cleavage patterns consistent with the administered UV dosage and that sequencing libraries generated for both yeast and human cells show...understanding the mutations they cause. 15. SUBJECT TERMS UV DNA modification, HeLa cells, Skin Cancer 16. SECURITY CLASSIFICATION OF: 17...of mutations that are caused by UV light in cells and correlate them to modification frequencies. Understanding the initial chemical changes

  16. A survey of transposable element classification systems--a call for a fundamental update to meet the challenge of their diversity and complexity.

    PubMed

    Piégu, Benoît; Bire, Solenne; Arensburger, Peter; Bigot, Yves

    2015-05-01

    The increase of publicly available sequencing data has allowed for rapid progress in our understanding of genome composition. As new information becomes available we should constantly be updating and reanalyzing existing and newly acquired data. In this report we focus on transposable elements (TEs) which make up a significant portion of nearly all sequenced genomes. Our ability to accurately identify and classify these sequences is critical to understanding their impact on host genomes. At the same time, as we demonstrate in this report, problems with existing classification schemes have led to significant misunderstandings of the evolution of both TE sequences and their host genomes. In a pioneering publication Finnegan (1989) proposed classifying all TE sequences into two classes based on transposition mechanisms and structural features: the retrotransposons (class I) and the DNA transposons (class II). We have retraced how ideas regarding TE classification and annotation in both prokaryotic and eukaryotic scientific communities have changed over time. This has led us to observe that: (1) a number of TEs have convergent structural features and/or transposition mechanisms that have led to misleading conclusions regarding their classification, (2) the evolution of TEs is similar to that of viruses by having several unrelated origins, (3) there might be at least 8 classes and 12 orders of TEs including 10 novel orders. In an effort to address these classification issues we propose: (1) the outline of a universal TE classification, (2) a set of methods and classification rules that could be used by all scientific communities involved in the study of TEs, and (3) a 5-year schedule for the establishment of an International Committee for Taxonomy of Transposable Elements (ICTTE). Copyright © 2015 Elsevier Inc. All rights reserved.

  17. PlantTribes: a gene and gene family resource for comparative genomics in plants

    PubMed Central

    Wall, P. Kerr; Leebens-Mack, Jim; Müller, Kai F.; Field, Dawn; Altman, Naomi S.; dePamphilis, Claude W.

    2008-01-01

    The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575–1584)] to classify all of these species’ protein-coding genes into putative gene families, called tribes, using three clustering stringencies (low, medium and high). For all tribes, we have generated protein and DNA alignments and maximum-likelihood phylogenetic trees. A parallel database of microarray experimental results is linked to the genes, which lets researchers identify groups of related genes and their expression patterns. Unified nomenclatures were developed, and tribes can be related to traditional gene families and conserved domain identifiers. SuperTribes, constructed through a second iteration of MCL clustering, connect distant, but potentially related gene clusters. The global classification of nearly 200 000 plant proteins was used as a scaffold for sorting ∼4 million additional cDNA sequences from over 200 plant species. All data and analyses are accessible through a flexible interface allowing users to explore the classification, to place query sequences within the classification, and to download results for further study. PMID:18073194

  18. Cloning, in Vitro expression, and novel phylogenetic classification of a channel catfish estrogen receptor

    USGS Publications Warehouse

    Xia, Z.; Patino, R.; Gale, W.L.; Maule, A.G.; Densmore, L.D.

    1999-01-01

    We obtained two channel catfish estrogen receptor (ccER) cDNA from liver of female fish using RT–PCR. The two fragments were identical in sequence except that the smaller one had an out-of-frame deletion in the E domain, suggesting the existence of ccER splice variants. The larger fragment was used to screen a cDNA library from liver of a prepubescent female. A cDNA was obtained that encoded a 581-amino-acid ER with a deduced molecular weight of 63.8 kDa. Extracts of COS-7 cells transfected with ccER cDNA bound estrogen with high affinity (Kd = 4.7 nM) and specificity. Maximum parsimony and Neighbor Joining analyses were used to generate a phylogenetic classification of ccER on the basis of 18 full-length ER sequences. The tree suggested the existence of two major ER branches. One branch contained two clearly divergent clades which included all piscine ER (except Japanese eel ER) and all tetrapod ERα, respectively. The second major branch contained the eel ER and the mammalian ERβ. The high degree of divergence between the eel ER and mammalian ERβ suggested that they also represent distinct piscine and tetrapod ER. These data suggest that ERα and ERβ are present throughout vertebrates and that these two major ER types evolved by duplication of an ancestral ER gene. Sequence alignments with other members of the nuclear hormone receptor superfamily indicated the presence of 8 amino acids in the E domain that align exclusively among ER. Four of these amino acids have not received prior research attention and their function is unknown. The novel finding of putative ER splice variants in a nonmammalian vertebrate and the novel phylogenetic classification of ER offer new perspectives in understanding the diversification and function of ER.

  19. MARTA: a suite of Java-based tools for assigning taxonomic status to DNA sequences.

    PubMed

    Horton, Matthew; Bodenhausen, Natacha; Bergelson, Joy

    2010-02-15

    We have created a suite of Java-based software to better provide taxonomic assignments to DNA sequences. We anticipate that the program will be useful for protistologists, virologists, mycologists and other microbial ecologists. The program relies on NCBI utilities including the BLAST software and Taxonomy database and is easily manipulated at the command-line to specify a BLAST candidate's query-coverage or percent identity requirements; other options include the ability to set minimal consensus requirements (%) for each of the eight major taxonomic ranks (Domain, Kingdom, Phylum, ...) and whether to consider lower scoring candidates when the top-hit lacks taxonomic classification.
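
    The options described above (identity and coverage filters, plus a per-rank consensus requirement) suggest a workflow along the following lines. This is a hypothetical sketch written in Python for illustration, not MARTA's Java code or its command-line interface: the data layout, function name, and parameter names are assumptions, and only the three ranks named in the abstract are used because the remaining ranks are elided there.

```python
from collections import Counter

# Only the ranks spelled out in the abstract; the rest are elided there.
RANKS = ["domain", "kingdom", "phylum"]


def consensus_lineage(hits, min_identity, min_coverage, min_consensus):
    """Assign taxonomy rank by rank while the surviving hits agree enough.

    hits: list of dicts with 'identity', 'coverage' and a 'lineage' dict.
    min_consensus: dict mapping rank -> required fraction of agreeing hits.
    """
    kept = [h for h in hits
            if h["identity"] >= min_identity and h["coverage"] >= min_coverage]
    assigned = {}
    if not kept:
        return assigned
    for rank in RANKS:
        names = [h["lineage"][rank] for h in kept if rank in h["lineage"]]
        if not names:
            break
        name, count = Counter(names).most_common(1)[0]
        # Stop at the first rank where agreement falls below the requirement.
        if count / len(kept) < min_consensus[rank]:
            break
        assigned[rank] = name
    return assigned


# Hypothetical hits with placeholder taxon names.
hits = [
    {"identity": 99.0, "coverage": 100.0,
     "lineage": {"domain": "D1", "kingdom": "K1", "phylum": "P1"}},
    {"identity": 98.0, "coverage": 95.0,
     "lineage": {"domain": "D1", "kingdom": "K1", "phylum": "P1"}},
    {"identity": 97.0, "coverage": 90.0,
     "lineage": {"domain": "D1", "kingdom": "K1", "phylum": "P2"}},
    {"identity": 80.0, "coverage": 50.0,
     "lineage": {"domain": "D2", "kingdom": "K2", "phylum": "P3"}},
]
strict = {"domain": 0.9, "kingdom": 0.9, "phylum": 0.9}
result = consensus_lineage(hits, 95.0, 85.0, strict)
print(result)
```

    With the strict thresholds the assignment stops at kingdom, because only two of the three retained hits agree on a phylum; lowering the phylum requirement to 0.6 would extend the assignment one rank further.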

  20. Inferring Phylogenetic Relationships of Indian Citron (Citrus medica L.) based on rbcL and matK Sequences of Chloroplast DNA.

    PubMed

    Uchoi, Ajit; Malik, Surendra Kumar; Choudhary, Ravish; Kumar, Susheel; Rohini, M R; Pal, Digvender; Ercisli, Sezai; Chaudhury, Rekha

    2016-06-01

    Phylogenetic relationships of Indian Citron (Citrus medica L.) with other important Citrus species have been inferred through sequence analyses of the rbcL and matK gene regions of chloroplast DNA. The study was based on 23 accessions of Citrus genotypes representing 15 taxa of Indian Citrus, collected from wild, semi-wild, and domesticated stocks. The phylogeny was inferred using the maximum parsimony (MP) and neighbor-joining (NJ) methods. Both MP and NJ trees separated all the 23 accessions of Citrus into five distinct clusters. The chloroplast DNA (cpDNA) analysis based on rbcL and matK sequence data carried out in Indian taxa of Citrus was useful in differentiating all the true species and species/varieties of probable hybrid origin in distinct clusters or groups. Sequence analysis based on the rbcL and matK genes provided unambiguous identification and disposition of true species like C. maxima, C. medica, C. reticulata, and related hybrids/cultivars. The separation of C. maxima, C. medica, and C. reticulata in distinct clusters or sub-clusters supports their distinctiveness as the basic species of edible Citrus. However, the cpDNA sequence analysis of the rbcL and matK genes could not find any clear-cut differentiation between subgenera Citrus and Papeda as proposed in Swingle's system of classification.

  1. Confirmation of a novel siadenovirus species detected in raptors: partial sequence and phylogenetic analysis.

    PubMed

    Kovács, Endre R; Benko, Mária

    2009-03-01

    Partial genome characterisation of a novel adenovirus, found recently in organ samples of multiple species of dead birds of prey, was carried out by sequence analysis of PCR-amplified DNA fragments. The virus, named raptor adenovirus 1 (RAdV-1), was originally detected by a nested PCR method with consensus primers targeting the adenoviral DNA polymerase gene. Phylogenetic analysis with the deduced amino acid sequence of the small PCR product implied that a new siadenovirus type was present in the samples. Since virus isolation attempts remained unsuccessful, further characterisation of this putative novel siadenovirus was carried out with the use of PCR on the infected organ samples. The DNA sequence of the central genome part of RAdV-1, encompassing nine full (pTP, 52K, pIIIa, III, pVII, pX, pVI, hexon, protease) and two partial (DNA polymerase and DBP) genes and exceeding 12 kb in size, was determined. Phylogenetic tree reconstructions, based on several genes, unambiguously confirmed the preliminary classification of RAdV-1 as a new species within the genus Siadenovirus. Further study of RAdV-1 is of interest since it represents a rare adenovirus genus of yet undetermined host origin.

  2. Pantoea hericii sp. nov., Isolated from the Fruiting Bodies of Hericium erinaceus.

    PubMed

    Rong, Chengbo; Ma, Yuanwei; Wang, Shouxian; Liu, Yu; Chen, Sanfeng; Huang, Bin; Wang, Jing; Xu, Feng

    2016-06-01

    Three Gram-negative, facultatively anaerobic bacterial isolates were obtained from the fruiting bodies of the edible mushroom Hericium erinaceus showing symptoms of soft rot disease in Beijing, China. Sequences of partial 16S rRNA gene placed these isolates in the genus Pantoea. Multilocus sequence analysis based on the partial sequences of atpD, gyrB, infB and rpoB revealed P. eucalypti and P. anthophila as their closest phylogenetic relatives and indicated that these isolates constituted a possible novel species. DNA-DNA hybridization studies confirmed the classification of these isolates as a novel species and phenotypic tests allowed for differentiation from the closest phylogenetic neighbours. The name Pantoea hericii sp. nov. [Type strain LMG 28847(T) = CGMCC 1.15224(T) = JZB 2120024(T)] is proposed.

  3. Effective Feature Selection for Classification of Promoter Sequences.

    PubMed

    K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish

    2016-01-01

    Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.

  4. An accurate bacterial DNA quantification assay for HTS library preparation of human biological samples.

    PubMed

    Seashols-Williams, Sarah; Green, Raquel; Wohlfahrt, Denise; Brand, Angela; Tan-Torres, Antonio Limjuco; Nogales, Francy; Brooks, J Paul; Singh, Baneshwar

    2018-05-17

    Sequencing and classification of microbial taxa within forensically relevant biological fluids has the potential for applications in the forensic science and biomedical fields. The quantity of bacterial DNA from human samples is currently estimated based on quantity of total DNA isolated. This method can miscalculate bacterial DNA quantity due to the mixed nature of the sample, and consequently library preparation is often unreliable. We developed an assay that can accurately and specifically quantify bacterial DNA within a mixed sample for reliable 16S ribosomal DNA (16S rDNA) library preparation and high throughput sequencing (HTS). A qPCR method was optimized using universal 16S rDNA primers, and a commercially available bacterial community DNA standard was used to develop a precise standard curve. Following qPCR optimization, 16S rDNA libraries from saliva, vaginal and menstrual secretions, urine, and fecal matter were amplified and evaluated at various DNA concentrations; successful HTS data were generated with as low as 20 pg of bacterial DNA. Changes in bacterial DNA quantity did not impact observed relative abundances of major bacterial taxa, but relative abundance changes of minor taxa were observed. Accurate quantification of microbial DNA resulted in consistent, successful library preparations for HTS analysis. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. DNA methylation-based classification and grading system for meningioma: a multicentre, retrospective analysis.

    PubMed

    Sahm, Felix; Schrimpf, Daniel; Stichel, Damian; Jones, David T W; Hielscher, Thomas; Schefzyk, Sebastian; Okonechnikov, Konstantin; Koelsche, Christian; Reuss, David E; Capper, David; Sturm, Dominik; Wirsching, Hans-Georg; Berghoff, Anna Sophie; Baumgarten, Peter; Kratz, Annekathrin; Huang, Kristin; Wefers, Annika K; Hovestadt, Volker; Sill, Martin; Ellis, Hayley P; Kurian, Kathreena M; Okuducu, Ali Fuat; Jungk, Christine; Drueschler, Katharina; Schick, Matthias; Bewerunge-Hudler, Melanie; Mawrin, Christian; Seiz-Rosenhagen, Marcel; Ketter, Ralf; Simon, Matthias; Westphal, Manfred; Lamszus, Katrin; Becker, Albert; Koch, Arend; Schittenhelm, Jens; Rushing, Elisabeth J; Collins, V Peter; Brehmer, Stefanie; Chavez, Lukas; Platten, Michael; Hänggi, Daniel; Unterberg, Andreas; Paulus, Werner; Wick, Wolfgang; Pfister, Stefan M; Mittelbronn, Michel; Preusser, Matthias; Herold-Mende, Christel; Weller, Michael; von Deimling, Andreas

    2017-05-01

    The WHO classification of brain tumours describes 15 subtypes of meningioma. Nine of these subtypes are allotted to WHO grade I, and three each to grade II and grade III. Grading is based solely on histology, with an absence of molecular markers. Although the existing classification and grading approach is of prognostic value, it harbours shortcomings such as ill-defined parameters for subtypes and grading criteria prone to arbitrary judgment. In this study, we aimed for a comprehensive characterisation of the entire molecular genetic landscape of meningioma to identify biologically and clinically relevant subgroups. In this multicentre, retrospective analysis, we investigated genome-wide DNA methylation patterns of meningiomas from ten European academic neuro-oncology centres to identify distinct methylation classes of meningiomas. The methylation classes were further characterised by DNA copy number analysis, mutational profiling, and RNA sequencing. Methylation classes were analysed for progression-free survival outcomes by the Kaplan-Meier method. The DNA methylation-based and WHO classification schema were compared using the Brier prediction score, analysed in an independent cohort with WHO grading, progression-free survival, and disease-specific survival data available, collected at the Medical University Vienna (Vienna, Austria), assessing methylation patterns with an alternative methylation chip. We retrospectively collected 497 meningiomas along with 309 samples of other extra-axial skull tumours that might histologically mimic meningioma variants. Unsupervised clustering of DNA methylation data clearly segregated all meningiomas from other skull tumours. We generated genome-wide DNA methylation profiles from all 497 meningioma samples. DNA methylation profiling distinguished six distinct clinically relevant methylation classes associated with typical mutational, cytogenetic, and gene expression patterns. 
    Compared with WHO grading, classification by individual and combined methylation classes more accurately identifies patients at high risk of disease progression in tumours with WHO grade I histology, and patients at lower risk of recurrence among WHO grade II tumours (p=0·0096 from the Brier prediction test). We validated this finding in our independent cohort of 140 patients with meningioma. DNA methylation-based meningioma classification captures clinically more homogenous groups and has a higher power for predicting tumour recurrence and prognosis than the WHO classification. The approach presented here is potentially very useful for stratifying meningioma patients to observation-only or adjuvant treatment groups. We consider methylation-based tumour classification highly relevant for the future diagnosis and treatment of meningioma. Funding: German Cancer Aid, Else Kröner-Fresenius Foundation, and DKFZ/Heidelberg Institute of Personalized Oncology/Precision Oncology Program. Copyright © 2017 Elsevier Ltd. All rights reserved.
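    The Brier prediction score used above to compare the two classification schemes can be illustrated in its simplest binary form: the mean squared difference between predicted probabilities and observed outcomes, where lower is better. This is a sketch only. The study applies the score to survival outcomes, which presumably requires a time-dependent variant that handles censoring and is not reproduced here; the function name and the example probabilities are hypothetical.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(predicted_probs) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not outcomes:
        raise ValueError("at least one observation is required")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted_probs, outcomes)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical progression outcomes (1 = progressed) and two sets of
# predicted progression probabilities; illustrative values, not study data.
observed = [1, 0, 0, 1]
scheme_a = [0.9, 0.2, 0.1, 0.7]   # sharper, well-calibrated predictions
scheme_b = [0.5, 0.5, 0.5, 0.5]   # uninformative predictions

score_a = brier_score(scheme_a, observed)
score_b = brier_score(scheme_b, observed)
```

    A scheme that always predicts 0.5 scores 0.25, so any useful classifier should fall below that value.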

  6. Ribosomal DNA sequence heterogeneity reflects intraspecies phylogenies and predicts genome structure in two contrasting yeast species.

    PubMed

    West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N

    2014-07-01

    The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. 
    We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of closely related organisms, and discuss how it could be extended to future studies of multilocus rDNA systems. [concerted evolution; genome hybridisation; phylogenetic analysis; ribosomal DNA; whole genome sequencing; yeast]. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  7. Molecular phylogeny and character evolution in terete-stemmed Andean opuntias (Cactaceae-Opuntioideae).

    PubMed

    Ritz, C M; Reiker, J; Charles, G; Hoxey, P; Hunt, D; Lowry, M; Stuppy, W; Taylor, N

    2012-11-01

    The cacti of tribe Tephrocacteae (Cactaceae-Opuntioideae) are adapted to diverse climatic conditions over a wide area of the southern Andes and adjacent lowlands. They exhibit a range of life forms from geophytes and cushion-plants to dwarf shrubs, shrubs or small trees. To confirm or challenge previous morphology-based classifications and molecular phylogenies, we sampled DNA sequences from the chloroplast trnK/matK region and the nuclear low copy gene phyC and compared the resulting phylogenies with previous data gathered from nuclear ribosomal DNA sequences. The chloroplast and nuclear low copy gene phylogenies presented here were mutually congruent and broadly coincident with the classification based on gross morphology and seed micro-morphology and anatomy. Reconstruction of hypothetical ancestral character states suggested that geophytes and cushion-forming species probably evolved several times from dwarf shrubby precursors. We also traced an increase of embryo size at the expense of the nucellus-derived storage tissue during the evolution of the Tephrocacteae, which is thought to be an evolutionary advantage because nutrients are then more rapidly accessible for the germinating embryo. In contrast to these highly concordant phylogenies, nuclear ribosomal DNA data sampled by a previous study yielded conflicting phylogenetic signals. Secondary structure predictions of ribosomal transcribed spacers suggested that this phylogeny is strongly influenced by the inclusion of paralogous sequences that probably arose by genome duplication during the evolution of this plant group. Copyright © 2012 Elsevier Inc. All rights reserved.

  8. Update on diabetes classification.

    PubMed

    Thomas, Celeste C; Philipson, Louis H

    2015-01-01

    This article highlights the difficulties in creating a definitive classification of diabetes mellitus in the absence of a complete understanding of the pathogenesis of the major forms. This brief review shows the evolving nature of the classification of diabetes mellitus. No classification scheme is ideal, and all have some overlap and inconsistencies. The only form of diabetes that can be accurately diagnosed by DNA sequencing, monogenic diabetes, remains undiagnosed in more than 90% of the individuals who have diabetes caused by one of the known gene mutations. The point of classification, or taxonomy, of disease should be to give insight into both pathogenesis and treatment. It remains a source of frustration that all classification schemes for diabetes mellitus continue to fall short of this goal. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. Phylogeographic Analysis of Mitochondrial DNA in Northern Asian Populations

    PubMed Central

    Derenko, Miroslava; Malyarchuk, Boris; Grzybowski, Tomasz; Denisova, Galina; Dambueva, Irina; Perkova, Maria; Dorzhu, Choduraa; Luzina, Faina; Lee, Hong Kyu; Vanecek, Tomas; Villems, Richard; Zakharov, Ilia

    2007-01-01

    To elucidate the human colonization process of northern Asia and human dispersals to the Americas, a diverse subset of 71 mitochondrial DNA (mtDNA) lineages was chosen for complete genome sequencing from the collection of 1,432 control-region sequences sampled from 18 autochthonous populations of northern, central, eastern, and southwestern Asia. On the basis of complete mtDNA sequencing, we have revised the classification of haplogroups A, D2, G1, M7, and I; identified six new subhaplogroups (I4, N1e, G1c, M7d, M7e, and J1b2a); and fully characterized haplogroups N1a and G1b, which were previously described only by the first hypervariable segment (HVS1) sequencing and coding-region restriction-fragment–length polymorphism analysis. Our findings indicate that the southern Siberian mtDNA pool harbors several lineages associated with the Late Upper Paleolithic and/or early Neolithic dispersals from both eastern Asia and southwestern Asia/southern Caucasus. Moreover, the phylogeography of the D2 lineages suggests that southern Siberia is likely to be a geographical source for the last postglacial maximum spread of this subhaplogroup to northern Siberia and that the expansion of the D2b branch occurred in Beringia ∼7,000 years ago. In general, a detailed analysis of mtDNA gene pools of northern Asians provides the additional evidence to rule out the existence of a northern Asian route for the initial human colonization of Asia. PMID:17924343

  10. Phylogeographic analysis of mitochondrial DNA in northern Asian populations.

    PubMed

    Derenko, Miroslava; Malyarchuk, Boris; Grzybowski, Tomasz; Denisova, Galina; Dambueva, Irina; Perkova, Maria; Dorzhu, Choduraa; Luzina, Faina; Lee, Hong Kyu; Vanecek, Tomas; Villems, Richard; Zakharov, Ilia

    2007-11-01

    To elucidate the human colonization process of northern Asia and human dispersals to the Americas, a diverse subset of 71 mitochondrial DNA (mtDNA) lineages was chosen for complete genome sequencing from the collection of 1,432 control-region sequences sampled from 18 autochthonous populations of northern, central, eastern, and southwestern Asia. On the basis of complete mtDNA sequencing, we have revised the classification of haplogroups A, D2, G1, M7, and I; identified six new subhaplogroups (I4, N1e, G1c, M7d, M7e, and J1b2a); and fully characterized haplogroups N1a and G1b, which were previously described only by the first hypervariable segment (HVS1) sequencing and coding-region restriction-fragment-length polymorphism analysis. Our findings indicate that the southern Siberian mtDNA pool harbors several lineages associated with the Late Upper Paleolithic and/or early Neolithic dispersals from both eastern Asia and southwestern Asia/southern Caucasus. Moreover, the phylogeography of the D2 lineages suggests that southern Siberia is likely to be a geographical source for the last postglacial maximum spread of this subhaplogroup to northern Siberia and that the expansion of the D2b branch occurred in Beringia ~7,000 years ago. In general, a detailed analysis of mtDNA gene pools of northern Asians provides the additional evidence to rule out the existence of a northern Asian route for the initial human colonization of Asia.

  11. Application of advanced cytometric and molecular technologies to minimal residual disease monitoring

    NASA Astrophysics Data System (ADS)

    Leary, James F.; He, Feng; Reece, Lisa M.

    2000-04-01

    Minimal residual disease monitoring presents a number of theoretical and practical challenges. Recently it has been possible to meet some of these challenges by combining a number of new advanced biotechnologies. To monitor the number of residual tumor cells requires complex cocktails of molecular probes that collectively provide sensitivities of detection on the order of one residual tumor cell per million total cells. Ultra-high-speed, multiparameter flow cytometry is capable of analyzing cells at rates in excess of 100,000 cells/sec. Residual tumor selection marker cocktails can be optimized by use of receiver operating characteristic analysis. New data minimizing techniques, when combined with multivariate statistical or neural network classifications of tumor cells, can more accurately predict residual tumor cell frequencies. The combination of these techniques can, under at least some circumstances, detect frequencies of tumor cells as low as one cell in a million with an accuracy of over 98 percent correct classification. Detection of mutations in tumor suppressor genes requires isolation of these rare tumor cells and single-cell DNA sequencing. Rare residual tumor cells can be isolated at single cell level by high-resolution single-cell cell sorting. Molecular characterization of tumor suppressor gene mutations can be accomplished using a combination of single-cell polymerase chain reaction amplification of specific gene sequences followed by TA cloning techniques and DNA sequencing. Mutations as small as a single base pair in a tumor suppressor gene of a single sorted tumor cell have been detected using these methods. Using new amplification procedures and DNA microarrays it should be possible to extend the capabilities shown in this paper to screening of multiple DNA mutations in tumor suppressor and other genes on small numbers of sorted metastatic tumor cells.

  12. Structural Analysis of Biodiversity

    PubMed Central

    Sirovich, Lawrence; Stoeckle, Mark Y.; Zhang, Yu

    2010-01-01

    Large, recently-available genomic databases cover a wide range of life forms, suggesting opportunity for insights into genetic structure of biodiversity. In this study we refine our recently-described technique using indicator vectors to analyze and visualize nucleotide sequences. The indicator vector approach generates correlation matrices, dubbed Klee diagrams, which represent a novel way of assembling and viewing large genomic datasets. To explore its potential utility, here we apply the improved algorithm to a collection of almost 17000 DNA barcode sequences covering 12 widely-separated animal taxa, demonstrating that indicator vectors for classification gave correct assignment in all 11000 test cases. Indicator vector analysis revealed discontinuities corresponding to species- and higher-level taxonomic divisions, suggesting an efficient approach to classification of organisms from poorly-studied groups. As compared to standard distance metrics, indicator vectors preserve diagnostic character probabilities, enable automated classification of test sequences, and generate high-information density single-page displays. These results support application of indicator vectors for comparative analysis of large nucleotide data sets and raise the prospect of gaining insight into broad-scale patterns in the genetic structure of biodiversity. PMID:20195371

  13. Scaling up discovery of hidden diversity in fungi: impacts of barcoding approaches.

    PubMed

    Yahr, Rebecca; Schoch, Conrad L; Dentinger, Bryn T M

    2016-09-05

    The fungal kingdom is a hyperdiverse group of multicellular eukaryotes with profound impacts on human society and ecosystem function. The challenge of documenting and describing fungal diversity is exacerbated by their typically cryptic nature, their ability to produce seemingly unrelated morphologies from a single individual and their similarity in appearance to distantly related taxa. This multiplicity of hurdles resulted in the early adoption of DNA-based comparisons to study fungal diversity, including linking curated DNA sequence data to expertly identified voucher specimens. DNA-barcoding approaches in fungi were first applied in specimen-based studies for identification and discovery of taxonomic diversity, but are now widely deployed for community characterization based on sequencing of environmental samples. Collectively, fungal barcoding approaches have yielded important advances across biological scales and research applications, from taxonomic, ecological, industrial and health perspectives. A major outstanding issue is the growing problem of 'sequences without names' that are somewhat uncoupled from the traditional framework of fungal classification based on morphology and preserved specimens. This review summarizes some of the most significant impacts of fungal barcoding, its limitations, and progress towards the challenge of effective utilization of the exponentially growing volume of data gathered from high-throughput sequencing technologies. This article is part of the themed issue 'From DNA barcodes to biomes'. © 2016 The Authors.

  14. Species classifier choice is a key consideration when analysing low-complexity food microbiome data.

    PubMed

    Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D

    2018-03-20

    The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R² = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R² = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample.
    Again, the outputs from functional profiling analysis using SUPER-FOCUS were generally accordant between the platforms at different sequencing depths. Finally, and expectedly, metagenome assembly completeness was significantly lower on the MiSeq than either on the NextSeq (p = 0.03) or the Proton (p = 0.011), and it improved with increased sequencing depth. Our results demonstrate a remarkable similarity in the results generated by the three sequencing platforms at different sequencing depths, and, in fact, the choice of bioinformatics methodology had a more evident impact on results than the choice of sequencer did.

  15. DNA motifs associated with aberrant CpG island methylation.

    PubMed

    Feltus, F Alex; Lee, Eva K; Costello, Joseph F; Plass, Christoph; Vertino, Paula M

    2006-05-01

    Epigenetic silencing involving the aberrant methylation of promoter region CpG islands is widely recognized as a tumor suppressor silencing mechanism in cancer. However, the molecular pathways underlying aberrant DNA methylation remain elusive. Recently we showed that, on a genome-wide level, CpG island loci differ in their intrinsic susceptibility to aberrant methylation and that this susceptibility can be predicted based on underlying sequence context. These data suggest that there are sequence/structural features that contribute to the protection from or susceptibility to aberrant methylation. Here we use motif elicitation coupled with classification techniques to identify DNA sequence motifs that selectively define methylation-prone or methylation-resistant CpG islands. Motifs common to 28 methylation-prone or 47 methylation-resistant CpG island-containing genomic fragments were determined using the MEME and MAST algorithms. The five most discriminatory motifs derived from methylation-prone sequences were found to be associated with CpG islands in general and were nonrandomly distributed throughout the genome. In contrast, the eight most discriminatory motifs derived from the methylation-resistant CpG islands were randomly distributed throughout the genome. Interestingly, this latter group tended to associate with Alu and other repetitive sequences. Used together, the frequency of occurrence of these motifs successfully discriminated methylation-prone and methylation-resistant CpG island groups with an accuracy of 87% after 10-fold cross-validation. The motifs identified here are candidate methylation-targeting or methylation-protection DNA sequences.
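    The 10-fold cross-validation reported above can be sketched generically. The code below is an illustration under stated assumptions, not the authors' pipeline: the helper names are hypothetical, the classifier is passed in as a pair of functions, and the only figures taken from the abstract are the class sizes (28 methylation-prone, 47 methylation-resistant). A majority-class baseline is included to give the reported 87% some scale.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle item indices and deal them into k roughly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validated_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, pool the results."""
    correct = 0
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct += sum(predict(model, features[i]) == labels[i]
                       for i in test_idx)
    return correct / len(labels)

# Baseline classifier: always predict the most common training label.
def fit_majority(train_features, train_labels):
    return max(set(train_labels), key=train_labels.count)

def predict_majority(model, feature_vector):
    return model

# Class sizes from the abstract; the feature vectors are empty placeholders
# because the baseline ignores them.
labels = ["prone"] * 28 + ["resistant"] * 47
features = [[] for _ in labels]
baseline = cross_validated_accuracy(features, labels,
                                    fit_majority, predict_majority)
```

    With these class sizes the baseline always predicts the resistant class and scores 47/75, about 63%, so a motif-frequency classifier has to exceed that figure to be informative.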

  16. A phylogenomic approach to bacterial subspecies classification: proof of concept in Mycobacterium abscessus.

    PubMed

    Tan, Joon Liang; Khang, Tsung Fei; Ngeow, Yun Fong; Choo, Siew Woh

    2013-12-13

    Mycobacterium abscessus is a rapidly growing mycobacterium that is often associated with human infections. The taxonomy of this species has undergone several revisions and is still being debated. In this study, we sequenced the genomes of 12 M. abscessus strains and used phylogenomic analysis to perform subspecies classification. A data mining approach was used to rank and select informative genes based on the relative entropy metric for the construction of a phylogenetic tree. The resulting tree topology was similar to that generated using the concatenation of five classical housekeeping genes: rpoB, hsp65, secA, recA and sodA. Additional support for the reliability of the subspecies classification came from the analysis of erm41 and ITS gene sequences, single nucleotide polymorphisms (SNPs)-based classification and strain clustering demonstrated by a variable number tandem repeat (VNTR) assay and a multilocus sequence analysis (MLSA). We subsequently found that the concatenation of a minimal set of three median-ranked genes, DNA polymerase III subunit alpha (polC), 4-hydroxy-2-ketovalerate aldolase (Hoa) and cell division protein FtsZ (ftsZ), is sufficient to recover the same tree topology. PCR assays designed specifically for these genes showed that all three genes could be amplified in the reference strain of M. abscessus ATCC 19977T. This study provides proof of concept that a whole-genome sequence-based data mining approach can provide confirmatory evidence of the phylogenetic informativeness of existing markers, as well as lead to the discovery of a more economical and informative set of markers that produces similar subspecies classification in M. abscessus. The systematic procedure used in this study to choose the informative minimal set of gene markers can potentially be applied to species or subspecies classification of other bacteria.

  17. Classification of Ancient Mammal Individuals Using Dental Pulp MALDI-TOF MS Peptide Profiling

    PubMed Central

    Tran, Thi-Nguyen-Ny; Aboudharam, Gérard; Gardeisen, Armelle; Davoust, Bernard; Bocquet-Appel, Jean-Pierre; Flaudrops, Christophe; Belghazi, Maya; Raoult, Didier; Drancourt, Michel

    2011-01-01

    Background: The classification of ancient animal corpses at the species level remains a challenging task for forensic scientists and anthropologists. Severe damage and mixed, tiny pieces originating from several skeletons may render morphological classification virtually impossible. Standard approaches are based on sequencing mitochondrial and nuclear targets. Methodology/Principal Findings: We present a method that can accurately classify mammalian species using dental pulp and mass spectrometry peptide profiling. Our work was organized into three successive steps. First, after extracting proteins from the dental pulp collected from 37 modern individuals representing 13 mammalian species, trypsin-digested peptides were used for matrix-assisted laser desorption/ionization time-of-flight mass spectrometry analysis. The resulting peptide profiles accurately classified every individual at the species level in agreement with parallel cytochrome b gene sequencing gold standard. Second, using a 279–modern spectrum database, we blindly classified 33 of 37 teeth collected in 37 modern individuals (89.1%). Third, we classified 10 of 18 teeth (56%) collected in 15 ancient individuals representing five mammal species including human, from five burial sites dating back 8,500 years. Further comparison with an upgraded database comprising ancient specimen profiles yielded 100% classification in ancient teeth. Peptide sequencing yielded 4 and 16 different non-keratin proteins including collagen (alpha-1 type I and alpha-2 type I) in human ancient and modern dental pulp, respectively. Conclusions/Significance: Mass spectrometry peptide profiling of the dental pulp is a new approach that can be added to the arsenal of species classification tools for forensics and anthropology as a complementary method to DNA sequencing. The dental pulp is a new source for collagen and other proteins for the species classification of modern and ancient mammal individuals. PMID:21364886

  18. Profiling Nematode Communities in Unmanaged Flowerbed and Agricultural Field Soils in Japan by DNA Barcode Sequencing

    PubMed Central

    Morise, Hisashi; Miyazaki, Erika; Yoshimitsu, Shoko; Eki, Toshihiko

    2012-01-01

    Soil nematodes play crucial roles in the soil food web and are a suitable indicator for assessing soil environments and ecosystems. Previous nematode community analyses based on nematode morphology classification have been shown to be useful for assessing various soil environments. Here we have conducted DNA barcode analysis for soil nematode community analyses in Japanese soils. We isolated nematodes from two different environmental soils of an unmanaged flowerbed and an agricultural field using the improved flotation-sieving method. Small subunit (SSU) rDNA fragments were directly amplified from each of 68 (flowerbed samples) and 48 (field samples) isolated nematodes to determine the nucleotide sequence. Sixteen and thirteen operational taxonomic units (OTUs) were obtained by multiple sequence alignment from the flowerbed and agricultural field nematodes, respectively. All 29 SSU rDNA-derived OTUs (rOTUs) were further mapped onto a phylogenetic tree with 107 known nematode species. Interestingly, the two nematode communities examined were clearly distinct from each other in terms of trophic groups: animal predators and plant feeders were markedly abundant in the flowerbed soils, whereas bacterial feeders were dominant in the agricultural field soils. The data from the flowerbed nematodes suggest a possible food web among two different trophic nematode groups and plants (weeds) in the closed soil environment. Finally, DNA sequences derived from the mitochondrial cytochrome oxidase c subunit 1 (COI) gene were determined as a DNA barcode from 43 agricultural field soil nematodes. These nematodes were assigned to 13 rDNA-derived OTUs, but in the COI gene analysis were assigned to 23 COI gene-derived OTUs (cOTUs), indicating that COI gene-based barcoding may provide higher taxonomic resolution than conventional SSU rDNA-barcoding in soil nematode community analysis. PMID:23284767

  19. Cryptic diversity in European bats.

    PubMed Central

    Mayer, F.; von Helversen, O.

    2001-01-01

    Different species of bat can be morphologically very similar. In order to estimate the amount of cryptic diversity among European bats we screened the intra- and interspecific genetic variation in 26 European vespertilionid bat species. We sequenced the DNA of subunit 1 of the mitochondrial protein NADH dehydrogenase (ND1) from several individuals of a species, which were sampled in a variety of geographical regions. A phylogeny based on the mitochondrial (mt) DNA data is in good agreement with the current classification in the family. Highly divergent mitochondrial lineages were found in two taxa, which differed in at least 11% of their ND1 sequence. The two mtDNA lineages in Plecotus austriacus correlated with the two subspecies Plecotus austriacus austriacus and Plecotus austriacus kolombatovici. The two mtDNA lineages in Myotis mystacinus were partitioned among two morphotypes. The evidence for two new bat species within Europe is discussed. Convergent adaptive evolution might have contributed to the morphological similarity among distantly related species if they occupy similar ecological niches. Closely related species may differ in their ecology but not necessarily in their morphology. On the other hand, two morphologically clearly different species (Eptesicus serotinus and Eptesicus nilssonii) were found to be genetically very similar. Neither morphological nor mitochondrial DNA sequence analysis alone can be guaranteed to identify species. PMID:11522202

  20. Graphical classification of DNA sequences of HLA alleles by deep learning.

    PubMed

    Miyake, Jun; Kaneshita, Yuhei; Asatani, Satoshi; Tagawa, Seiichi; Niioka, Hirohiko; Hirano, Takashi

    2018-04-01

    Alleles of human leukocyte antigen (HLA)-A DNAs are classified and displayed graphically using a deep learning method, a stacked autoencoder. Nucleotide sequences 822 bp in length, collected from the Immuno Polymorphism Database, were compressed to a two-dimensional representation and plotted. The two-dimensional plots show that the alleles form clusters and can therefore be classified. The plot of HLA-A DNAs gives a clear overview for characterizing the various alleles.
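    Before a nucleotide sequence can be compressed by an autoencoder it has to be turned into a numeric vector. The abstract does not say which encoding was used, so the sketch below assumes one-hot encoding, a common choice in which each base becomes four values; the names BASES and one_hot_encode are hypothetical.

```python
BASES = "ACGT"

def one_hot_encode(sequence):
    """Encode a DNA string as a flat list with four values per base.

    Characters other than A, C, G or T (for example N) become four zeros.
    """
    encoded = []
    for base in sequence.upper():
        encoded.extend(1.0 if base == candidate else 0.0 for candidate in BASES)
    return encoded

# A short illustrative sequence (not an actual HLA-A allele).
example_vector = one_hot_encode("ACGT")

# Under this assumed encoding, an 822 bp sequence gives 822 * 4 inputs.
input_width = len(one_hot_encode("A" * 822))
```

    Under this assumption each 822 bp allele would enter the network as a 3288-value vector, which the stacked autoencoder then reduces to the two plotted coordinates.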

  1. Pathological Bases for a Robust Application of Cancer Molecular Classification

    PubMed Central

    Diaz-Cano, Salvador J.

    2015-01-01

    Any robust classification system depends on its purpose and must refer to accepted standards, its strength relying on predictive values and a careful consideration of known factors that can affect its reliability. In this context, a molecular classification of human cancer must refer to the current gold standard (histological classification) and try to improve it with key prognosticators for metastatic potential, staging and grading. Although organ-specific examples have been published based on proteomics, transcriptomics and genomics evaluations, the most popular approach uses gene expression analysis as a direct correlate of cellular differentiation, which represents the key feature of the histological classification. RNA is a labile molecule that varies significantly according to the preservation protocol; its transcription reflects the adaptation of the tumor cells to the microenvironment, it can be passed through mechanisms of intercellular transfer of genetic information (exosomes), and it is exposed to epigenetic modifications. More robust classifications should be based on stable molecules, at the genetic level represented by DNA to improve reliability, and its analysis must deal with the concept of intratumoral heterogeneity, which is at the origin of tumor progression and is the byproduct of the selection process during the clonal expansion and progression of neoplasms. The simultaneous analysis of multiple DNA targets and next generation sequencing offer the best practical approach for an analytical genomic classification of tumors. PMID:25898411

  2. Molecular Profiling Reveals Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma.

    PubMed

    Ceccarelli, Michele; Barthel, Floris P; Malta, Tathiane M; Sabedot, Thais S; Salama, Sofie R; Murray, Bradley A; Morozova, Olena; Newton, Yulia; Radenbaugh, Amie; Pagnotta, Stefano M; Anjum, Samreen; Wang, Jiguang; Manyam, Ganiraju; Zoppoli, Pietro; Ling, Shiyun; Rao, Arjun A; Grifford, Mia; Cherniack, Andrew D; Zhang, Hailei; Poisson, Laila; Carlotti, Carlos Gilberto; Tirapelli, Daniela Pretti da Cunha; Rao, Arvind; Mikkelsen, Tom; Lau, Ching C; Yung, W K Alfred; Rabadan, Raul; Huse, Jason; Brat, Daniel J; Lehman, Norman L; Barnholtz-Sloan, Jill S; Zheng, Siyuan; Hess, Kenneth; Rao, Ganesh; Meyerson, Matthew; Beroukhim, Rameen; Cooper, Lee; Akbani, Rehan; Wrensch, Margaret; Haussler, David; Aldape, Kenneth D; Laird, Peter W; Gutmann, David H; Noushmehr, Houtan; Iavarone, Antonio; Verhaak, Roel G W

    2016-01-28

    Therapy development for adult diffuse glioma is hindered by incomplete knowledge of somatic glioma driving alterations and suboptimal disease classification. We defined the complete set of genes associated with 1,122 diffuse grade II-III-IV gliomas from The Cancer Genome Atlas and used molecular profiles to improve disease classification, identify molecular correlations, and provide insights into the progression from low- to high-grade disease. Whole-genome sequencing data analysis determined that ATRX but not TERT promoter mutations are associated with increased telomere length. Recent advances in glioma classification based on IDH mutation and 1p/19q co-deletion status were recapitulated through analysis of DNA methylation profiles, which identified clinically relevant molecular subsets. A subtype of IDH mutant glioma was associated with DNA demethylation and poor outcome; a group of IDH-wild-type diffuse glioma showed molecular similarity to pilocytic astrocytoma and relatively favorable survival. Understanding of cohesive disease groups may aid improved clinical outcomes. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Classification of Plant Associated Bacteria Using RIF, a Computationally Derived DNA Marker

    PubMed Central

    Schneider, Kevin L.; Marrero, Glorimar; Alvarez, Anne M.; Presting, Gernot G.

    2011-01-01

    A DNA marker that distinguishes plant associated bacteria at the species level and below was derived by comparing six sequenced genomes of Xanthomonas, a genus that contains many important phytopathogens. This DNA marker comprises a portion of the dnaA replication initiation factor (RIF). Unlike the rRNA genes, dnaA is a single copy gene in the vast majority of sequenced bacterial genomes, and amplification of RIF requires genus-specific primers. In silico analysis revealed that RIF has equal or greater ability to differentiate closely related species of Xanthomonas than the widely used ribosomal intergenic spacer region (ITS). Furthermore, in a set of 263 Xanthomonas, Ralstonia and Clavibacter strains, the RIF marker was directly sequenced in both directions with a success rate approximately 16% higher than that for ITS. RIF frameworks for Xanthomonas, Ralstonia and Clavibacter were constructed using 682 reference strains representing different species, subspecies, pathovars, races, hosts and geographic regions, and contain a total of 109 different RIF sequences. RIF sequences showed subspecific groupings but did not place strains of X. campestris or X. axonopodis into currently named pathovars nor R. solanacearum strains into their respective races, confirming previous conclusions that pathovar and race designations do not necessarily reflect genetic relationships. The RIF marker also was sequenced for 24 reference strains from three genera in the Enterobacteriaceae: Pectobacterium, Pantoea and Dickeya. RIF sequences of 70 previously uncharacterized strains of Ralstonia, Clavibacter, Pectobacterium and Dickeya matched, or were similar to, those of known reference strains, illustrating the utility of the frameworks to classify bacteria below the species level and rapidly match unknown isolates to reference strains. 
    The RIF sequence frameworks are available at the online RIF database, RIFdb, and can be queried for diagnostic purposes with RIF sequences obtained from unknown strains in both chromatogram and FASTA format. PMID:21533033

  4. PhytoREF: a reference database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy.

    PubMed

    Decelle, Johan; Romac, Sarah; Stern, Rowena F; Bendif, El Mahdi; Zingone, Adriana; Audic, Stéphane; Guiry, Michael D; Guillou, Laure; Tessier, Désiré; Le Gall, Florence; Gourvil, Priscillia; Dos Santos, Adriana L; Probert, Ian; Vaulot, Daniel; de Vargas, Colomban; Christen, Richard

    2015-11-01

    Photosynthetic eukaryotes have a critical role as the main producers in most ecosystems of the biosphere. The ongoing environmental metabarcoding revolution opens the perspective for holistic ecosystems biological studies of these organisms, in particular the unicellular microalgae that often lack distinctive morphological characters and have complex life cycles. To interpret environmental sequences, metabarcoding necessarily relies on taxonomically curated databases containing reference sequences of the targeted gene (or barcode) from identified organisms. To date, no such reference framework exists for photosynthetic eukaryotes. In this study, we built the PhytoREF database that contains 6490 plastidial 16S rDNA reference sequences that originate from a large diversity of eukaryotes representing all known major photosynthetic lineages. We compiled 3333 amplicon sequences available from public databases and 879 sequences extracted from plastidial genomes, and generated 411 novel sequences from cultured marine microalgal strains belonging to different eukaryotic lineages. A total of 1867 environmental Sanger 16S rDNA sequences were also included in the database. Stringent quality filtering and a phylogeny-based taxonomic classification were applied for each 16S rDNA sequence. The database mainly focuses on marine microalgae, but sequences from land plants (representing half of the PhytoREF sequences) and freshwater taxa were also included to broaden the applicability of PhytoREF to different aquatic and terrestrial habitats. PhytoREF, accessible via a web interface (http://phytoref.fr), is a new resource in molecular ecology to foster the discovery, assessment and monitoring of the diversity of photosynthetic eukaryotes using high-throughput sequencing. © 2015 John Wiley & Sons Ltd.

  5. FARME DB: a functional antibiotic resistance element database

    PubMed Central

    Wallace, James C.; Port, Jesse A.; Smith, Marissa N.; Faustman, Elaine M.

    2017-01-01

    Antibiotic resistance (AR) is a major global public health threat but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences derived from environmental samples that confer resistance in functional metagenomic studies. These environmental metagenomic sequences often show little or no similarity to AR sequences from clinical isolates using standard classification criteria. In addition, existing AR databases provide no information about flanking sequences containing regulatory or mobile genetic elements. To help address this issue, we created an annotated database of DNA and protein sequences derived exclusively from environmental metagenomic sequences showing AR in laboratory experiments. Our Functional Antibiotic Resistant Metagenomic Element (FARME) database is a compilation of publicly available DNA sequences and predicted protein sequences conferring AR as well as regulatory elements, mobile genetic elements and predicted proteins flanking antibiotic resistant genes. FARME is the first database to focus on functional metagenomic AR gene elements and provides a resource to better understand AR in the 99% of bacteria which cannot be cultured and the relationship between environmental AR sequences and antibiotic resistant genes derived from cultured isolates. Database URL: http://staff.washington.edu/jwallace/farme PMID:28077567

  6. Evaluating the feasibility of using candidate DNA barcodes in discriminating species of the large Asteraceae family.

    PubMed

    Gao, Ting; Yao, Hui; Song, Jingyuan; Zhu, Yingjie; Liu, Chang; Chen, Shilin

    2010-10-26

    Five DNA regions, namely, rbcL, matK, ITS, ITS2, and psbA-trnH, have been recommended as primary DNA barcodes for plants. Studies evaluating these regions for species identification in the large plant taxon, which includes a large number of closely related species, have rarely been reported. The feasibility of using the five proposed DNA regions was tested for discriminating plant species within Asteraceae, the largest family of flowering plants. Among these markers, ITS2 was the most useful in terms of universality, sequence variation, and identification capability in the Asteraceae family. The species discriminating power of ITS2 was also explored in a large pool of 3,490 Asteraceae sequences that represent 2,315 species belonging to 494 different genera. The result shows that ITS2 correctly identified 76.4% and 97.4% of plant samples at the species and genus levels, respectively. In addition, ITS2 displayed a variable ability to discriminate related species within different genera. ITS2 is the best DNA barcode for the Asteraceae family. This approach significantly broadens the application of DNA barcoding to resolve classification problems in the family Asteraceae at the genera and species levels.

  7. Introduction of a novel 18S rDNA gene arrangement along with distinct ITS region in the saline water microalga Dunaliella

    PubMed Central

    2010-01-01

    Comparison of 18S rDNA gene sequences is a very promising method for identification and classification of living organisms. Molecular identification and discrimination of different Dunaliella species were carried out based on the size of the 18S rDNA gene and the number and position of introns in the gene. Three types of 18S rDNA structure have already been reported: the gene with a size of ~1770 bp lacking any intron, with a size of ~2170 bp containing one intron near the 5' terminus, and with a size of ~2570 bp harbouring two introns near the 5' and 3' termini. Here we report a new 18S rDNA gene arrangement in terms of intron localization and nucleotide sequence in a Dunaliella isolated from Iranian salt lakes (ABRIINW-M1/2). PCR amplification with genus-specific primers produced a ~2170 bp DNA band, which is similar to that of the D. salina 18S rDNA gene containing only one intron near the 5' terminus. However, the sequence composition of the gene revealed the lack of any intron near the 5' terminus in our isolate. Furthermore, another alteration was observed due to the presence of a 440 bp DNA fragment near the 3' terminus. Accordingly, the 18S rDNA gene of the isolate is clearly different from those of D. salina and any other Dunaliella species reported so far. Moreover, analysis of the ITS region sequence showed the diversity of this region compared to the previously reported species. The 18S rDNA and ITS sequences of our isolate were submitted to the NCBI database under accession numbers EU678868 and EU927373, respectively. The optimum growth rate of this isolate occurred at a salinity level of 1 M NaCl. The maximum carotenoid content under stress conditions of intense light (400 μmol photon m-2 s-1), high salinity (4 M NaCl) and deficiency of nitrate and phosphate nutrients reached 240 ng/cell after 15 days. PMID:20377865

  8. High-resolution definition of the Vibrio cholerae essential gene set with hidden Markov model–based analyses of transposon-insertion sequencing data

    PubMed Central

    Chao, Michael C.; Pritchard, Justin R.; Zhang, Yanjia J.; Rubin, Eric J.; Livny, Jonathan; Davis, Brigid M.; Waldor, Matthew K.

    2013-01-01

    The coupling of high-density transposon mutagenesis to high-throughput DNA sequencing (transposon-insertion sequencing) enables simultaneous and genome-wide assessment of the contributions of individual loci to bacterial growth and survival. We have refined analysis of transposon-insertion sequencing data by normalizing for the effect of DNA replication on sequencing output and using a hidden Markov model (HMM)-based filter to exploit heretofore unappreciated information inherent in all transposon-insertion sequencing data sets. The HMM can smooth variations in read abundance and thereby reduce the effects of read noise, as well as permit fine scale mapping that is independent of genomic annotation and enable classification of loci into several functional categories (e.g. essential, domain essential or ‘sick’). We generated a high-resolution map of genomic loci (encompassing both intra- and intergenic sequences) that are required or beneficial for in vitro growth of the cholera pathogen, Vibrio cholerae. This work uncovered new metabolic and physiologic requirements for V. cholerae survival, and by combining transposon-insertion sequencing and transcriptomic data sets, we also identified several novel noncoding RNA species that contribute to V. cholerae growth. Our findings suggest that HMM-based approaches will enhance extraction of biological meaning from transposon-insertion sequencing genomic data. PMID:23901011
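
    The idea of letting a hidden Markov model smooth noisy per-site insertion data can be illustrated with a small Viterbi decoder. This is only a sketch under strong simplifying assumptions and is not the model used in the paper: it has two hypothetical states ("essential" and "nonessential"), binary observations (1 if a site carries an insertion, 0 otherwise), and transition and emission probabilities that are made up for illustration.

```python
import math
from typing import Dict, List, Sequence

STATES = ("essential", "nonessential")
# All parameters below are illustrative assumptions, not fitted values.
START = {"essential": 0.5, "nonessential": 0.5}
STAY = 0.95  # probability of remaining in the same state at the next site
EMIT_INSERTION = {"essential": 0.05, "nonessential": 0.6}


def viterbi(observations: Sequence[int]) -> List[str]:
    """Most likely state path for binary insertion observations (log space)."""

    def log_emit(state: str, obs: int) -> float:
        p = EMIT_INSERTION[state]
        return math.log(p if obs else 1.0 - p)

    def log_trans(prev: str, cur: str) -> float:
        return math.log(STAY if prev == cur else 1.0 - STAY)

    if not observations:
        return []

    scores = {s: math.log(START[s]) + log_emit(s, observations[0]) for s in STATES}
    backpointers: List[Dict[str, str]] = []
    for obs in observations[1:]:
        new_scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in STATES:
            # Best predecessor state for reaching `cur` at this site.
            best_prev = max(STATES, key=lambda prev: scores[prev] + log_trans(prev, cur))
            new_scores[cur] = scores[best_prev] + log_trans(best_prev, cur) + log_emit(cur, obs)
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy data: insertions on both flanks, a 12-site stretch with none in between.
observations = [1, 1, 0, 1, 1, 1] + [0] * 12 + [1, 0, 1, 1, 1, 1]
path = viterbi(observations)
print(path)
```

    With these parameters an isolated empty site inside an insertion-rich region stays "nonessential", while the long empty stretch is labeled "essential", which is the smoothing behavior the abstract describes in general terms.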

  9. [The use of 16S rDNA sequencing in species diversity analysis for sputum of patients with ventilator-associated pneumonia].

    PubMed

    Yang, Xiaojun; Wang, Xiaohong; Liang, Zhijuan; Zhang, Xiaoya; Wang, Yanbo; Wang, Zhenhai

    2014-05-01

    To study the species and amount of bacteria in sputum of patients with ventilator-associated pneumonia (VAP) by using 16S rDNA sequencing analysis, and to explore a new method for etiologic diagnosis of VAP. Bronchoalveolar lavage sputum samples were collected from 31 patients with VAP. Bacterial DNA was extracted from the samples and identified by polymerase chain reaction (PCR). At the same time, sputum specimens were processed for routine bacterial culture. High-throughput sequencing was conducted on PCR-positive samples with 16S rDNA metagenomic sequencing technology, the sequencing results were analyzed using bioinformatics, and the results of sequencing and bacterial culture were then compared. (1) Specific DNA sequences of 550 bp were amplified in sputum specimens from 27 of the 31 patients with VAP, and they were used for sequencing analysis. 103 856 sequences were obtained from those sputum specimens using 16S rDNA sequencing, yielding approximately 39 Mb of raw data. Tag sequencing resolved taxonomy to the genus level in all 27 samples. (2) Alpha-diversity analysis showed that sputum samples of patients with VAP had significantly higher variability and richness in bacterial species (Shannon index values 1.20, Simpson index values 0.48). Rarefaction curve analysis showed that there were more species that were not detected by sequencing from some VAP sputum samples. (3) Analysis of 27 sputum samples with VAP by using 16S rDNA sequences yielded four phyla, namely Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria. At the genus level, the dominant genera included Streptococcus 88.9% (24/27), Limnohabitans 77.8% (21/27), Acinetobacter 70.4% (19/27), Sphingomonas 63.0% (17/27), Prevotella 63.0% (17/27), Klebsiella 55.6% (15/27), Pseudomonas 55.6% (15/27), Aquabacterium 55.6% (15/27), and Corynebacterium 55.6% (15/27).
    (4) Pyrosequencing found that Prevotella, Limnohabitans, Aquabacterium and Sphingomonas might not be detected by routine bacterial culture. Among seven genera which were identified by both methods, pyrosequencing yielded a higher positive rate than ordinary bacterial culture [Streptococcus: 88.9% (24/27) vs. 18.5% (5/27), Klebsiella: 55.6% (15/27) vs. 18.5% (5/27), Acinetobacter: 70.4% (19/27) vs. 37.0% (10/27), Corynebacterium: 55.6% (15/27) vs. 7.4% (2/27), P<0.05 or P<0.01]. For Pseudomonas, the positive rate of sequencing tended to be higher than that of culture [55.6% (15/27) vs. 25.9% (7/27), P=0.050]. No significant differences were observed between sequencing and ordinary bacterial culture for detection of Staphylococcus [7.4% (2/27) vs. 11.1% (3/27)] and Neisseria [18.5% (5/27) vs. 3.7% (1/27), both P>0.05]. 16S rDNA sequencing analysis confirmed that pathogenic bacteria in the sputum of patients with VAP were complex, with multiple drug-resistant strains. Compared with routine bacterial culture, pyrosequencing had a higher positive rate in detecting pathogens. 16S rDNA gene sequencing technology may become a new method for etiological diagnosis of VAP.
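
    The Shannon and Simpson indices reported in point (2) can be computed from per-taxon read counts. The sketch below assumes the natural logarithm for the Shannon index and the dominance form of the Simpson index (the sum of squared proportions); the abstract does not say which conventions were used, and some tools report one minus this value instead. The function names and counts are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_dominance(counts: Sequence[int]) -> float:
    """Simpson dominance D = sum(p_i^2); 1 - D is the diversity form."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    return sum((c / total) ** 2 for c in counts)


# Made-up read counts for three genera in one sample.
genus_counts = [50, 30, 20]
h = shannon_index(genus_counts)
d = simpson_dominance(genus_counts)
print(f"Shannon H' = {h:.3f}, Simpson D = {d:.2f}, 1 - D = {1 - d:.2f}")
```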

  10. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economic, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs, suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. An accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools, with the possibility to add sequences from the public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole genome sequencing of the organism. PMID:20502665

  11. TRX-LOGOS - a graphical tool to demonstrate DNA information content dependent upon backbone dynamics in addition to base sequence.

    PubMed

    Fortin, Connor H; Schulze, Katharina V; Babbitt, Gregory A

    2015-01-01

    It is now widely accepted that DNA sequences defining DNA-protein interactions functionally depend upon local biophysical features of the DNA backbone that are important in defining sites of binding interaction in the genome (e.g. DNA shape, charge and intrinsic dynamics). However, these physical features of the DNA polymer are not directly apparent when analyzing and viewing Shannon information content calculated at single nucleobases in a traditional sequence logo plot. Thus, sequence logo plots are severely limited in that they convey no explicit information regarding the structural dynamics of the DNA backbone, a feature often critical to binding specificity. We present TRX-LOGOS, an R software package and Perl wrapper code that interfaces the JASPAR database for computational regulatory genomics. TRX-LOGOS extends the traditional sequence logo plot to include Shannon information content calculated with regard to the dinucleotide-based BI-BII conformation shifts in phosphate linkages on the DNA backbone, thereby adding a visual measure of intrinsic DNA flexibility that can be critical for many DNA-protein interactions. TRX-LOGOS is available as an R graphics module offered both at SourceForge and as a download supplement at this journal. To demonstrate the general utility of TRX logo plots, we first calculated the information content for 416 Saccharomyces cerevisiae transcription factor binding sites functionally confirmed in the Yeastract database and matched to previously published yeast genomic alignments. We discovered that flanking regions contain significantly more information content at phosphate linkages than can be observed at nucleobases. We also examined broader transcription factor classifications defined by the JASPAR database, and discovered that many general signatures of transcription factor binding are locally more information rich at the level of DNA backbone dynamics than nucleobase sequence.
    We used TRX-LOGOS in combination with MEGA 6.0 software for molecular evolutionary genetics analysis to visually compare the human Forkhead box/FOX protein evolution to its binding site evolution. We also compared the DNA binding signatures of human TP53 tumor suppressor determined by two different laboratory methods (SELEX and ChIP-seq). Further analysis of the entire yeast genome, center aligned at the start codon, also revealed a distinct sequence-independent 3 bp periodic pattern in information content, present only in coding regions, and perhaps indicative of the non-random organization of the genetic code. TRX-LOGOS is useful in any situation in which important information content in DNA can be better visualized at the positions of phosphate linkages (i.e. dinucleotides) where the dynamic properties of the DNA backbone function to facilitate DNA-protein interaction.
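
    For reference, the per-position Shannon information content shown in a traditional sequence logo can be sketched as follows. This covers only the nucleobase case (2 bits minus the column entropy, with no small-sample correction); it does not reproduce the dinucleotide-based BI-BII calculation of TRX-LOGOS, whose details are not given in the abstract. The function name and the aligned sites are hypothetical, and the sites are assumed to contain only A, C, G and T.

```python
import math
from collections import Counter
from typing import List, Sequence


def information_content(aligned_sites: Sequence[str]) -> List[float]:
    """Bits of information per column: log2(4) minus the column's Shannon entropy."""
    if not aligned_sites:
        raise ValueError("need at least one aligned site")
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")

    bits = []
    for column in zip(*aligned_sites):
        counts = Counter(column)
        total = len(column)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        bits.append(math.log2(4) - entropy)
    return bits


# Made-up aligned binding sites, for illustration only.
sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]
bits = information_content(sites)
print([round(b, 3) for b in bits])
```

    Fully conserved columns score 2 bits, and a column with all four bases at equal frequency scores 0 bits.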

  12. Taxonomic and functional assignment of cloned sequences from high Andean forest soil metagenome.

    PubMed

    Montaña, José Salvador; Jiménez, Diego Javier; Hernández, Mónica; Angel, Tatiana; Baena, Sandra

    2012-02-01

    Total metagenomic DNA was isolated from high Andean forest soil and subjected to taxonomical and functional composition analyses by means of clone library generation and sequencing. The obtained yield of 1.7 μg of DNA/g of soil was used to construct a metagenomic library of approximately 20,000 clones (in the plasmid p-Bluescript II SK+) with an average insert size of 4 Kb, covering 80 Mb of the total metagenomic DNA. Metagenomic sequences near the plasmid cloning site were sequenced and then trimmed and assembled, obtaining 299 reads and 31 contigs (0.3 Mb). Taxonomic assignment of total sequences was performed by BLASTX, resulting in 68.8, 44.8 and 24.5% classification into taxonomic groups using the metagenomic RAST server v2.0, WebCARMA v1.0 online system and MetaGenome Analyzer v3.8 software, respectively. Most clone sequences were classified as Bacteria belonging to the phyla Actinobacteria, Proteobacteria and Acidobacteria. Among the most represented orders were Actinomycetales (34% average), Rhizobiales, Burkholderiales and Myxococcales, with the greatest number of sequences in the genera Mycobacterium (7% average), Frankia, Streptomyces and Bradyrhizobium. The vast majority of sequences were associated with the metabolism of carbohydrates, proteins, lipids and catalytic functions, such as phosphatases, glycosyltransferases, dehydrogenases, methyltransferases, dehydratases and epoxide hydrolases. In this study we compared different methods of taxonomic and functional assignment of metagenomic clone sequences to evaluate microbial diversity in an unexplored soil ecosystem, searching for putative enzymes of biotechnological interest and generating important information for further functional screening of clone libraries.

  13. A nuclear ribosomal DNA pseudogene in triatomines opens a new research field of fundamental and applied implications in Chagas disease.

    PubMed

    Zuriaga, María Angeles; Mas-Coma, Santiago; Bargues, María Dolores

    2015-05-01

    A pseudogene, designated as "ps(5.8S+ITS-2)", paralogous to the 5.8S gene and internal transcribed spacer (ITS)-2 of the nuclear ribosomal DNA (rDNA), has been recently found in many triatomine species distributed throughout North America, Central America and northern South America. Among characteristics used as criteria for pseudogene verification, secondary structures and free energy are highlighted, showing a lower fit between minimum free energy, partition function and centroid structures, although in given cases the fit only appeared to be slightly lower. The unique characteristics of "ps(5.8S+ITS-2)" as a processed or retrotransposed pseudogenic unit of the ghost type are reviewed, with emphasis on its potential functionality compared to the functionality of genes and spacers of the normal rDNA operon. Besides the technical problem of the risk for erroneous sequence results, the usefulness of "ps(5.8S+ITS-2)" for specimen classification, phylogenetic analyses and systematic/taxonomic studies should be highlighted, based on consistency and retention index values, which in pseudogenic sequence trees were higher than in functional sequence trees. Additionally, intraindividual, interpopulational and interspecific differences in pseudogene amount and the fact that it is a pseudogene in the nuclear rDNA suggest a potential relationship with fitness, behaviour and adaptability of triatomine vectors and consequently its potential utility in Chagas disease epidemiology and control.

  14. Information theory applications for biological sequence analysis.

    PubMed

    Vinga, Susana

    2014-05-01

    Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.

  15. Phylogenetic Status of an Unrecorded Species of Curvularia, C. spicifera, Based on Current Classification System of Curvularia and Bipolaris Group Using Multi Loci.

    PubMed

    Jeon, Sun Jeong; Nguyen, Thi Thuong Thuong; Lee, Hyang Burm

    2015-09-01

    A seed-borne fungus, Curvularia sp. EML-KWD01, was isolated from an indigenous wheat seed by standard blotter method. This fungus was characterized based on the morphological characteristics and molecular phylogenetic analysis. Phylogenetic status of the fungus was determined using sequences of three loci: rDNA internal transcribed spacer, large ribosomal subunit, and glyceraldehyde 3-phosphate dehydrogenase gene. Multi loci sequencing analysis revealed that this fungus was Curvularia spicifera within Curvularia group 2 of family Pleosporaceae.

  16. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses

    DOE PAGES

    Paez-Espino, David; Chen, I. -Min A.; Palaniappan, Krishna; ...

    2016-10-30

    Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from > 6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparing with external sequences, thus serving as an essential resource in the viral genomics community.

  17. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Paez-Espino, David; Chen, I. -Min A.; Palaniappan, Krishna

    Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from > 6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparing with external sequences, thus serving as an essential resource in the viral genomics community.

  18. Molecular systematics of higher primates: genealogical relations and classification.

    PubMed Central

    Miyamoto, M M; Koop, B F; Slightom, J L; Goodman, M; Tennant, M R

    1988-01-01

    We obtained 5' and 3' flanking sequences (5.4 kilobase pairs) from the psi eta-globin gene region of the rhesus macaque (Macaca mulatta) and combined them with available nucleotide data. The completed sequence, representing 10.8 kilobase pairs of contiguous noncoding DNA, was compared to the same orthologous regions available for human (Homo sapiens, as represented by five different alleles), common chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), and orangutan (Pongo pygmaeus). The nucleotide sequence for Macaca mulatta provided the outgroup perspective needed to evaluate better the relationships of humans and great apes. Pairwise comparisons and parsimony analysis of these orthologues clearly demonstrated (i) that humans and great apes share a high degree of genetic similarity and (ii) that humans, chimpanzees, and gorillas form a natural monophyletic group. These conclusions strongly favor a genealogical classification for higher primates consisting of a single family (Hominidae) with two subfamilies (Homininae for Homo, Pan, and Gorilla and Ponginae for Pongo). PMID:3174657

  19. Characterization of limes (Citrus aurantifolia) grown in Bhutan and Indonesia using high-throughput sequencing

    PubMed Central

    Penjor, Tshering; Mimura, Takashi; Matsumoto, Ryoji; Yamamoto, Masashi; Nagano, Yukio

    2014-01-01

    Lime [Citrus aurantifolia (Cristm.) Swingle] is a Citrus species that is a popular ingredient in many cuisines. Some citrus plants are known to originate in the area ranging from northeastern India to southwestern China. In the current study, we characterized and compared limes grown in Bhutan (n = 5 accessions) and Indonesia (n = 3 accessions). The limes were separated into two groups based on their morphology. Restriction site-associated DNA sequencing (RAD-seq) separated the eight accessions into two clusters. One cluster contained four accessions from Bhutan, whereas the other cluster contained one accession from Bhutan and the three accessions from Indonesia. This genetic classification supported the morphological classification of limes. The analysis suggests that the properties associated with asexual reproduction, and somatic homologous recombination, have contributed to the genetic diversification of limes. PMID:24781859

  20. FBIS: A regional DNA barcode archival & analysis system for Indian fishes.

    PubMed

    Nagpure, Naresh Sahebrao; Rashid, Iliyas; Pathak, Ajey Kumar; Singh, Mahender; Singh, Shri Prakash; Sarkar, Uttam Kumar

    2012-01-01

    DNA barcoding is a new tool for taxon recognition and classification of biological organisms based on the sequence of a fragment of the mitochondrial gene cytochrome c oxidase I (COI). In view of the growing importance of fish DNA barcoding for species identification, molecular taxonomy and fish diversity conservation, we developed a Fish Barcode Information System (FBIS) for Indian fishes, which will serve as a regional DNA barcode archival and analysis system. The database presently contains 2334 sequence records of the COI gene for 472 aquatic species belonging to 39 orders and 136 families, collected from available published data sources. Additionally, it contains information on phenotype, distribution and IUCN Red List status of fishes. The web version of FBIS was designed using MySQL, Perl and PHP under the Linux operating platform to (a) store and manage the acquisition, (b) analyze and explore DNA barcode records, and (c) identify species and estimate genetic divergence. FBIS has also been integrated with appropriate tools for retrieving and viewing information about the database statistics and taxonomy. It is expected that FBIS would be useful as a potent information system in fish molecular taxonomy, phylogeny and genomics. The database is available for free at http://mail.nbfgr.res.in/fbis/

  1. Molecular systematics of Gagea and Lloydia (Liliaceae; Liliales): implications of analyses of nuclear ribosomal and plastid DNA sequences for infrageneric classification

    PubMed Central

    Zarrei, M.; Wilkin, P.; Fay, M. F.; Ingrouille, M. J.; Zarre, S.; Chase, M. W.

    2009-01-01

    Background and Aims: Gagea is a Eurasian genus of petaloid monocots, with a few species in North Africa, comprising between 70 and approximately 275 species depending on the author. Lloydia (thought to be the closest relative of Gagea) consists of 12–20 species that have a mostly eastern Asian distribution. Delimitation of these genera and their subdivisions are unresolved questions in Liliaceae taxonomy. The objective of this study is to evaluate generic and infrageneric circumscription of Gagea and Lloydia using DNA sequence data. Methods: A phylogenetic study of Gagea and Lloydia (Liliaceae) was conducted using sequences of nuclear ribosomal internal transcribed spacer (ITS) and plastid (rpl16 intron, trnL intron, trnL-F spacer, matK and the psbA-trnH spacer) DNA regions. This included 149 accessions (seven as outgroups), with multiple accessions of some taxa; 552 sequences were included, of which 393 were generated as part of this research. Key Results: A close relationship of Gagea and Lloydia was confirmed in analyses using different datasets, but neither Gagea nor Lloydia forms a monophyletic group as currently circumscribed; however, the ITS and plastid analyses did not produce congruent results for the placement of Lloydia relative to the major groups within Gagea. Gagea accessions formed five moderately to strongly supported clades in all trees, with most Lloydia taxa positioned at the basal nodes; in the strict consensus trees from the combined data a basal polytomy occurs. There is limited congruence between the classical, morphology-derived infrageneric taxonomy in Gagea (including Lloydia) and clades in the present phylogenetic analyses. Conclusions: The analyses support monophyly of Gagea/Lloydia collectively, and they clearly comprise a single lineage, as some previous authors have hypothesized. The results provide the basis for a new classification of Gagea that has support from some morphological features. Incongruence between plastid and nuclear ITS results is interpreted as potentially due to ancient hybridization and/or paralogy of ITS rDNA. PMID:19451146

  2. Invertebrate Iridoviruses: A Glance over the Last Decade

    PubMed Central

    Özcan, Orhan; Ilter-Akulke, Ayca Zeynep; Scully, Erin D.; Özgen, Arzu

    2018-01-01

    Members of the family Iridoviridae (iridovirids) are large dsDNA viruses that infect both invertebrate and vertebrate ectotherms and whose symptoms range in severity from minor reductions in host fitness to systemic disease and large-scale mortality. Several characteristics have been useful for classifying iridoviruses; however, novel strains are continuously being discovered and, in many cases, reliable classification has been challenging. Further impeding classification, invertebrate iridoviruses (IIVs) can occasionally infect vertebrates; thus, host range is often not a useful criterion for classification. In this review, we discuss the current classification of iridovirids, focusing on genomic and structural features that distinguish vertebrate and invertebrate iridovirids and viral factors linked to host interactions in IIV6 (Invertebrate iridescent virus 6). In addition, we show for the first time how complete genome sequences of viral isolates can be leveraged to improve classification of new iridovirid isolates and resolve ambiguous relations. Improved classification of the iridoviruses may facilitate the identification of genus-specific virulence factors linked with diverse host phenotypes and host interactions. PMID:29601483

  3. Invertebrate Iridoviruses: A Glance over the Last Decade.

    PubMed

    İnce, İkbal Agah; Özcan, Orhan; Ilter-Akulke, Ayca Zeynep; Scully, Erin D; Özgen, Arzu

    2018-03-30

    Members of the family Iridoviridae (iridovirids) are large dsDNA viruses that infect both invertebrate and vertebrate ectotherms and whose symptoms range in severity from minor reductions in host fitness to systemic disease and large-scale mortality. Several characteristics have been useful for classifying iridoviruses; however, novel strains are continuously being discovered and, in many cases, reliable classification has been challenging. Further impeding classification, invertebrate iridoviruses (IIVs) can occasionally infect vertebrates; thus, host range is often not a useful criterion for classification. In this review, we discuss the current classification of iridovirids, focusing on genomic and structural features that distinguish vertebrate and invertebrate iridovirids and viral factors linked to host interactions in IIV6 (Invertebrate iridescent virus 6). In addition, we show for the first time how complete genome sequences of viral isolates can be leveraged to improve classification of new iridovirid isolates and resolve ambiguous relations. Improved classification of the iridoviruses may facilitate the identification of genus-specific virulence factors linked with diverse host phenotypes and host interactions.

  4. Molecular Phylogeny, Recent Radiation and Evolution of Gross Morphology of the Rhubarb genus Rheum (Polygonaceae) Inferred from Chloroplast DNA trnL-F Sequences

    PubMed Central

    WANG, AILAN; YANG, MEIHUA; LIU, JIANQUAN

    2005-01-01

    • Background and Aims Rheum, a highly diversified genus with about 60 species, is mainly confined to the mountainous and desert regions of the Qinghai–Tibetan plateau and adjacent areas. This genus represents a good example of the extensive diversification of the temperate genera in the Qinghai–Tibetan plateau, in which the forces to drive diversification remain unknown. To date, the infrageneric classification of Rheum has been mainly based on morphological characters. However, it may have been subject to convergent evolution under habitat pressure, and the systematic positions of some sections are unclear, especially Sect. Globulosa, which has globular inflorescences, and Sect. Nobilia, which has semi-translucent bracts. Recent palynological research has found substantial contradictions between exine patterns and the current classification of Rheum. Two specific objectives of this research were (1) to evaluate possible relationships of some ambiguous sections with a unique morphology, and (2) to examine possible occurrence of the radiative speciation with low genetic divergence across the total genus and the correlation between the extensive diversification time of Rheum and past geographical events, especially the recent large-scale uplifts of the Qinghai–Tibetan Plateau. • Methods The chloroplast DNA trnL-F region of 29 individuals representing 26 species of Rheum, belonging to seven out of eight sections, was sequenced and compared. The phylogenetic relationships were further constructed based on the sequences obtained. • Key Results Despite the highly diversified morphology, the genetic variation in this DNA fragment is relatively low. The molecular phylogeny is highly inconsistent with gross morphology, pollen exine patterns and traditional classifications, except for identifying all samples of Sect. Palmata, three species of Sect. Spiciformia and a few species of Sect. Rheum as corresponding monophyletic groups. The monotypic Sect. Globulosa showed a tentative position within the clade comprising five species of Sect. Rheum. All of the analyses revealed the paraphyly of R. nobile and R. alexandrae, the only two species of Sect. Nobilia circumscribed by the possession of large bracts. The crude calibration of lineages based on trnL-F sequence differentiation implied an extensive diversification of Rheum within approx. 7 million years. • Conclusions Based on these results, it is suggested that the rich geological and ecological diversity caused by the recent large-scale uplifts of the Qinghai–Tibetan Plateau since the late Tertiary, coupled with the oscillating climate of the Quaternary stage, might have promoted rapid speciation in small and isolated populations, as well as allowing the fixation of unique or rare morphological characters in Rheum. Such a rapid radiation, combined with introgressive hybridization and reticulate evolution, may have caused the transfer of cpDNA haplotypes between morphologically dissimilar species, and might account for the inconsistency between morphological classification and molecular phylogeny reported here. PMID:15994840

  5. Innovative Methods for Integrating Knowledge for Long-Term Monitoring of Contaminated Groundwater Sites: Understanding Microorganism Communities and their Associated Hydrochemical Environment

    NASA Astrophysics Data System (ADS)

    Mouser, P. J.; Rizzo, D. M.; Druschel, G.; O'Grady, P.; Stevens, L.

    2005-12-01

    This interdisciplinary study integrates hydrochemical and genome-based data to estimate the redox processes occurring at long-term monitoring sites. Groundwater samples have been collected from a well-characterized landfill-leachate contaminated aquifer in northeastern New York. Primers from the 16S rDNA gene were used to amplify Bacteria and Archaea in groundwater taken from monitoring wells located in clean, fringe, and contaminated locations within the aquifer. PCR-amplified rDNA were digested with restriction enzymes to evaluate terminal restriction fragment length polymorphism (T-RFLP) community profiles. The rDNA was cloned, sequenced, and partial sequences were matched against known organisms using the NCBI Blast database. Phylogenetic trees and bootstrapping were used to identify classifications of organisms and compare the communities from clean, fringe, and contaminated locations. We used Artificial Neural Network (ANN) models to incorporate microbial data with hydrochemical information for improving our understanding of subsurface processes.

  6. Defining operational taxonomic units using DNA barcode data.

    PubMed

    Blaxter, Mark; Mann, Jenna; Chapman, Tom; Thomas, Fran; Whitton, Claire; Floyd, Robin; Abebe, Eyualem

    2005-10-29

    The scale of diversity of life on this planet is a significant challenge for any scientific programme hoping to produce a complete catalogue, whatever means is used. For DNA barcoding studies, this difficulty is compounded by the realization that any chosen barcode sequence is not the gene 'for' speciation and that taxa have evolutionary histories. How are we to disentangle the confounding effects of reticulate population genetic processes? Using the DNA barcode data from meiofaunal surveys, here we discuss the benefits of treating the taxa defined by barcodes without reference to their correspondence to 'species', and suggest that using this non-idealist approach facilitates access to taxon groups that are not accessible to other methods of enumeration and classification. Major issues remain, in particular the methodologies for taxon discrimination in DNA barcode data.

  7. Representation of DNA sequences with virtual potentials and their processing by (SEQREP) Kohonen self-organizing maps.

    PubMed

    Aires-de-Sousa, João; Aires-de-Sousa, Luisa

    2003-01-01

    We propose representing individual positions in DNA sequences by virtual potentials generated by other bases of the same sequence. This is a compact representation of the neighbourhood of a base. The distribution of the virtual potentials over the whole sequence can be used as a representation of the entire sequence (SEQREP code). It is a flexible code, with a length independent of the sequence size, does not require previous alignment, and is convenient for processing by neural networks or statistical techniques. To evaluate its biological significance, the SEQREP code was used for training Kohonen self-organizing maps (SOMs) in two applications: (a) detection of Alu sequences, and (b) classification of sequences encoding for HIV-1 envelope glycoprotein (env) into subtypes A-G. It was demonstrated that SOMs clustered sequences belonging to different classes into distinct regions. For independent test sets, very high rates of correct predictions were obtained (97% in the first application, 91% in the second). Possible areas of application of SEQREP codes include functional genomics, phylogenetic analysis, detection of repetitions, database retrieval, and automatic alignment. Software for representing sequences by SEQREP code, and for training Kohonen SOMs is made freely available from http://www.dq.fct.unl.pt/qoa/jas/seqrep. Supplementary material is available at http://www.dq.fct.unl.pt/qoa/jas/seqrep/bioinf2002

  8. Learning a weighted sequence model of the nucleosome core and linker yields more accurate predictions in Saccharomyces cerevisiae and Homo sapiens.

    PubMed

    Reynolds, Sheila M; Bilmes, Jeff A; Noble, William Stafford

    2010-07-08

    DNA in eukaryotes is packaged into a chromatin complex, the most basic element of which is the nucleosome. The precise positioning of the nucleosome cores allows for selective access to the DNA, and the mechanisms that control this positioning are important pieces of the gene expression puzzle. We describe a large-scale nucleosome pattern that jointly characterizes the nucleosome core and the adjacent linkers and is predominantly characterized by long-range oscillations in the mono-, di- and tri-nucleotide content of the DNA sequence, and we show that this pattern can be used to predict nucleosome positions in both Homo sapiens and Saccharomyces cerevisiae more accurately than previously published methods. Surprisingly, in both H. sapiens and S. cerevisiae, the most informative individual features are the mono-nucleotide patterns, although the inclusion of di- and tri-nucleotide features results in improved performance. Our approach combines a much longer pattern than has been previously used to predict nucleosome positioning from sequence (301 base pairs, centered at the position to be scored) with a novel discriminative classification approach that selectively weights the contributions from each of the input features. The resulting scores are relatively insensitive to local AT-content and can be used to accurately discriminate putative dyad positions from adjacent linker regions without requiring an additional dynamic programming step and without the attendant edge effects and assumptions about linker length modeling and overall nucleosome density. Our approach produces the best dyad-linker classification results published to date in H. sapiens, and outperforms two recently published models on a large set of S. cerevisiae nucleosome positions. Our results suggest that in both genomes, a comparable and relatively small fraction of nucleosomes are well-positioned and that these positions are predictable based on sequence alone. We believe that the bulk of the remaining nucleosomes follow a statistical positioning model.
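
    As a rough illustration of the kind of input the abstract describes, the sketch below extracts a 301-bp window centered on a candidate position and summarizes its mono-, di- or tri-nucleotide content. It is only a simplified stand-in: the function name, the choice of overall (rather than position-specific) frequencies, and the error handling at sequence ends are assumptions made for this example, not the authors' feature set or classifier.

```python
from collections import Counter


def window_kmer_frequencies(sequence, center, k, window=301):
    """Relative k-mer frequencies in a window centered on `center` (0-based).

    Hypothetical helper for illustration; the published model weights
    position-specific features, which this summary does not reproduce.
    """
    half = window // 2
    start, end = center - half, center + half + 1
    if start < 0 or end > len(sequence):
        raise ValueError("window extends past the sequence ends")
    region = sequence[start:end].upper()
    # Count every overlapping k-mer inside the window.
    counts = Counter(region[i:i + k] for i in range(len(region) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    toy = "AC" * 200  # toy sequence, not real genomic data
    print(window_kmer_frequencies(toy, center=200, k=1))
    print(window_kmer_frequencies(toy, center=200, k=2))
```

    Calling the function with k = 1, 2 and 3 at the same position gives three feature groups that could be concatenated before being passed to any classifier.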

  9. Evaluating the feasibility of using candidate DNA barcodes in discriminating species of the large Asteraceae family

    PubMed Central

    2010-01-01

    Background: Five DNA regions, namely, rbcL, matK, ITS, ITS2, and psbA-trnH, have been recommended as primary DNA barcodes for plants. Studies evaluating these regions for species identification in the large plant taxon, which includes a large number of closely related species, have rarely been reported. Results: The feasibility of using the five proposed DNA regions was tested for discriminating plant species within Asteraceae, the largest family of flowering plants. Among these markers, ITS2 was the most useful in terms of universality, sequence variation, and identification capability in the Asteraceae family. The species discriminating power of ITS2 was also explored in a large pool of 3,490 Asteraceae sequences that represent 2,315 species belonging to 494 different genera. The result shows that ITS2 correctly identified 76.4% and 97.4% of plant samples at the species and genus levels, respectively. In addition, ITS2 displayed a variable ability to discriminate related species within different genera. Conclusions: ITS2 is the best DNA barcode for the Asteraceae family. This approach significantly broadens the application of DNA barcoding to resolve classification problems in the family Asteraceae at the genera and species levels. PMID:20977734

  10. Methylopila helvetica sp. nov. and Methylobacterium dichloromethanicum sp. nov.--novel aerobic facultatively methylotrophic bacteria utilizing dichloromethane.

    PubMed

    Doronina, N V; Trotsenko, Y A; Tourova, T P; Kuznetsov, B B; Leisinger, T

    2000-06-01

    Eight strains of Gram-negative, aerobic, asporogenous, neutrophilic, mesophilic, facultatively methylotrophic bacteria are taxonomically described. These icl- serine pathway methylobacteria utilize dichloromethane, methanol and methylamine as well as a variety of polycarbon compounds as the carbon and energy source. The major cellular fatty acids of the non-pigmented strains DM1, DM3, and DM5 to DM9 are C18:1, C16:0, C18:0, Ccy19:0 and that of the pink-pigmented strain DM4 is C18:1. The main quinone of all the strains is Q-10. The non-pigmented strains have similar phenotypic properties and a high level of DNA-DNA relatedness (81-98%) as determined by hybridization. All strains belong to the alpha-subgroup of the alpha-Proteobacteria. 16S rDNA sequence analysis led to the classification of these dichloromethane-utilizers in the genus Methylopila as a new species - Methylopila helvetica sp.nov. with the type strain DM9 (=VKM B-2189). The pink-pigmented strain DM4 belongs to the genus Methylobacterium but differs from the known members of this genus by some phenotypic properties, DNA-DNA relatedness (14-57%) and 16S rDNA sequence. Strain DM4 is named Methylobacterium dichloromethanicum sp. nov. (VKM B-2191 = DSMZ 6343).

  11. Complete genome sequences and comparative genome analysis of Lactobacillus plantarum strain 5-2 isolated from fermented soybean.

    PubMed

    Liu, Chen-Jian; Wang, Rui; Gong, Fu-Ming; Liu, Xiao-Feng; Zheng, Hua-Jun; Luo, Yi-Yong; Li, Xiao-Ran

    2015-12-01

    Lactobacillus plantarum is an important probiotic and is mostly isolated from fermented foods. We sequenced the genome of L. plantarum strain 5-2, which was derived from fermented soybean isolated from Yunnan province, China. The strain was determined to contain 3114 genes. Fourteen complete insertion sequence (IS) elements were found in 5-2 chromosome. There were 24 DNA replication proteins and 76 DNA repair proteins in the 5-2 genome. Consistent with the classification of L. plantarum as a facultative heterofermentative lactobacillus, the 5-2 genome encodes key enzymes required for the EMP (Embden-Meyerhof-Parnas) and phosphoketolase (PK) pathways. Several components of the secretion machinery are found in the 5-2 genome, which was compared with L. plantarum ST-III, JDM1 and WCFS1. Most of the specific proteins in the four genomes appeared to be related to their prophage elements. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance.

    PubMed

    Casimiro, Ana C; Vinga, Susana; Freitas, Ana T; Oliveira, Arlindo L

    2008-02-07

    Motif finding algorithms have developed in their ability to use computationally efficient methods to detect patterns in biological sequences. However the posterior classification of the output still suffers from some limitations, which makes it difficult to assess the biological significance of the motifs found. Previous work has highlighted the existence of positional bias of motifs in the DNA sequences, which might indicate not only that the pattern is important, but also provide hints of the positions where these patterns occur preferentially. We propose to integrate position uniformity tests and over-representation tests to improve the accuracy of the classification of motifs. Using artificial data, we have compared three different statistical tests (Chi-Square, Kolmogorov-Smirnov and a Chi-Square bootstrap) to assess whether a given motif occurs uniformly in the promoter region of a gene. Using the test that performed better in this dataset, we proceeded to study the positional distribution of several well known cis-regulatory elements, in the promoter sequences of different organisms (S. cerevisiae, H. sapiens, D. melanogaster, E. coli and several Dicotyledons plants). The results show that position conservation is relevant for the transcriptional machinery. We conclude that many biologically relevant motifs appear heterogeneously distributed in the promoter region of genes, and therefore, that non-uniformity is a good indicator of biological relevance and can be used to complement over-representation tests commonly used. In this article we present the results obtained for the S. cerevisiae data sets.
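
    The position uniformity test mentioned in the abstract can be sketched as follows. This is a minimal example under stated assumptions: motif start positions are binned into equal-width bins along the promoter, a chi-square statistic is computed against a uniform expectation, and a p-value is obtained by Monte Carlo resampling of uniform positions. The function names, the bin count and the resampling scheme are illustrative choices and may differ from the Chi-Square bootstrap the authors actually used.

```python
import random


def chi_square_uniformity(positions, region_length, n_bins=10):
    """Chi-square statistic for positions against a uniform distribution.

    Bins are equal-width only when region_length is a multiple of n_bins;
    otherwise the equal expected count per bin is an approximation.
    """
    if not positions:
        raise ValueError("need at least one motif position")
    counts = [0] * n_bins
    for p in positions:
        if not 0 <= p < region_length:
            raise ValueError("position outside the promoter region")
        counts[p * n_bins // region_length] += 1
    expected = len(positions) / n_bins
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, n_bins - 1  # statistic and degrees of freedom


def resampled_p_value(positions, region_length, n_bins=10,
                      n_resamples=2000, seed=0):
    """Monte Carlo p-value: share of uniform resamples at least as extreme."""
    rng = random.Random(seed)
    observed, _dof = chi_square_uniformity(positions, region_length, n_bins)
    hits = 0
    for _ in range(n_resamples):
        simulated = [rng.randrange(region_length) for _ in positions]
        stat, _dof = chi_square_uniformity(simulated, region_length, n_bins)
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Toy data: 100 motif starts, all in the last 50 bp of a 1000-bp promoter.
    clustered = [950 + i % 50 for i in range(100)]
    print(chi_square_uniformity(clustered, 1000))
    print(resampled_p_value(clustered, 1000))
```

    A small p-value indicates positional bias, which the abstract proposes to combine with over-representation tests rather than use alone.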

  13. Defining operational taxonomic units using DNA barcode data

    PubMed Central

    Blaxter, Mark; Mann, Jenna; Chapman, Tom; Thomas, Fran; Whitton, Claire; Floyd, Robin; Abebe, Eyualem

    2005-01-01

    The scale of diversity of life on this planet is a significant challenge for any scientific programme hoping to produce a complete catalogue, whatever means is used. For DNA barcoding studies, this difficulty is compounded by the realization that any chosen barcode sequence is not the gene ‘for’ speciation and that taxa have evolutionary histories. How are we to disentangle the confounding effects of reticulate population genetic processes? Using the DNA barcode data from meiofaunal surveys, here we discuss the benefits of treating the taxa defined by barcodes without reference to their correspondence to ‘species’, and suggest that using this non-idealist approach facilitates access to taxon groups that are not accessible to other methods of enumeration and classification. Major issues remain, in particular the methodologies for taxon discrimination in DNA barcode data. PMID:16214751

  14. Application of Stochastic Labeling with Random-Sequence Barcodes for Simultaneous Quantification and Sequencing of Environmental 16S rRNA Genes.

    PubMed

    Hoshino, Tatsuhiko; Inagaki, Fumio

    2017-01-01

    Next-generation sequencing (NGS) is a powerful tool for analyzing environmental DNA and provides a comprehensive molecular view of microbial communities. For obtaining the copy number of particular sequences in the NGS library, however, additional quantitative analysis such as quantitative PCR (qPCR) or digital PCR (dPCR) is required. Furthermore, the number of sequences in a sequence library does not always reflect the original copy number of a target gene because of biases caused by PCR amplification, making it difficult to convert the proportion of particular sequences in the NGS library to the copy number using the mass of input DNA. To address this issue, we applied a stochastic labeling approach with random-tag sequences and developed an NGS-based quantification protocol, which enables simultaneous sequencing and quantification of the targeted DNA. This quantitative sequencing (qSeq) is initiated from single-primer extension (SPE) using a primer with a random tag adjacent to the 5' end of the target-specific sequence. During SPE, each DNA molecule is stochastically labeled with the random tag. Subsequently, first-round PCR is conducted, specifically targeting the SPE product, followed by second-round PCR to index for NGS. The number of random tags is only determined during the SPE step and is therefore not affected by the two rounds of PCR that may introduce amplification biases. In the case of 16S rRNA genes, after NGS sequencing and taxonomic classification, the absolute number of 16S rRNA gene copies of each target phylotype can be estimated by Poisson statistics by counting the random tags incorporated at the end of each sequence. To test the feasibility of this approach, the 16S rRNA gene of Sulfolobus tokodaii was subjected to qSeq, which resulted in accurate quantification of 5.0 × 10^3 to 5.0 × 10^4 copies of the 16S rRNA gene. Furthermore, qSeq was applied to mock microbial communities and environmental samples, and the results were comparable to those obtained using digital PCR and relative abundance based on a standard sequence library. We demonstrated that the qSeq protocol proposed here is advantageous for providing less-biased absolute copy numbers of each target DNA with NGS sequencing at one time. By this new experiment scheme in microbial ecology, microbial community compositions can be explored in a more quantitative manner, thus expanding our knowledge of microbial ecosystems in natural environments.
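
    The abstract states only that copy numbers are estimated "by Poisson statistics" from counted random tags. One common way to do this, shown here as a sketch and not as the authors' exact procedure, assumes that each molecule receives one of K = 4^L equally likely tags of length L. With N labeled molecules, the expected number of distinct tags is about K(1 - exp(-N/K)), so N can be estimated as -K ln(1 - u/K) from u observed distinct tags. The function names and the tag length of 8 are assumptions for illustration.

```python
import math


def count_unique_tags(tags):
    """Number of distinct random tags observed for one phylotype."""
    return len(set(tags))


def expected_unique_tags(n_molecules, tag_length):
    """Forward model: expected distinct tags when n_molecules are labeled."""
    n_possible = 4 ** tag_length
    return n_possible * (1 - math.exp(-n_molecules / n_possible))


def estimate_copies_from_tags(unique_tags, tag_length):
    """Poisson-corrected estimate of labeled molecules from distinct tags.

    Assumes uniform, independent tag assignment (an idealization).
    """
    n_possible = 4 ** tag_length
    if not 0 <= unique_tags < n_possible:
        raise ValueError("unique_tags must be in [0, 4**tag_length)")
    # log1p keeps precision when unique_tags is small relative to n_possible.
    return -n_possible * math.log1p(-unique_tags / n_possible)


if __name__ == "__main__":
    observed = count_unique_tags(["ACGTACGT", "ACGTACGT", "TTTTCCCC"])
    print(observed, estimate_copies_from_tags(observed, tag_length=8))
    print(estimate_copies_from_tags(5000, tag_length=8))
```

    The correction matters most when the number of molecules approaches the number of possible tags; when tags are far more numerous than molecules, the estimate is close to the raw count of distinct tags.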

  15. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses.

    PubMed

    Paez-Espino, David; Chen, I-Min A; Palaniappan, Krishna; Ratner, Anna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Huang, Jinghua; Markowitz, Victor M; Nielsen, Torben; Huntemann, Marcel; K Reddy, T B; Pavlopoulos, Georgios A; Sullivan, Matthew B; Campbell, Barbara J; Chen, Feng; McMahon, Katherine; Hallam, Steve J; Denef, Vincent; Cavicchioli, Ricardo; Caffrey, Sean M; Streit, Wolfgang R; Webster, John; Handley, Kim M; Salekdeh, Ghasem H; Tsesmetzis, Nicolas; Setubal, Joao C; Pope, Phillip B; Liu, Wen-Tso; Rivers, Adam R; Ivanova, Natalia N; Kyrpides, Nikos C

    2017-01-04

    Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from >6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparing with external sequences, thus serving as an essential resource in the viral genomics community. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Evolutionary conservation of sequence and secondary structures in CRISPR repeats

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kunin, Victor; Sorek, Rotem; Hugenholtz, Philip

    Clustered Regularly Interspaced Palindromic Repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in ~40% of bacterial and all archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CAS), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been proposed that the CRISPR/CAS system samples, maintains a record of, and inactivates invasive DNA that the cell has encountered, and therefore constitutes a prokaryotic analog of an immune system. Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. All individual repeats in any given cluster were inferred to form characteristic RNA secondary structure, ranging from non-existent to pronounced. Stable secondary structures included G:U base pairs and exhibited multiple compensatory base changes in the stem region, indicating evolutionary conservation and functional importance. We also show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification including specific relationships between CRISPR and CAS subtypes.

  17. Species-specific markers for the differential diagnosis of Trypanosoma cruzi and Trypanosoma rangeli and polymorphisms detection in Trypanosoma rangeli.

    PubMed

    Ferreira, Keila Adriana Magalhães; Fajardo, Emanuella Francisco; Baptista, Rodrigo P; Macedo, Andrea Mara; Lages-Silva, Eliane; Ramírez, Luis Eduardo; Pedrosa, André Luiz

    2014-06-01

    Trypanosoma cruzi and Trypanosoma rangeli are kinetoplastid parasites which are able to infect humans in Central and South America. Misdiagnosis between these trypanosomes can be avoided by targeting barcoding sequences or genes of each organism. This work aims to analyze the feasibility of using species-specific markers for identification of intraspecific polymorphisms and as targets for diagnostic methods by PCR. Accordingly, primers which are able to specifically detect T. cruzi or T. rangeli genomic DNA were characterized. The use of intergenic regions, generally divergent in the trypanosomatids, and the serine carboxypeptidase gene were successful. Using T. rangeli genomic sequences for the identification of group-specific polymorphisms and a polymorphic AT(n) dinucleotide repeat permitted the classification of the strains into two groups, which are entirely coincident with T. rangeli main lineages, KP1 (+) and KP1 (-), previously determined by kinetoplast DNA (kDNA) characterization. The sequences analyzed total 622 bp (382 bp represent a hypothetical protein sequence, and 240 bp represent an anonymous sequence), and of these, 581 (93.3%) are conserved sites and 41 bp (6.7%) are polymorphic, with 9 transitions (21.9%), 2 transversions (4.9%), and 30 (73.2%) insertion/deletion events. Taken together, the species-specific markers analyzed may be useful for the development of new strategies for the accurate diagnosis of infections. Furthermore, the identification of T. rangeli polymorphisms has a direct impact on the understanding of the population structure of this parasite.
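
    The breakdown of polymorphic sites into transitions, transversions and insertion/deletion events can be illustrated with a short sketch. This is not the authors' pipeline: it assumes two sequences that are already aligned with "-" as the gap character, ignores ambiguity codes, and uses a made-up toy alignment. The function names (classify_site, summarize, shares) are hypothetical.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_site(a: str, b: str) -> str:
    """Classify one aligned column of two sequences."""
    if a == b:
        return "conserved"
    if a == "-" or b == "-":
        return "indel"
    # A transition stays within purines or within pyrimidines;
    # any other substitution is counted as a transversion.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def summarize(seq1: str, seq2: str) -> Counter:
    """Count site classes over two pre-aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return Counter(classify_site(a, b) for a, b in zip(seq1.upper(), seq2.upper()))


def shares(counts: dict) -> dict:
    """Percentage of each class among all the counts passed in."""
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()}


# Toy alignment, invented for illustration only.
toy = summarize("ACGTAC-TA", "ATGTCCGTG")
print(dict(toy))

# Shares recomputed from the counts quoted in the abstract (41 polymorphic sites).
print(shares({"transition": 9, "transversion": 2, "indel": 30}))
```

    Applied to the counts reported above (9, 2 and 30 of 41 polymorphic sites), shares gives about 22.0%, 4.9% and 73.2%, which agrees with the quoted percentages to within rounding.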

  18. Three-gene based phylogeny of the Urostyloidea (Protista, Ciliophora, Hypotricha), with notes on classification of some core taxa☆

    PubMed Central

    Huang, Jie; Chen, Zigui; Song, Weibo; Berger, Helmut

    2014-01-01

    Classifications of the Urostyloidea were mainly based on morphology and morphogenesis. Since molecular phylogeny largely focused on limited sampling using mostly one-gene information, incongruences between morphological data and gene sequences have arisen. In this work, the three-gene data (SSU-rDNA, ITS1-5.8S-ITS2 and LSU-rDNA) comprising 12 genera in the “core urostyloids” are sequenced, and the phylogenies based on these different markers are compared using maximum-likelihood and Bayesian algorithms and tested by unconstrained and constrained analyses. The molecular phylogeny supports the following conclusions: (1) the monophyly of the core group of Urostyloidea is well supported while the whole Urostyloidea is not monophyletic; (2) Thigmokeronopsis and Apokeronopsis are clearly separated from the pseudokeronopsids in analyses of all three gene markers, supporting their exclusion from the Pseudokeronopsidae and the inclusion in the Urostylidae; (3) Diaxonella and Apobakuella should be assigned to the Urostylidae; (4) Bergeriella, Monocoronella and Neourostylopsis flavicana share a most recent common ancestor; (5) all molecular trees support the transfer of Metaurostylopsis flavicana to the recently proposed genus Neourostylopsis; (6) all molecular phylogenies fail to separate the morphologically well-defined genera Uroleptopsis and Pseudokeronopsis; and (7) Arcuseries gen. nov. containing three distinctly deviating Anteholosticha species is established. PMID:24140978

  19. Sequence variant classification and reporting: recommendations for improving the interpretation of cancer susceptibility genetic test results

    PubMed Central

    Plon, Sharon E.; Eccles, Diana M.; Easton, Douglas; Foulkes, William D.; Genuardi, Maurizio; Greenblatt, Marc S.; Hogervorst, Frans B.L.; Hoogerbrugge, Nicoline; Spurdle, Amanda B.; Tavtigian, Sean

    2011-01-01

    Genetic testing of cancer susceptibility genes is now widely applied in clinical practice to predict risk of developing cancer. In general, sequence-based testing of germline DNA is used to determine whether an individual carries a change that is clearly likely to disrupt normal gene function. Genetic testing may detect changes that are clearly pathogenic, clearly neutral or variants of unclear clinical significance. Such variants present a considerable challenge to the diagnostic laboratory and the receiving clinician in terms of interpretation and clear presentation of the implications of the result to the patient. There does not appear to be a consistent approach to interpreting and reporting the clinical significance of variants either among genes or among laboratories. The potential for confusion among clinicians and patients is considerable and misinterpretation may lead to inappropriate clinical consequences. In this article we review the current state of sequence-based genetic testing, describe other standardized reporting systems used in oncology and propose a standardized classification system for application to sequence based results for cancer predisposition genes. We suggest a system of five classes of variants based on the degree of likelihood of pathogenicity. Each class is associated with specific recommendations for clinical management of at-risk relatives that will depend on the syndrome. We propose that panels of experts on each cancer predisposition syndrome facilitate the classification scheme and designate appropriate surveillance and cancer management guidelines. The international adoption of a standardized reporting system should improve the clinical utility of sequence-based genetic tests to predict cancer risk. PMID:18951446

  20. Population Genetic Structure and Phylogeography of Camellia flavida (Theaceae) Based on Chloroplast and Nuclear DNA Sequences

    PubMed Central

    Wei, Su-Juan; Lu, Yong-Bin; Ye, Quan-Qing; Tang, Shao-Qing

    2017-01-01

    Camellia flavida is an endangered species of yellow camellia growing in limestone mountains in southwest China. The current classification of C. flavida into two varieties, var. flavida and var. patens, is controversial. We conducted a genetic analysis of C. flavida to determine its taxonomic structure. A total of 188 individual plants from 20 populations across the entire distribution range in southwest China were analyzed using two DNA fragments: a chloroplast DNA fragment from the small single copy region and a single-copy nuclear gene called phenylalanine ammonia-lyase (PAL). Sequences from both chloroplast and nuclear DNA were highly diverse, with high levels of genetic differentiation and restricted gene flow. This result can be attributed to the high habitat heterogeneity in limestone karst, which isolates C. flavida populations from each other. Our nuclear DNA results demonstrate that there are three differentiated groups within C. flavida: var. flavida 1, var. flavida 2, and var. patens. These genetic groupings are consistent with the morphological characteristics of the plants. We suggest that the samples included in this study constitute three taxa and the var. flavida 2 group is the genuine C. flavida. The three groups should be recognized as three management units for conservation concerns. PMID:28579991

  1. FBIS: A regional DNA barcode archival & analysis system for Indian fishes

    PubMed Central

    Nagpure, Naresh Sahebrao; Rashid, Iliyas; Pathak, Ajey Kumar; Singh, Mahender; Singh, Shri Prakash; Sarkar, Uttam Kumar

    2012-01-01

    DNA barcoding is a new tool for taxon recognition and classification of biological organisms based on the sequence of a fragment of the mitochondrial gene cytochrome c oxidase I (COI). In view of the growing importance of fish DNA barcoding for species identification, molecular taxonomy and fish diversity conservation, we developed a Fish Barcode Information System (FBIS) for Indian fishes, which will serve as a regional DNA barcode archival and analysis system. The database presently contains 2334 sequence records of the COI gene for 472 aquatic species belonging to 39 orders and 136 families, collected from available published data sources. Additionally, it contains information on phenotype, distribution and IUCN Red List status of fishes. The web version of FBIS was designed using MySQL, Perl and PHP under the Linux operating platform to (a) store and manage the acquisition, (b) analyze and explore DNA barcode records, and (c) identify species and estimate genetic divergence. FBIS has also been integrated with appropriate tools for retrieving and viewing information about the database statistics and taxonomy. It is expected that FBIS would be useful as a potent information system in fish molecular taxonomy, phylogeny and genomics. Availability: The database is available for free at http://mail.nbfgr.res.in/fbis/ PMID:22715304

  2. Phenotype classification of single cells using SRS microscopy, RNA sequencing, and microfluidics (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Streets, Aaron M.; Cao, Chen; Zhang, Xiannian; Huang, Yanyi

    2016-03-01

    Phenotype classification of single cells reveals biological variation that is masked in ensemble measurement. This heterogeneity is found in gene and protein expression as well as in cell morphology. Many techniques are available to probe phenotypic heterogeneity at the single cell level, for example quantitative imaging and single-cell RNA sequencing, but it is difficult to perform multiple assays on the same single cell. In order to directly track correlation between morphology and gene expression at the single cell level, we developed a microfluidic platform for quantitative coherent Raman imaging and immediate RNA sequencing (RNA-Seq) of single cells. With this device we actively sort and trap cells for analysis with stimulated Raman scattering microscopy (SRS). The cells are then processed in parallel pipelines for lysis, and preparation of cDNA for high-throughput transcriptome sequencing. SRS microscopy offers three-dimensional imaging with chemical specificity for quantitative analysis of protein and lipid distribution in single cells. Meanwhile, the microfluidic platform facilitates single-cell manipulation, minimizes contamination, and furthermore, provides improved RNA-Seq detection sensitivity and measurement precision, which is necessary for differentiating biological variability from technical noise. By combining coherent Raman microscopy with RNA sequencing, we can better understand the relationship between cellular morphology and gene expression at the single-cell level.

  3. Sequencing analysis of 20,000 full-length cDNA clones from cassava reveals lineage specific expansions in gene families related to stress response

    PubMed Central

    Sakurai, Tetsuya; Plata, Germán; Rodríguez-Zapata, Fausto; Seki, Motoaki; Salcedo, Andrés; Toyoda, Atsushi; Ishiwata, Atsushi; Tohme, Joe; Sakaki, Yoshiyuki; Shinozaki, Kazuo; Ishitani, Manabu

    2007-01-01

    Background: Cassava, an allotetraploid known for its remarkable tolerance to abiotic stresses, is an important source of energy for humans and animals and a raw material for many industrial processes. A full-length cDNA library of cassava plants under normal, heat, drought, aluminum and post harvest physiological deterioration conditions was built; 19968 clones were sequence-characterized using expressed sequence tags (ESTs). Results: The ESTs were assembled into 6355 contigs and 9026 singletons that were further grouped into 10577 scaffolds; we found 4621 new cassava sequences and 1521 sequences with no significant similarity to plant protein databases. Transcripts of 7796 distinct genes were captured and we were able to assign a functional classification to 78% of them while finding more than half of the enzymes annotated in metabolic pathways in Arabidopsis. The annotation of sequences that were not paired to transcripts of other species included many stress-related functional categories showing that our library is enriched with stress-induced genes. Finally, we detected 230 putative gene duplications that include key enzymes in reactive oxygen species signaling pathways and could play a role in cassava stress response features. Conclusion: The cassava full-length cDNA library presented here contains transcripts of genes involved in stress response as well as genes important for different areas of cassava research. This library will be an important resource for gene discovery, characterization and cloning; in the near future it will aid the annotation of the cassava genome. PMID:18096061

  4. rbcL gene sequences provide evidence for the evolutionary lineages of leptosporangiate ferns.

    PubMed

    Hasebe, M; Omori, T; Nakazawa, M; Sano, T; Kato, M; Iwatsuki, K

    1994-06-07

    Pteridophytes have a longer evolutionary history than any other vascular land plant and, therefore, have endured greater loss of phylogenetically informative information. This factor has resulted in substantial disagreements in evaluating characters and, thus, controversy in establishing a stable classification. To compare competing classifications, we obtained DNA sequences of a chloroplast gene. The sequence of 1206 nt of the large subunit of the ribulose-bisphosphate carboxylase gene (rbcL) was determined from 58 species, representing almost all families of leptosporangiate ferns. Phylogenetic trees were inferred by the neighbor-joining and the parsimony methods. The two methods produced almost identical phylogenetic trees that provided insights concerning major general evolutionary trends in the leptosporangiate ferns. Interesting findings were as follows: (i) two morphologically distinct heterosporous water ferns, Marsilea and Salvinia, are sister genera; (ii) the tree ferns (Cyatheaceae, Dicksoniaceae, and Metaxyaceae) are monophyletic; and (iii) polypodioids are distantly related to the gleichenioids in spite of the similarity of their exindusiate soral morphology and are close to the higher indusiate ferns. In addition, the affinities of several "problematic genera" were assessed.

  5. Genetic diversity and classification of Tibetan yak populations based on the mtDNA COIII gene.

    PubMed

    Song, Q Q; Chai, Z X; Xin, J W; Zhao, S J; Ji, Q M; Zhang, C F; Ma, Z J; Zhong, J C

    2015-03-13

    To determine the level of genetic diversity and phylogenetic relationships among Tibetan yak populations, the mitochondrial DNA cytochrome c oxidase subunit 3 (COIII) genes of 378 yak individuals from 16 populations were analyzed in this study. The results showed that the length of cytochrome c oxidase subunit 3 gene sequences was 781 bp, with nucleotide frequencies of 29.2, 29.4, 26.1, and 15.2% for T, C, A, and G, respectively. A total of 26 haplotypes were identified, with 69 polymorphic sites, including 11 parsimony-informative sites and 58 single-nucleotide polymorphism sites. No deletions/insertions were found in sequence comparison, indicating that nucleotide mutation types were transitions and transversions. Haplotype and nucleotide diversities were 0.562 and 0.00138, respectively, indicating a high level of genetic diversity in Tibetan yak populations. Phylogenetic relationship analysis indicated that Tibetan yak populations are divided into 2 groups.
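
    Haplotype and nucleotide diversity of the kind reported above can be computed as in the following sketch. The abstract does not say which estimators were used, so this assumes the commonly used sample-size-corrected haplotype diversity, Hd = n/(n-1) * (1 - sum of squared haplotype frequencies), and nucleotide diversity as the mean number of pairwise differences per site. The four toy sequences are invented and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(sequences):
    """Hd = n/(n-1) * (1 - sum(p_i^2)), with p_i the haplotype frequencies."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(sequences).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


def nucleotide_diversity(sequences):
    """Mean pairwise differences per site over all pairs of aligned sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length


# Toy data: four aligned 4-bp sequences forming three haplotypes.
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(haplotype_diversity(toy))   # 4/3 * (1 - 0.375)
print(nucleotide_diversity(toy))  # 7 differences / 6 pairs / 4 sites
```

    Both functions treat every sequence as one sampled individual, so identical strings are counted as repeated observations of the same haplotype.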

  6. Re-classification of Clavibacter michiganensis subspecies on the basis of whole-genome and multi-locus sequence analyses.

    PubMed

    Li, Xiang; Tambong, James; Yuan, Kat Xiaoli; Chen, Wen; Xu, Huimin; Lévesque, C André; De Boer, Solke H

    2018-01-01

    Although the genus Clavibacter was originally proposed to accommodate all phytopathogenic coryneform bacteria containing B2γ diaminobutyrate in the peptidoglycan, reclassification of all but one species into other genera has resulted in the current monospecific status of the genus. The single species in the genus, Clavibacter michiganensis, has multiple subspecies, which are all highly host-specific plant pathogens. Whole genome analysis based on average nucleotide identity and digital DNA-DNA hybridization as well as multi-locus sequence analysis (MLSA) of seven housekeeping genes support raising each of the C. michiganensis subspecies to species status. On the basis of whole genome and MLSA data, we propose the establishment of two new species and three new combinations: Clavibacter capsici sp. nov., comb. nov. and Clavibacter tessellarius sp. nov., comb. nov., and Clavibacter insidiosus comb. nov., Clavibacter nebraskensis comb. nov. and Clavibacter sepedonicus comb. nov.

  7. Cloning, annotation and expression analysis of mycoparasitism-related genes in Trichoderma harzianum 88.

    PubMed

    Yao, Lin; Yang, Qian; Song, Jinzhu; Tan, Chong; Guo, Changhong; Wang, Li; Qu, Lianhai; Wang, Yun

    2013-04-01

    Trichoderma harzianum 88, a filamentous soil fungus, is an effective biocontrol agent against several plant pathogens. High-throughput sequencing was used here to study the mycoparasitism mechanisms of T. harzianum 88. Plate confrontation tests of T. harzianum 88 against plant pathogens were conducted, and a cDNA library was constructed from T. harzianum 88 mycelia in the presence of plant pathogen cell walls. Randomly selected transcripts from the cDNA library were compared with eukaryotic plant and fungal genomes. Of the 1,386 transcripts sequenced, the most abundant Gene Ontology (GO) classification group was "physiological process". Differential expression of 19 genes was confirmed by real-time RT-PCR at different mycoparasitism stages against plant pathogens. Gene expression analysis revealed the transcription of various genes involved in mycoparasitism of T. harzianum 88. Our study provides helpful insights into the mechanisms of T. harzianum 88-plant pathogen interactions.

  8. [Clinical classification and genetic mutation study of two pedigrees with type II Waardenburg syndrome].

    PubMed

    Chen, Yong; Yang, Fuwei; Zheng, Hexin; Zhu, Ganghua; Hu, Peng; Wu, Weijing

    2015-12-01

    To explore the molecular etiology of two pedigrees affected with type II Waardenburg syndrome (WS2) and to provide genetic diagnosis and counseling. Blood samples were collected from the proband and his family members. Following extraction of genomic DNA, the coding sequences of PAX3, MITF, SOX10 and SNAI2 genes were amplified with PCR and subjected to DNA sequencing to detect potential mutations. A heterozygous deletional mutation c.649_651delAGA in exon 7 of the MITF gene has been identified in all patients from the first family, while no mutation was found in the other WS2 related genes including PAX3, MITF, SOX10 and SNAI2. The heterozygous deletion mutation c.649_651delAGA in exon 7 of the MITF gene probably underlies the disease in the first family. It is expected that other genes may also underlie WS2.

  9. Nocardiopsis potens sp. nov., isolated from household waste.

    PubMed

    Yassin, A F; Spröer, C; Hupfer, H; Siering, C; Klenk, H-P

    2009-11-01

    The taxonomic position of an actinomycete, designated strain IMMIB L-21(T), was determined using a polyphasic taxonomic approach. The organism, which had phenotypic properties consistent with its classification in the genus Nocardiopsis, formed a distinct clade in the 16S rRNA gene sequence tree together with the type strain of Nocardiopsis composta, but was readily distinguished from this species using DNA-DNA relatedness and phenotypic data. The genotypic and phenotypic data show that the organism represents a novel species of the genus Nocardiopsis, for which the name Nocardiopsis potens sp. nov. is proposed. The type strain is IMMIB L-21(T) (=DSM 45234(T)=CCUG 56587(T)).

  10. Integrating Colon Cancer Microarray Data: Associating Locus-Specific Methylation Groups to Gene Expression-Based Classifications.

    PubMed

    Barat, Ana; Ruskin, Heather J; Byrne, Annette T; Prehn, Jochen H M

    2015-11-23

    Recently, considerable attention has been paid to gene expression-based classifications of colorectal cancers (CRC) and their association with patient prognosis. In addition to changes in gene expression, abnormal DNA-methylation is known to play an important role in cancer onset and development, and colon cancer is no exception to this rule. Large-scale technologies, such as methylation microarray assays and specific sequencing of methylated DNA, have been used to determine whole genome profiles of CpG island methylation in tissue samples. In this article, publicly available microarray-based gene expression and methylation data sets are used to characterize expression subtypes with respect to locus-specific methylation. A major objective was to determine whether integration of these data types improves previously characterized subtypes, or provides evidence for additional subtypes. We used unsupervised clustering techniques to determine methylation-based subgroups, which are subsequently annotated with three published expression-based classifications, comprising from three to six subtypes. Our results showed that, while methylation profiles provide a further basis for segregation of certain (Inflammatory and Goblet-like) finer-grained expression-based subtypes, they also suggest that other finer-grained subtypes are not distinctive and can be considered as a single subtype.

  11. Integrating Colon Cancer Microarray Data: Associating Locus-Specific Methylation Groups to Gene Expression-Based Classifications

    PubMed Central

    Barat, Ana; Ruskin, Heather J.; Byrne, Annette T.; Prehn, Jochen H. M.

    2015-01-01

    Recently, considerable attention has been paid to gene expression-based classifications of colorectal cancers (CRC) and their association with patient prognosis. In addition to changes in gene expression, abnormal DNA-methylation is known to play an important role in cancer onset and development, and colon cancer is no exception to this rule. Large-scale technologies, such as methylation microarray assays and specific sequencing of methylated DNA, have been used to determine whole genome profiles of CpG island methylation in tissue samples. In this article, publicly available microarray-based gene expression and methylation data sets are used to characterize expression subtypes with respect to locus-specific methylation. A major objective was to determine whether integration of these data types improves previously characterized subtypes, or provides evidence for additional subtypes. We used unsupervised clustering techniques to determine methylation-based subgroups, which are subsequently annotated with three published expression-based classifications, comprising from three to six subtypes. Our results showed that, while methylation profiles provide a further basis for segregation of certain (Inflammatory and Goblet-like) finer-grained expression-based subtypes, they also suggest that other finer-grained subtypes are not distinctive and can be considered as a single subtype. PMID:27600244

  12. Sequence variations in RepMP2/3 and RepMP4 elements reveal intragenomic homologous DNA recombination events in Mycoplasma pneumoniae.

    PubMed

    Spuesens, Emiel B M; Oduber, Minoushka; Hoogenboezem, Theo; Sluijter, Marcel; Hartwig, Nico G; van Rossum, Annemarie M C; Vink, Cornelis

    2009-07-01

    The gene encoding major adhesin protein P1 of Mycoplasma pneumoniae, MPN141, contains two DNA sequence stretches, designated RepMP2/3 and RepMP4, which display variation among strains. This variation allows strains to be differentiated into two major P1 genotypes (1 and 2) and several variants. Interestingly, multiple versions of the RepMP2/3 and RepMP4 elements exist at other sites within the bacterial genome. Because these versions are closely related in sequence, but not identical, it has been hypothesized that they have the capacity to recombine with their counterparts within MPN141, and thereby serve as a source of sequence variation of the P1 protein. In order to determine the variation within the RepMP2/3 and RepMP4 elements, both within the bacterial genome and among strains, we analysed the DNA sequences of all RepMP2/3 and RepMP4 elements within the genomes of 23 M. pneumoniae strains. Our data demonstrate that: (i) recombination is likely to have occurred between two RepMP2/3 elements in four of the strains, and (ii) all previously described P1 genotypes can be explained by inter-RepMP recombination events. Moreover, the difference between the two major P1 genotypes was reflected in all RepMP elements, such that subtype 1 and 2 strains can be differentiated on the basis of sequence variation in each RepMP element. This implies that subtype 1 and subtype 2 strains represent evolutionarily diverged strain lineages. Finally, a classification scheme is proposed in which the P1 genotype of M. pneumoniae isolates can be described in a sequence-based, universal fashion.

  13. DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.

    PubMed

    Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

    2017-01-01

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs and dependencies among them.
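
    The idea of a saliency map built from first-order derivatives can be sketched on a deliberately tiny model. This is not the DeMo Dashboard code: the "network" here is a single hypothetical motif filter followed by max-pooling and a sigmoid, the gradient is derived by hand instead of by automatic differentiation, and scoring each position as |gradient x one-hot input| is an assumption about how the per-nucleotide importance is summarized.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def predict(x, motif, bias):
    """Toy model: scan one motif filter, max-pool over windows, apply a sigmoid."""
    width = len(motif)
    window_scores = [
        sum(motif[j][c] * x[i + j][c] for j in range(width) for c in range(4))
        for i in range(len(x) - width + 1)
    ]
    best = max(range(len(window_scores)), key=window_scores.__getitem__)
    prob = 1.0 / (1.0 + math.exp(-(window_scores[best] + bias)))
    return prob, best


def saliency(seq, motif, bias):
    """Per-position |d prob / d input * input| for the toy model above."""
    x = one_hot(seq)
    prob, best = predict(x, motif, bias)
    scale = prob * (1.0 - prob)  # derivative of the sigmoid at the pooled score
    sal = [0.0] * len(seq)
    for j, weights in enumerate(motif):
        pos = best + j
        # Max-pooling passes gradient only to the best-scoring window:
        # d prob / d x[pos][c] = scale * weights[c].
        sal[pos] = abs(sum(scale * weights[c] * x[pos][c] for c in range(4)))
    return prob, sal


# Hypothetical filter that rewards the motif "TGA" (columns ordered A, C, G, T).
TGA_FILTER = [
    [-1.0, -1.0, -1.0, 2.0],
    [-1.0, -1.0, 2.0, -1.0],
    [2.0, -1.0, -1.0, -1.0],
]

prob, sal = saliency("CCTGACC", TGA_FILTER, bias=-3.0)
print(round(prob, 3), [round(s, 3) for s in sal])
```

    In this toy case only the three positions covered by the best-matching window receive non-zero saliency, which mirrors how a saliency map highlights the nucleotides that drive a prediction. The sequence must be at least as long as the filter.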

  14. Identification of Biomolecular Building Blocks by Recognition Tunneling: Stride towards Nanopore Sequencing of Biomolecules

    NASA Astrophysics Data System (ADS)

    Sen, Suman

    DNA, RNA and protein are three pivotal biomolecules in humans and other organisms, playing decisive roles in functionality, appearance, disease development and other physiological phenomena. Hence, sequencing of these biomolecules is of prime interest in the scientific community. Single molecular identification of their building blocks can be done by a technique called Recognition Tunneling (RT) based on Scanning Tunneling Microscope (STM). A single layer of specially designed recognition molecule is attached to the STM electrodes, which trap the targeted molecules (DNA nucleoside monophosphates, RNA nucleoside monophosphates or amino acids) inside the STM nanogap. Depending on their different binding interactions with the recognition molecules, the analyte molecules generate stochastic signal trains accommodating their "electronic fingerprints". Signal features are used to detect the molecules using a machine learning algorithm and different molecules can be identified with significantly high accuracy. This, in turn, paves the way for a rapid, economical nanopore sequencing platform, overcoming the drawbacks of Next Generation Sequencing (NGS) techniques. To read DNA nucleotides with high accuracy in an STM tunnel junction a series of nitrogen-based heterocycles were designed and examined to check their capabilities to interact with naturally occurring DNA nucleotides by hydrogen bonding in the tunnel junction. These recognition molecules are Benzimidazole, Imidazole, Triazole and Pyrrole. Benzimidazole proved to be the best among them, showing DNA nucleotide classification accuracy close to 99%. Also, Imidazole reader can read an abasic monophosphate (AP), a product from depurination or depyrimidination that occurs 10,000 times per human cell per day. In another study, I have investigated a new universal reader, 1-(2-mercaptoethyl)pyrene (Pyrene reader) based on stacking interactions, which should be more specific to the canonical DNA nucleosides. In addition, Pyrene reader showed higher DNA base-calling accuracy compared to Imidazole reader, the workhorse in our previous projects. In my other projects, various amino acids and RNA nucleoside monophosphates were also classified with significantly high accuracy using RT. Twenty naturally occurring amino acids and various RNA nucleosides (four canonical and two modified) were successfully identified. Thus, we envision nanopore sequencing of biomolecules using Recognition Tunneling (RT) that should provide comprehensive betterment over current technologies in terms of time, chemical and instrumental cost and capability of de novo sequencing.

  15. The first determination of Trichuris sp. from roe deer by amplification and sequenation of the ITS1-5.8S-ITS2 segment of ribosomal DNA.

    PubMed

    Salaba, O; Rylková, K; Vadlejch, J; Petrtýl, M; Scháňková, S; Brožová, A; Jankovská, I; Jebavý, L; Langrová, I

    2013-03-01

    Trichuris nematodes were isolated from roe deer (Capreolus capreolus). At first, nematodes were determined using morphological and biometrical methods. Subsequently genomic DNA was isolated and the ITS1-5.8S-ITS2 segment from ribosomal DNA (rDNA) was amplified and sequenced using PCR techniques. Using morphological and biometrical methods, female nematodes were identified as Trichuris globulosa, and the only male was identified as Trichuris ovis. The females were classified into four morphotypes. However, analysis of the internal transcribed spacers (ITS1-5.8S-ITS2) of specimens did not confirm this classification. Moreover, the female individuals morphologically determined as T. globulosa were molecularly identified as Trichuris discolor. In the case of the only male, the molecular analysis matched the result of the morphological identification. Furthermore, a comparative phylogenetic study was carried out with the ITS1 and ITS2 sequences of the Trichuris species from various hosts. A comparison of biometric information from T. discolor individuals from this study was also conducted.

  16. PipeOnline 2.0: automated EST processing and functional data sorting.

    PubMed

    Ayoubi, Patricia; Jin, Xiaojing; Leite, Saul; Liu, Xianghui; Martajaja, Jeson; Abduraham, Abdurashid; Wan, Qiaolan; Yan, Wei; Misawa, Eduardo; Prade, Rolf A

    2002-11-01

    Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.

  17. DNA methylation analysis of phenotype specific stratified Indian population.

    PubMed

    Rotti, Harish; Mallya, Sandeep; Kabekkodu, Shama Prasada; Chakrabarty, Sanjiban; Bhale, Sameer; Bharadwaj, Ramachandra; Bhat, Balakrishna K; Dedge, Amrish P; Dhumal, Vikram Ram; Gangadharan, G G; Gopinath, Puthiya M; Govindaraj, Periyasamy; Joshi, Kalpana S; Kondaiah, Paturu; Nair, Sreekumaran; Nair, S N Venugopalan; Nayak, Jayakrishna; Prasanna, B V; Shintre, Pooja; Sule, Mayura; Thangaraj, Kumarasamy; Patwardhan, Bhushan; Valiathan, Marthanda Varma Sankaran; Satyamoorthy, Kapaettu

    2015-05-08

    DNA methylation and its perturbations are an established attribute to a wide spectrum of phenotypic variations and disease conditions. Indian traditional system practices personalized medicine through indigenous concept of distinctly descriptive physiological, psychological and anatomical features known as prakriti. Here we attempted to establish DNA methylation differences in these three prakriti phenotypes. Following structured and objective measurement of 3416 subjects, whole blood DNA of 147 healthy male individuals belonging to defined prakriti (Vata, Pitta and Kapha) between the age group of 20-30 years were subjected to methylated DNA immunoprecipitation (MeDIP) and microarray analysis. After data analysis, prakriti specific signatures were validated through bisulfite DNA sequencing. Differentially methylated regions in CpG islands and shores were significantly enriched in promoters/UTRs and gene body regions. Phenotypes characterized by higher metabolism (Pitta prakriti) in individuals showed distinct promoter (34) and gene body methylation (204), followed by Vata prakriti which correlates to motion showed DNA methylation in 52 promoters and 139 CpG islands and finally individuals with structural attributes (Kapha prakriti) with 23 and 19 promoters and CpG islands respectively. Bisulfite DNA sequencing of prakriti specific multiple CpG sites in promoters and 5'-UTR such as LHX1 (Vata prakriti), SOX11 (Pitta prakriti) and CDH22 (Kapha prakriti) were validated. Kapha prakriti specific CDH22 5'-UTR CpG methylation was also found to be associated with higher body mass index (BMI). Differential DNA methylation signatures in three distinct prakriti phenotypes demonstrate the epigenetic basis of Indian traditional human classification which may have relevance to personalized medicine.

  18. Short-read, high-throughput sequencing technology for STR genotyping

    PubMed Central

    Bornman, Daniel M.; Hester, Mark E.; Schuetter, Jared M.; Kasoji, Manjula D.; Minard-Smith, Angela; Barden, Curt A.; Nelson, Scott C.; Godbold, Gene D.; Baker, Christine H.; Yang, Boyu; Walther, Jacquelyn E.; Tornes, Ivan E.; Yan, Pearlly S.; Rodriguez, Benjamin; Bundschuh, Ralf; Dickens, Michael L.; Young, Brian A.; Faith, Seth A.

    2013-01-01

    DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoretic-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples. PMID:25621315
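
    As a rough illustration of what a sequence-based STR call reports, the sketch below counts the longest run of a repeat unit in a single read. It is a deliberately simplified stand-in: the abstract describes alignment to a custom STR reference, not this regular-expression scan, and the function name and example read are hypothetical.

```python
import re

def longest_repeat_run(read, unit):
    """Return the largest number of back-to-back copies of `unit` in `read`.

    Hypothetical helper for illustration only; real STR genotyping as
    described in the abstract relies on reference alignment.
    """
    pattern = re.compile(r"(?:%s)+" % re.escape(unit))
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(read)]
    return max(runs, default=0)

# Made-up read containing three consecutive GATA units.
print(longest_repeat_run("TTGATAGATAGATACC", "GATA"))
```

    A per-locus allele call would then aggregate such counts over many reads, which is where the read-depth requirement quoted in the abstract comes in.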

  19. Identification, characterization and description of Arcobacter faecis sp. nov., isolated from a human waste septic tank.

    PubMed

    Whiteduck-Léveillée, Kerri; Whiteduck-Léveillée, Jenni; Cloutier, Michel; Tambong, James T; Xu, Renlin; Topp, Edward; Arts, Michael T; Chao, Jerry; Adam, Zaky; Lévesque, C André; Lapen, David R; Villemur, Richard; Khan, Izhar U H

    2016-03-01

    A study on the taxonomic classification of Arcobacter species was performed on the cultures isolated from various fecal sources where an Arcobacter strain AF1078(T) from human waste septic tank near Ottawa, Ontario, Canada was characterized using a polyphasic approach. Genetic investigations including 16S rRNA, atpA, cpn60, gyrA, gyrB and rpoB gene sequences of strain AF1078(T) are unique in comparison with other arcobacters. Phylogenetic analysis based on the 16S rRNA gene sequence revealed that the strain is most closely related to Arcobacter lanthieri and Arcobacter cibarius. Analyses of atpA, cpn60, gyrA, gyrB and rpoB gene sequences suggested that strain AF1078(T) formed a phylogenetic lineage independent of other species in the genus. Whole-genome sequence, DNA-DNA hybridization, fatty acid profile and phenotypic analysis further supported the conclusion that strain AF1078(T) represents a novel Arcobacter species, for which the name Arcobacter faecis sp. nov. is proposed, with type strain AF1078(T) (=LMG 28519(T); CCUG 66484(T)). Crown Copyright © 2015. Published by Elsevier GmbH. All rights reserved.

  20. Streptococcus oriloxodontae sp. nov., isolated from the oral cavities of elephants.

    PubMed

    Shinozaki-Kuwahara, Noriko; Saito, Masanori; Hirasawa, Masatomo; Takada, Kazuko

    2014-11-01

    Two strains were isolated from oral cavity samples of healthy elephants. The isolates were Gram-positive, catalase-negative, coccus-shaped organisms that were tentatively identified as a streptococcal species based on the results of biochemical tests. Comparative 16S rRNA gene sequence analysis suggested classification of these organisms in the genus Streptococcus with Streptococcus criceti ATCC 19642(T) and Streptococcus orisuis NUM 1001(T) as their closest phylogenetic neighbours with 98.2 and 96.9% gene sequence similarity, respectively. When multi-locus sequence analysis using four housekeeping genes, groEL, rpoB, gyrB and sodA, was carried out, similarity of concatenated sequences of the four housekeeping genes from the new isolates and Streptococcus mutans was 89.7%. DNA-DNA hybridization experiments suggested that the new isolates were distinct from S. criceti and other species of the genus Streptococcus. On the basis of genotypic and phenotypic differences, it is proposed that the novel isolates are classified in the genus Streptococcus as representatives of Streptococcus oriloxodontae sp. nov. The type strain of S. oriloxodontae is NUM 2101(T) ( =JCM 19285(T) =DSM 27377(T)). © 2014 IUMS.

  1. Heterogeneous data fusion for brain tumor classification.

    PubMed

    Metsis, Vangelis; Huang, Heng; Andronesi, Ovidiu C; Makedon, Fillia; Tzika, Aria

    2012-10-01

    Current research in biomedical informatics involves analysis of multiple heterogeneous data sets. This includes patient demographics, clinical and pathology data, treatment history, patient outcomes as well as gene expression, DNA sequences and other information sources such as gene ontology. Analysis of these data sets could lead to better disease diagnosis, prognosis, treatment and drug discovery. In this report, we present a novel machine learning framework for brain tumor classification based on heterogeneous data fusion of metabolic and molecular datasets, including state-of-the-art high-resolution magic angle spinning (HRMAS) proton (1H) magnetic resonance spectroscopy and gene transcriptome profiling, obtained from intact brain tumor biopsies. Our experimental results show that our novel framework outperforms any analysis using individual dataset.

  2. Generation and Analysis of Expressed Sequence Tags (ESTs) from Halophyte Atriplex canescens to Explore Salt-Responsive Related Genes

    PubMed Central

    Li, Jingtao; Sun, Xinhua; Yu, Gang; Jia, Chengguo; Liu, Jinliang; Pan, Hongyu

    2014-01-01

    Little information is available on gene expression profiling of halophyte A. canescens. To elucidate the molecular mechanism for stress tolerance in A. canescens, a full-length complementary DNA library was generated from A. canescens exposed to 400 mM NaCl, and provided 343 high-quality ESTs. In an evaluation of 343 valid EST sequences in the cDNA library, 197 unigenes were assembled, among which 190 unigenes (83.1% ESTs) were identified according to their significant similarities with proteins of known functions. All the 343 EST sequences have been deposited in the dbEST GenBank under accession numbers JZ535802 to JZ536144. According to Arabidopsis MIPS functional category and GO classifications, we identified 193 unigenes of the 311 annotations EST, representing 72 non-redundant unigenes sharing similarities with genes related to the defense response. The sets of ESTs obtained provide a rich genetic resource and 17 up-regulated genes related to salt stress resistance were identified by qRT-PCR. Six of these genes may contribute crucially to earlier and later stage salt stress resistance. Additionally, among the 343 unigenes sequences, 22 simple sequence repeats (SSRs) were also identified contributing to the study of A. canescens resources. PMID:24960361

  3. Cluster analysis of S. Cerevisiae nucleosome binding sites

    NASA Astrophysics Data System (ADS)

    Suvorova, Y.; Korotkov, E.

    2017-12-01

    It is well known that a major part of a eukaryotic genome is wrapped around histone proteins, forming nucleosomes. It has also been demonstrated that the DNA sequence itself plays an important role in the nucleosome positioning process. In this work, a cluster analysis of 67 517 nucleosome binding sites from the S. cerevisiae genome was carried out. The classification method is based on a self-adjusting dinucleotide position weight matrix. As a result, 135 significant clusters were discovered that contain 43 225 sequences (64% of the initial set). The meaning of the classes found is discussed, as well as the possibility of their further use.
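
    A plain dinucleotide position weight matrix can be sketched as follows. This is only an assumed baseline: the "self-adjusting" procedure is not described in the abstract, and the uniform background (1/16 per dinucleotide), the pseudocount and the function names are illustrative choices.

```python
import math
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]

def dinucleotide_pwm(seqs, pseudocount=1.0):
    """Build log-odds columns, one per dinucleotide position.

    Assumes equal-length, pre-aligned sequences and a uniform
    background of 1/16 per dinucleotide (an assumption).
    """
    n_positions = len(seqs[0]) - 1
    pwm = []
    for pos in range(n_positions):
        counts = {d: pseudocount for d in DINUCS}
        for s in seqs:
            d = s[pos:pos + 2]
            if d in counts:          # skip ambiguous bases such as N
                counts[d] += 1
        total = sum(counts.values())
        pwm.append({d: math.log((c / total) / (1 / 16)) for d, c in counts.items()})
    return pwm

def score(seq, pwm):
    """Sum the log-odds of the dinucleotide observed at each position."""
    return sum(col.get(seq[i:i + 2], 0.0) for i, col in enumerate(pwm))

pwm = dinucleotide_pwm(["ACGT", "ACGA", "ACGT"])
print(score("ACGT", pwm) > score("TTTT", pwm))
```

    Clustering with such a matrix would typically alternate between scoring sequences against each cluster's matrix and rebuilding the matrices from their members.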

  4. Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing.

    PubMed

    Euskirchen, Philipp; Bielle, Franck; Labreche, Karim; Kloosterman, Wigard P; Rosenberg, Shai; Daniau, Mailys; Schmitt, Charlotte; Masliah-Planchon, Julien; Bourdeaut, Franck; Dehais, Caroline; Marie, Yannick; Delattre, Jean-Yves; Idbaih, Ahmed

    2017-11-01

    Molecular classification of cancer has entered clinical routine to inform diagnosis, prognosis, and treatment decisions. At the same time, new tumor entities have been identified that cannot be defined histologically. For central nervous system tumors, the current World Health Organization classification explicitly demands molecular testing, e.g., for 1p/19q-codeletion or IDH mutations, to make an integrated histomolecular diagnosis. However, a plethora of sophisticated technologies is currently needed to assess different genomic and epigenomic alterations and turnaround times are in the range of weeks, which makes standardized and widespread implementation difficult and hinders timely decision making. Here, we explored the potential of a pocket-size nanopore sequencing device for multimodal and rapid molecular diagnostics of cancer. Low-pass whole genome sequencing was used to simultaneously generate copy number (CN) and methylation profiles from native tumor DNA in the same sequencing run. Single nucleotide variants in IDH1, IDH2, TP53, H3F3A, and the TERT promoter region were identified using deep amplicon sequencing. Nanopore sequencing yielded ~0.1X genome coverage within 6 h and resulting CN and epigenetic profiles correlated well with matched microarray data. Diagnostically relevant alterations, such as 1p/19q codeletion, and focal amplifications could be recapitulated. Using ad hoc random forests, we could perform supervised pan-cancer classification to distinguish gliomas, medulloblastomas, and brain metastases of different primary sites. Single nucleotide variants in IDH1, IDH2, and H3F3A were identified using deep amplicon sequencing within minutes of sequencing. Detection of TP53 and TERT promoter mutations shows that sequencing of entire genes and GC-rich regions is feasible. Nanopore sequencing allows same-day detection of structural variants, point mutations, and methylation profiling using a single device with negligible capital cost. It outperforms hybridization-based and current sequencing technologies with respect to time to diagnosis and required laboratory equipment and expertise, aiming to make precision medicine possible for every cancer patient, even in resource-restricted settings.

  5. Complete mitochondrial DNA sequence of the European flat oyster Ostrea edulis confirms Ostreidae classification.

    PubMed

    Danic-Tchaleu, Gwenaelle; Heurtebise, Serge; Morga, Benjamin; Lapègue, Sylvie

    2011-10-12

    Because of its typical architecture, inheritance and small size, mitochondrial (mt) DNA is widely used for phylogenetic studies. Gene order is generally conserved in most taxa although some groups show considerable variation. This is particularly true in the phylum Mollusca, especially in the Bivalvia. During the last few years, there have been significant increases in the number of complete mitochondrial sequences available. For bivalves, 35 complete mitochondrial genomes are now available in GenBank, a number that has more than doubled in the last three years, representing 6 families and 23 genera. In the current study, we determined the complete mtDNA sequence of O. edulis, the European flat oyster. We present an analysis of features of its gene content and genome organization in comparison with other Ostrea, Saccostrea and Crassostrea species. The Ostrea edulis mt genome is 16 320 bp in length and codes for 37 genes (12 protein-coding genes, 2 rRNAs and 23 tRNAs) on the same strand. As in other Ostreidae, O. edulis mt genome contains a split of the rrnL gene and a duplication of trnM. The tRNA gene set of O. edulis, Ostrea denselamellosa and Crassostrea virginica are identical in having 23 tRNA genes, in contrast to Asian oysters, which have 25 tRNA genes (except for C. ariakensis with 24). O. edulis and O. denselamellosa share the same gene order, but differ from other Ostreidae and are closer to Crassostrea than to Saccostrea. Phylogenetic analyses reinforce the taxonomic classification of the 3 families Ostreidae, Mytilidae and Pectinidae. Within the Ostreidae family the results also reveal a closer relationship between Ostrea and Saccostrea than between Ostrea and Crassostrea. Ostrea edulis mitogenomic analyses show a high level of conservation within the genus Ostrea, whereas they show a high level of variation within the Ostreidae family. These features provide useful information for further evolutionary analysis of oyster mitogenomes.

  6. Complete mitochondrial DNA sequence of the European flat oyster Ostrea edulis confirms Ostreidae classification

    PubMed Central

    2011-01-01

    Background: Because of its typical architecture, inheritance and small size, mitochondrial (mt) DNA is widely used for phylogenetic studies. Gene order is generally conserved in most taxa although some groups show considerable variation. This is particularly true in the phylum Mollusca, especially in the Bivalvia. During the last few years, there have been significant increases in the number of complete mitochondrial sequences available. For bivalves, 35 complete mitochondrial genomes are now available in GenBank, a number that has more than doubled in the last three years, representing 6 families and 23 genera. In the current study, we determined the complete mtDNA sequence of O. edulis, the European flat oyster. We present an analysis of features of its gene content and genome organization in comparison with other Ostrea, Saccostrea and Crassostrea species. Results: The Ostrea edulis mt genome is 16 320 bp in length and codes for 37 genes (12 protein-coding genes, 2 rRNAs and 23 tRNAs) on the same strand. As in other Ostreidae, O. edulis mt genome contains a split of the rrnL gene and a duplication of trnM. The tRNA gene set of O. edulis, Ostrea denselamellosa and Crassostrea virginica are identical in having 23 tRNA genes, in contrast to Asian oysters, which have 25 tRNA genes (except for C. ariakensis with 24). O. edulis and O. denselamellosa share the same gene order, but differ from other Ostreidae and are closer to Crassostrea than to Saccostrea. Phylogenetic analyses reinforce the taxonomic classification of the 3 families Ostreidae, Mytilidae and Pectinidae. Within the Ostreidae family the results also reveal a closer relationship between Ostrea and Saccostrea than between Ostrea and Crassostrea. Conclusions: Ostrea edulis mitogenomic analyses show a high level of conservation within the genus Ostrea, whereas they show a high level of variation within the Ostreidae family. These features provide useful information for further evolutionary analysis of oyster mitogenomes. PMID:21989403

  7. Learning a Weighted Sequence Model of the Nucleosome Core and Linker Yields More Accurate Predictions in Saccharomyces cerevisiae and Homo sapiens

    PubMed Central

    Reynolds, Sheila M.; Bilmes, Jeff A.; Noble, William Stafford

    2010-01-01

    DNA in eukaryotes is packaged into a chromatin complex, the most basic element of which is the nucleosome. The precise positioning of the nucleosome cores allows for selective access to the DNA, and the mechanisms that control this positioning are important pieces of the gene expression puzzle. We describe a large-scale nucleosome pattern that jointly characterizes the nucleosome core and the adjacent linkers and is predominantly characterized by long-range oscillations in the mono-, di- and tri-nucleotide content of the DNA sequence, and we show that this pattern can be used to predict nucleosome positions in both Homo sapiens and Saccharomyces cerevisiae more accurately than previously published methods. Surprisingly, in both H. sapiens and S. cerevisiae, the most informative individual features are the mono-nucleotide patterns, although the inclusion of di- and tri-nucleotide features results in improved performance. Our approach combines a much longer pattern than has been previously used to predict nucleosome positioning from sequence (301 base pairs, centered at the position to be scored) with a novel discriminative classification approach that selectively weights the contributions from each of the input features. The resulting scores are relatively insensitive to local AT-content and can be used to accurately discriminate putative dyad positions from adjacent linker regions without requiring an additional dynamic programming step and without the attendant edge effects and assumptions about linker length modeling and overall nucleosome density. Our approach produces the best dyad-linker classification results published to date in H. sapiens, and outperforms two recently published models on a large set of S. cerevisiae nucleosome positions. Our results suggest that in both genomes, a comparable and relatively small fraction of nucleosomes are well-positioned and that these positions are predictable based on sequence alone. We believe that the bulk of the remaining nucleosomes follow a statistical positioning model. PMID:20628623
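
    The window-based feature extraction mentioned in the abstract (a 301-bp window centered on the scored position, with mono-, di- and tri-nucleotide content) can be sketched as below. This is a minimal illustration under stated assumptions: the published model uses position-dependent, weighted features, whereas this sketch only counts k-mers over the whole window, and the function names are hypothetical.

```python
from collections import Counter

def centered_window(seq, center, width=301):
    """Return the `width`-bp window centered at `center`, or None near the edges."""
    half = width // 2
    start, end = center - half, center + half + 1
    if start < 0 or end > len(seq):
        return None
    return seq[start:end]

def kmer_counts(window, k):
    """Count overlapping k-mers (k=1, 2, 3 for mono-, di-, tri-nucleotides)."""
    return Counter(window[i:i + k] for i in range(len(window) - k + 1))

seq = "ACGT" * 100                      # toy 400-bp sequence
window = centered_window(seq, 200)
features = {k: kmer_counts(window, k) for k in (1, 2, 3)}
print(len(window), sum(features[3].values()))
```

    Positions too close to either end of the sequence return None here, which sidesteps edge handling rather than solving it.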

  8. [Discussion on the botanical origin of Isatidis radix and Isatidis folium based on DNA barcoding].

    PubMed

    Sun, Zhi-Ying; Pang, Xiao-Hui

    2013-12-01

    This paper aimed to investigate the botanical origins of Isatidis Radix and Isatidis Folium, and to clarify the confusion over their classification. The second internal transcribed spacer (ITS2) of ribosomal DNA and the chloroplast matK gene of 22 samples from several major production areas were amplified and sequenced. Sequence assembly and consensus sequence generation were performed using the CodonCode Aligner. A phylogenetic study was performed using MEGA 4.0 software in accordance with the Kimura 2-Parameter (K2P) model, and the phylogenetic tree was constructed using the neighbor-joining (NJ) method. The results showed that the length of the ITS2 sequence of the botanical origins of Isatidis Radix and Isatidis Folium was 191 bp. The sequences showed that some samples had several SNP sites, and some samples had heterozygous sites. In the NJ tree based on the ITS2 sequence, the studied samples were separated into two groups, one of which clustered with Isatis tinctoria L. The studied samples were also clearly divided into two groups based on the chloroplast matK gene. In conclusion, our results support that the botanical origin of Isatidis Radix and Isatidis Folium is Isatis indigotica Fortune, and that Isatis indigotica and Isatis tinctoria are two distinct species. This study does not support the merging of these two species proposed in Flora of China.

  9. Feature selection using a one dimensional naïve Bayes' classifier increases the accuracy of support vector machine classification of CDR3 repertoires.

    PubMed

    Cinelli, Mattia; Sun, Yuxin; Best, Katharine; Heather, James M; Reich-Zeliger, Shlomit; Shifrut, Eric; Friedman, Nir; Shawe-Taylor, John; Chain, Benny

    2017-04-01

    Somatic DNA recombination, the hallmark of vertebrate adaptive immunity, has the potential to generate a vast diversity of antigen receptor sequences. How this diversity captures antigen specificity remains incompletely understood. In this study we use high throughput sequencing to compare the global changes in T cell receptor β chain complementarity determining region 3 (CDR3β) sequences following immunization with ovalbumin administered with complete Freund's adjuvant (CFA) or CFA alone. The CDR3β sequences were deconstructed into short stretches of overlapping contiguous amino acids. The motifs were ranked according to a one-dimensional Bayesian classifier score comparing their frequency in the repertoires of the two immunization classes. The top ranking motifs were selected and used to create feature vectors which were used to train a support vector machine. The support vector machine achieved high classification scores in a leave-one-out validation test reaching >90% in some cases. The study describes a novel two-stage classification strategy combining a one-dimensional Bayesian classifier with a support vector machine. Using this approach we demonstrate that the frequency of a small number of linear motifs three amino acids in length can accurately identify a CD4 T cell response to ovalbumin against a background response to the complex mixture of antigens which characterize Complete Freund's Adjuvant. The sequence data is available at www.ncbi.nlm.nih.gov/sra/?term=SRP075893 . The Decombinator package is available at github.com/innate2adaptive/Decombinator . The R package e1071 is available at the CRAN repository https://cran.r-project.org/web/packages/e1071/index.html . Contact: b.chain@ucl.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
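
    The first stage of the strategy described above (splitting CDR3 sequences into overlapping triplets, then ranking motifs by how differently they occur in two repertoire classes) might look roughly like this. The abstract does not give the exact scoring formula, so the smoothed log frequency ratio used here is an assumption, as are all function names and the toy sequences; the support vector machine stage is omitted.

```python
import math
from collections import Counter

def triplets(cdr3):
    """All overlapping contiguous amino-acid triplets in a CDR3 string."""
    return [cdr3[i:i + 3] for i in range(len(cdr3) - 2)]

def motif_counts(repertoire):
    """Triplet counts pooled over every sequence in a repertoire."""
    counts = Counter()
    for seq in repertoire:
        counts.update(triplets(seq))
    return counts

def rank_motifs(class_a, class_b, pseudocount=1.0):
    """Rank motifs by |log ratio| of smoothed frequencies (assumed score)."""
    a, b = motif_counts(class_a), motif_counts(class_b)
    motifs = set(a) | set(b)
    denom_a = sum(a.values()) + pseudocount * len(motifs)
    denom_b = sum(b.values()) + pseudocount * len(motifs)
    scores = {
        m: math.log(((a[m] + pseudocount) / denom_a) / ((b[m] + pseudocount) / denom_b))
        for m in motifs
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

def feature_vector(repertoire, selected):
    """Relative frequency of each selected motif, in the given order."""
    counts = motif_counts(repertoire)
    total = sum(counts.values()) or 1
    return [counts[m] / total for m in selected]

ranked = rank_motifs(["CASSL"], ["CAWWL"])   # toy, made-up sequences
top = [m for m, _ in ranked[:3]]
print(feature_vector(["CASSL"], top))
```

    The resulting vectors, one per repertoire, are the kind of input a second-stage classifier such as an SVM would be trained on.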

  10. Benchmarking protein classification algorithms via supervised cross-validation.

    PubMed

    Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

    2008-04-24

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.
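
    The core idea of supervised cross-validation, holding out whole known subtypes rather than random members, can be sketched as follows. This is a simplified reading of the abstract: the benchmark's actual construction over classification hierarchies is more involved, and the function name and toy subtype labels are hypothetical.

```python
from collections import defaultdict

def supervised_folds(subtypes):
    """Yield (held_out_subtype, train_indices, test_indices).

    Each fold withholds every member of one subtype, so the test set
    contains only examples from a subgroup unseen during training.
    """
    by_subtype = defaultdict(list)
    for i, s in enumerate(subtypes):
        by_subtype[s].append(i)
    for held_out in sorted(by_subtype):
        test = by_subtype[held_out]
        train = [i for i, s in enumerate(subtypes) if s != held_out]
        yield held_out, train, test

# Toy subtype labels for four proteins of one class.
for held_out, train, test in supervised_folds(["a", "a", "b", "c"]):
    print(held_out, train, test)
```

    By contrast, a random k-fold split would usually place members of every subtype in both train and test sets, which is why it tends to give more optimistic estimates.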

  11. Phylogeny of triatomine vectors of Trypanosoma cruzi suggested by mitochondrial DNA sequences.

    PubMed

    Sainz, Andrés C; Mauro, Laura V; Moriyama, Etsuko N; García, Beatriz A

    2004-07-01

    The subfamily Triatominae (Hemiptera: Reduviidae) comprises hematophagous insects, most of which are actual or potential vectors of Trypanosoma cruzi, the protozoan agent of Chagas' disease (American trypanosomiasis). DNA sequence comparisons of mitochondrial DNA (mtDNA) genes were used to infer phylogenetic relationships among 32 species of the subfamily Triatominae, 26 belonging to the genus Triatoma and six species of different genera. We analyzed mtDNA fragments of the 12S and 16S ribosomal RNA genes (totaling 848-851 bp) from each of the 32 species, as well as of the cytochrome oxidase I (COI, 1447 bp) gene from nine. The phylogenetic analyses unambiguously supported several clusters within the genus Triatoma. In the morphological classification, T. costalimai was placed tentatively within the infestans complex while T. guazu was not included in any Triatoma complex. The placement of these species in the molecular phylogeny indicated that both belong to the infestans complex. We confirmed with strong support the inclusion of T. circummaculata, a member of a different complex based on morphology, within the infestans complex. On the other hand, the present phylogenetic analysis did not support the monophyly of the infestans complex species, as had been suggested in our previous studies. While no strong inference of polyphyly of the genus Triatoma was provided by the bootstrap analyses, the other species belonging to Triatomini analyzed could not be distinguished from the species of Triatoma.

  12. Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

    PubMed Central

    Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

    2018-01-01

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence’s saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering that recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them. PMID:27896980
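
    The saliency-map idea (first-order derivatives of the prediction with respect to the one-hot input) can be illustrated on a toy model. The sketch below substitutes a single logistic unit for the deep networks in the paper so that the gradient can be written by hand; the weights, function names and the choice to report the gradient at the observed base are all assumptions for illustration.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def predict(x, w, bias=0.0):
    """Toy stand-in model: sigmoid of a position-specific linear score."""
    z = bias + sum(xi * wi for xr, wr in zip(x, w) for xi, wi in zip(xr, wr))
    return 1.0 / (1.0 + math.exp(-z))

def saliency(seq, w, bias=0.0):
    """|dp/dx| at the observed base of each position.

    For this model dp/dx[i][j] = p * (1 - p) * w[i][j]; multiplying by the
    one-hot row keeps only the entry for the nucleotide actually present.
    """
    x = one_hot(seq)
    p = predict(x, w, bias)
    scale = p * (1.0 - p)
    return [abs(scale * sum(xi * wi for xi, wi in zip(xr, wr))) for xr, wr in zip(x, w)]

# Hypothetical weights: position 0 favors A, position 2 favors G more strongly.
w = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
print(saliency("ACG", w))
```

    In a real DNN the same quantity would be obtained by backpropagation rather than a closed-form derivative.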

  13. Developing cDNA Libraries of Receptors Involved in the Recruitment of the Biofouling Tubeworm Hydroides elegans

    DTIC Science & Technology

    2014-06-12

    Transcriptome, Hydroides elegans, Next Generation Sequencing, Illumina HiSeq, PacBio SMRT, Biofilm, Metamorphosis 16. SECURITY CLASSIFICATION OF: a...to a bacterial cue from a bacterial biofilm. Recently, this cue has been identified to be a phage-tail like bacteriocin produced by the bacterium...submitted to the Huntsman Cancer Institute at the University of Utah and the subsequent isolation of mRNA was used for Illumina HiSeq 101 paired end

  14. Three-gene based phylogeny of the Urostyloidea (Protista, Ciliophora, Hypotricha), with notes on classification of some core taxa.

    PubMed

    Huang, Jie; Chen, Zigui; Song, Weibo; Berger, Helmut

    2014-01-01

    Classifications of the Urostyloidea were mainly based on morphology and morphogenesis. Since molecular phylogenies have largely relied on limited sampling and mostly on single-gene information, incongruence between morphological data and gene sequences has arisen. In this work, three-gene data (SSU-rDNA, ITS1-5.8S-ITS2 and LSU-rDNA) comprising 12 genera in the "core urostyloids" are sequenced, and the phylogenies based on these different markers are compared using maximum-likelihood and Bayesian algorithms and tested by unconstrained and constrained analyses. The molecular phylogeny supports the following conclusions: (1) the monophyly of the core group of Urostyloidea is well supported while the whole Urostyloidea is not monophyletic; (2) Thigmokeronopsis and Apokeronopsis are clearly separated from the pseudokeronopsids in analyses of all three gene markers, supporting their exclusion from the Pseudokeronopsidae and the inclusion in the Urostylidae; (3) Diaxonella and Apobakuella should be assigned to the Urostylidae; (4) Bergeriella, Monocoronella and Neourostylopsis flavicana share a most recent common ancestor; (5) all molecular trees support the transfer of Metaurostylopsis flavicana to the recently proposed genus Neourostylopsis; (6) all molecular phylogenies fail to separate the morphologically well-defined genera Uroleptopsis and Pseudokeronopsis; and (7) Arcuseries gen. nov. containing three distinctly deviating Anteholosticha species is established. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  15. The Cervical Microbiome over 7 Years and a Comparison of Methodologies for Its Characterization

    PubMed Central

    Smith, Benjamin C.; McAndrew, Thomas; Chen, Zigui; Harari, Ariana; Barris, David M.; Viswanathan, Shankar; Rodriguez, Ana Cecilia; Castle, Phillip; Herrero, Rolando; Schiffman, Mark; Burk, Robert D.

    2012-01-01

    Background: The rapidly expanding field of microbiome studies offers investigators a large choice of methods for each step in the process of determining the microorganisms in a sample. The human cervicovaginal microbiome affects female reproductive health, susceptibility to and natural history of many sexually transmitted infections, including human papillomavirus (HPV). At present, long-term behavior of the cervical microbiome in early sexual life is poorly understood. Methods: The V6 and V6–V9 regions of the 16S ribosomal RNA gene were amplified from DNA isolated from exfoliated cervical cells. Specimens from 10 women participating in the Natural History Study of HPV in Guanacaste, Costa Rica were sampled successively over a period of 5–7 years. We sequenced amplicons using 3 different platforms (Sanger, Roche 454, and Illumina HiSeq 2000) and analyzed sequences using pipelines based on 3 different classification algorithms (usearch, RDP Classifier, and pplacer). Results: Usearch and pplacer provided consistent microbiome classifications for all sequencing methods, whereas RDP Classifier deviated significantly when characterizing Illumina reads. Comparing across sequencing platforms indicated 7%–41% of the reads were reclassified, while comparing across software pipelines reclassified up to 32% of the reads. Variability in classification was shown not to be due to a difference in read lengths. Six cervical microbiome community types were observed and are characterized by a predominance of either G. vaginalis or Lactobacillus spp. Over the 5–7 year period, subjects displayed fluctuation between community types. A PERMANOVA analysis on pairwise Kantorovich-Rubinstein distances between the microbiota of all samples yielded an F-test ratio of 2.86 (p<0.01), indicating a significant difference comparing within and between subjects’ microbiota. Conclusions: Amplification and sequencing methods affected the characterization of the microbiome more than classification algorithms. Pplacer and usearch performed consistently with all sequencing methods. The analyses identified 6 community types consistent with those previously reported. The long-term behavior of the cervical microbiome indicated that fluctuations were subject dependent. PMID:22792313

  16. Gordonia westfalica sp. nov., a novel rubber-degrading actinomycete.

    PubMed

    Linos, Alexandros; Berekaa, Mahmoud M; Steinbüchel, Alexander; Kim, Kwang Kyu; Sproer, Cathrin; Kroppenstedt, Reiner M

    2002-07-01

    A cis-1,4-polyisoprene-degrading bacterium (strain Kb2T) was isolated from foul water taken from the inside of a deteriorated automobile tyre found on a farmer's field in Westfalia, Germany. The strain was aerobic, Gram-positive, exhibited orange smooth and rough colonies on complex nutrient agar, produced elementary branching hyphae that fragmented into rod/coccus-like elements and showed chemotaxonomic markers which were consistent with its classification within the genus Gordonia, i.e. the presence of meso-diaminopimelic acid, arabinose and galactose in whole-cell hydrolysates (cell-wall chemotype IV), N-glycolylmuramic acid in the peptidoglycan wall, a fatty-acid pattern composed of unbranched saturated and monounsaturated fatty acids plus tuberculostearic acid, mycolic acids comprising 56-60 carbon atoms and MK-9(H2) as the only menaquinone. The 16S rDNA sequence of strain Kb2T was found to be most similar to the 16S rDNA sequences of the type strains of Gordonia alkanivorans (DSM 44369T) and Gordonia nitida (KCTC 0605BPT). However, DNA-DNA relatedness data showed that strain Kb2T (=DSM 44215T =NRRL B-24152T) could be distinguished from these two species and represented a new species within the genus Gordonia, for which the name Gordonia westfalica is proposed.

  17. Whale song analyses using bioinformatics sequence analysis approaches

    NASA Astrophysics Data System (ADS)

    Chen, Yian A.; Almeida, Jonas S.; Chou, Lien-Siang

    2005-04-01

    Animal songs are frequently analyzed using discrete hierarchical units, such as units, themes and songs. Because animal songs and bio-sequences may be understood as analogous, bioinformatics analysis tools (DNA/protein sequence alignment and alignment-free methods) are proposed to quantify the theme similarities of the songs of false killer whales recorded off northeast Taiwan. The eighteen themes with discrete units that were identified in an earlier study [Y. A. Chen, master's thesis, University of Charleston, 2001] were compared quantitatively using several distance metrics. These metrics included the scores calculated using the Smith-Waterman algorithm with the repeated procedure, the standardized Euclidean distance, and the angle metrics based on word frequencies. The theme classifications based on different metrics were summarized and compared in dendrograms using cluster analyses. The results agree qualitatively with earlier classifications derived by human observation. These methods further quantify the similarities among themes. These methods could be applied to the analyses of other animal songs on a larger scale. For instance, these techniques could be used to investigate song evolution and cultural transmission by quantifying the dissimilarities of humpback whale songs across different seasons, years, populations, and geographic regions. [Work supported by SC Sea Grant, and Ilan County Government, Taiwan.]
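The alignment-free metrics mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes each theme is encoded as a string of single-character unit labels and compares themes by their overlapping word (k-mer) counts. The function names, the word length k=2, and the toy themes are hypothetical, and the plain (unstandardized) Euclidean distance is used here rather than the standardized variant named in the abstract.

```python
import math
from collections import Counter


def word_counts(units, k=2):
    """Count overlapping words of length k in a sequence of unit labels."""
    return Counter(tuple(units[i:i + k]) for i in range(len(units) - k + 1))


def angle_distance(a, b):
    """Angle in radians between two word-count vectors (0 means same direction)."""
    words = set(a) | set(b)
    dot = sum(a[w] * b[w] for w in words)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    cos = dot / (norm_a * norm_b)
    # Clamp to guard against floating-point drift just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos)))


def euclidean_distance(a, b):
    """Plain Euclidean distance between two word-count vectors."""
    words = set(a) | set(b)
    return math.sqrt(sum((a[w] - b[w]) ** 2 for w in words))


# Hypothetical themes: each letter stands for one discrete song unit.
theme_a = word_counts("ABABCA")
theme_c = word_counts("CCCC")
print(angle_distance(theme_a, theme_c), euclidean_distance(theme_a, theme_c))
```

A matrix of such pairwise distances could then be passed to any hierarchical clustering routine to draw dendrograms like those described above.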

  18. Molecular identification and phylogenetic study of Demodex caprae.

    PubMed

    Zhao, Ya-E; Cheng, Juan; Hu, Li; Ma, Jun-Xian

    2014-10-01

    The DNA barcode has been widely used in species identification and phylogenetic analysis since 2003, but there have been no reports in Demodex. In this study, to obtain an appropriate DNA barcode for Demodex, molecular identification of Demodex caprae based on mitochondrial cox1 was conducted. Firstly, individual adults and eggs of D. caprae were obtained for genomic DNA (gDNA) extraction; secondly, a mitochondrial cox1 fragment was amplified, cloned, and sequenced; thirdly, cox1 fragments of D. caprae were aligned with those of other Demodex retrieved from GenBank; finally, the intra- and inter-specific divergences were computed and the phylogenetic trees were reconstructed to analyze phylogenetic relationships in Demodex. Results obtained from seven 429-bp fragments of D. caprae showed that sequence identities were above 99.1% among three adults and four eggs. The intraspecific divergences in D. caprae, Demodex folliculorum, Demodex brevis, and Demodex canis were 0.0-0.9, 0.5-0.9, 0.0-0.2, and 0.0-0.5%, respectively, while the interspecific divergences between D. caprae and D. folliculorum, D. canis, and D. brevis were 20.3-20.9, 21.8-23.0, and 25.0-25.3%, respectively. The interspecific divergences were 10 times higher than intraspecific ones, indicating a considerable barcoding gap. Furthermore, the phylogenetic trees showed that four Demodex species gathered separately, representing independent species; and Demodex folliculorum gathered with canine Demodex, D. caprae, and D. brevis in sequence. In conclusion, the selected 429-bp mitochondrial cox1 gene is an appropriate DNA barcode for molecular classification, identification, and phylogenetic analysis of Demodex. D. caprae is an independent species and D. folliculorum is closer to D. canis than to D. caprae or D. brevis.
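The divergence comparison in this abstract can be sketched in a few lines of Python. The sketch assumes the sequences are already aligned to equal length and uses the uncorrected p-distance (percentage of differing sites); the abstract does not say which distance model the authors used, so this is an illustrative choice. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Percentage of differing sites between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    diffs = sum(1 for x, y in pairs if x != y)
    return 100.0 * diffs / len(pairs)


def divergences(records):
    """Split all pairwise distances into intra- and interspecific lists.

    records is a list of (species_name, aligned_sequence) tuples.
    """
    intra, inter = [], []
    for (sp_a, s_a), (sp_b, s_b) in combinations(records, 2):
        (intra if sp_a == sp_b else inter).append(p_distance(s_a, s_b))
    return intra, inter


def has_barcoding_gap(intra, inter):
    """True when every interspecific distance exceeds every intraspecific one."""
    return max(intra) < min(inter)


# Toy data, not real cox1 sequences.
records = [
    ("sp1", "ACGTACGTAC"),
    ("sp1", "ACGTACGTAT"),
    ("sp2", "TTGTACCTGC"),
    ("sp2", "TTGTACCTGC"),
]
intra, inter = divergences(records)
print(sorted(intra), sorted(inter), has_barcoding_gap(intra, inter))
```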

  19. Molecular evolution of ependymin and the phylogenetic resolution of early divergences among euteleost fishes.

    PubMed

    Ortí, G; Meyer, A

    1996-04-01

    The rate and pattern of DNA evolution of ependymin, a single-copy gene coding for a highly expressed glycoprotein in the brain matrix of teleost fishes, is characterized and its phylogenetic utility for fish systematics is assessed. DNA sequences were determined from catfish, electric fish, and characiforms and compared with published ependymin sequences from cyprinids, salmon, pike, and herring. Among these groups, ependymin amino acid sequences were highly divergent (up to 60% sequence difference), but had surprisingly similar hydropathy profiles and invariant glycosylation sites, suggesting that functional properties of the proteins are conserved. Comparison of base composition at third codon positions and introns revealed AT-rich introns and GC-rich third codon positions, suggesting that the biased codon usage observed might not be due to mutational bias. Phylogenetic information content of third codon positions was surprisingly high and sufficient to recover the most basal nodes of the tree, in spite of the observation that pairwise distances (at third codon positions) were well above the presumed saturation level. This finding can be explained by the high proportion of phylogenetically informative nonsynonymous changes at third codon positions among these highly divergent proteins. Ependymin DNA sequences have established the first molecular evidence for the monophyly of a group containing salmonids and esociforms. In addition, ependymin suggests a sister group relationship of electric fish (Gymnotiformes) and Characiformes, constituting a significant departure from currently accepted classifications. However, relationships among characiform lineages were not completely resolved by ependymin sequences in spite of seemingly appropriate levels of variation among taxa and considerably low levels of homoplasy in the data (consistency index = 0.7). 
    If the diversification of Characiformes took place in an "explosive" manner, over a relatively short period of time, this pattern should also be observed using other phylogenetic markers. Poor conservation of ependymin's primary structure hinders the design of efficient primers for PCR that could be used in wide-ranging fish systematic studies. However, alternative methods like PCR amplification from cDNA used here should provide promising comparative sequence data for the resolution of phylogenetic relationships among other basal lineages of teleost fishes.
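The base-composition comparison described in this abstract (third codon positions versus introns) can be illustrated with a short Python sketch. It assumes an in-frame coding sequence starting at the first codon position. The function names and the toy sequences are hypothetical and are not ependymin data.

```python
def gc_fraction(seq):
    """Fraction of G or C among the unambiguous bases (A, C, G, T) of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def third_codon_positions(cds):
    """Return the bases at third codon positions of an in-frame coding sequence."""
    # Index 2, 5, 8, ... only exists for complete codons, so a trailing
    # partial codon contributes nothing.
    return cds[2::3]


# Toy sequences for illustration only.
cds = "ATGGCCAAGTTC"      # codons: ATG GCC AAG TTC
intron = "ATTTAATA"
print(gc_fraction(third_codon_positions(cds)), gc_fraction(intron))
```

A large difference between the two values, as in the toy example, is the kind of contrast the abstract reports between GC-rich third codon positions and AT-rich introns.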

  20. The Ancient Evolutionary History of Polyomaviruses

    PubMed Central

    Buck, Christopher B.; Van Doorslaer, Koenraad; Peretti, Alberto; Geoghegan, Eileen M.; Tisza, Michael J.; An, Ping; Katz, Joshua P.; Pipas, James M.; McBride, Alison A.; Camus, Alvin C.; McDermott, Alexa J.; Dill, Jennifer A.; Delwart, Eric; Ng, Terry F. F.; Farkas, Kata; Austin, Charlotte; Kraberger, Simona; Davison, William; Pastrana, Diana V.; Varsani, Arvind

    2016-01-01

    Polyomaviruses are a family of DNA tumor viruses that are known to infect mammals and birds. To investigate the deeper evolutionary history of the family, we used a combination of viral metagenomics, bioinformatics, and structural modeling approaches to identify and characterize polyomavirus sequences associated with fish and arthropods. Analyses drawing upon the divergent new sequences indicate that polyomaviruses have been gradually co-evolving with their animal hosts for at least half a billion years. Phylogenetic analyses of individual polyomavirus genes suggest that some modern polyomavirus species arose after ancient recombination events involving distantly related polyomavirus lineages. The improved evolutionary model provides a useful platform for developing a more accurate taxonomic classification system for the viral family Polyomaviridae. PMID:27093155

  1. Classification of viral zoonosis through receptor pattern analysis.

    PubMed

    Bae, Se-Eun; Son, Hyeon Seok

    2011-04-13

    Viral zoonosis, the transmission of a virus from its primary vertebrate reservoir species to humans, requires ubiquitous cellular proteins known as receptor proteins. Zoonosis can occur not only through direct transmission from vertebrates to humans, but also through intermediate reservoirs or other environmental factors. Viruses can be categorized according to genotype (ssDNA, dsDNA, ssRNA and dsRNA viruses). Among them, the RNA viruses exhibit particularly high mutation rates and are especially problematic for this reason. Most zoonotic viruses are RNA viruses that change their envelope proteins to facilitate binding to various receptors of host species. In this study, we sought to predict zoonotic propensity through the analysis of receptor characteristics. We hypothesized that the major barrier to interspecies virus transmission is that receptor sequences vary among species--in other words, that the specific amino acid sequence of the receptor determines the ability of the viral envelope protein to attach to the cell. We analysed host-cell receptor sequences for their hydrophobicity/hydrophilicity characteristics. We then analysed these properties for similarities among receptors of different species and used a statistical discriminant analysis to predict the likelihood of transmission among species. This study is an attempt to predict zoonosis through simple computational analysis of receptor sequence differences. Our method may be useful in predicting the zoonotic potential of newly discovered viral strains.
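The hydrophobicity analysis of receptor sequences described in this abstract can be sketched as follows. The abstract does not name a hydropathy scale or window size, so the Kyte-Doolittle scale and a sliding window are assumptions made for illustration. The function names are hypothetical, and the statistical discriminant analysis step is not reproduced here.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of each sliding window along an amino acid sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if len(values) < window:
        raise ValueError("sequence shorter than window")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def profile_distance(profile_a, profile_b):
    """Mean absolute difference between two equal-length hydropathy profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    return sum(abs(x - y) for x, y in zip(profile_a, profile_b)) / len(profile_a)


# Toy peptides, not real receptor sequences.
human_like = hydropathy_profile("IIIKKK", window=3)
other_like = hydropathy_profile("IIKKKK", window=3)
print(human_like, profile_distance(human_like, other_like))
```

Features of this kind (for example, per-window hydropathy values or profile distances between species) are what a discriminant analysis would take as input.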

  2. Pantoea allii sp. nov., isolated from onion plants and seed.

    PubMed

    Brady, Carrie L; Goszczynska, Teresa; Venter, Stephanus N; Cleenwerck, Ilse; De Vos, Paul; Gitaitis, Ronald D; Coutinho, Teresa A

    2011-04-01

    Eight yellow-pigmented, Gram-negative, rod-shaped, oxidase-negative, motile, facultatively anaerobic bacteria were isolated from onion seed in South Africa and from an onion plant exhibiting centre rot symptoms in the USA. The isolates were assigned to the genus Pantoea on the basis of phenotypic and biochemical tests. 16S rRNA gene sequence analysis and multilocus sequence analysis (MLSA), based on gyrB, rpoB, infB and atpD sequences, confirmed the allocation of the isolates to the genus Pantoea. MLSA further indicated that the isolates represented a novel species, which was phylogenetically most closely related to Pantoea ananatis and Pantoea stewartii. Amplified fragment length polymorphism analysis also placed the isolates into a cluster separate from P. ananatis and P. stewartii. Compared with type strains of species of the genus Pantoea that showed >97 % 16S rRNA gene sequence similarity with strain BD 390(T), the isolates exhibited 11-55 % whole-genome DNA-DNA relatedness, which confirmed the classification of the isolates in a novel species. The most useful phenotypic characteristics for the differentiation of the isolates from their closest phylogenetic neighbours are production of acid from amygdalin and utilization of adonitol and sorbitol. A novel species, Pantoea allii sp. nov., is proposed, with type strain BD 390(T) ( = LMG 24248(T)).

  3. Molecular Diagnostics of Arthroconidial Yeasts, Frequent Pulmonary Opportunists.

    PubMed

    Kaplan, Engin; Al-Hatmi, Abdullah M S; Ilkit, Macit; Gerrits van den Ende, A H G; Hagen, Ferry; Meis, Jacques F; de Hoog, G Sybren

    2018-01-01

    Magnusiomyces capitatus and Saprochaete clavata are members of the clade of arthroconidial yeasts that represent emerging opportunistic pulmonary pathogens in immunocompromised patients. Given that standard ribosomal DNA (rDNA) identification often provides confusing results, in this study, we analyzed 34 isolates with the goal of finding new genetic markers for classification using multilocus sequencing and amplified fragment length polymorphism (AFLP). The interspecific similarity obtained using rDNA markers (the internal transcribed spacer [ITS] and large subunit regions) was in the range of 96 to 99%, whereas that obtained using protein-coding loci (Rbp2, Act, and Tef1α) was lower at 89.4 to 95.2%. Ultimately, Rbp2 was selected as the best marker for species distinction. On the basis of cloned ITS data, some strains proved to be misidentified in comparison with the identities obtained with phenotypic characters, protein sequences, and AFLP profiles, indicating that different copies of the ribosomal operon were present in a single species. Antifungal susceptibility testing revealed that voriconazole had the lowest MIC against M. capitatus, while amphotericin B had the lowest MIC against S. clavata. Both species exhibited in vitro resistance to fluconazole and micafungin. Copyright © 2017 American Society for Microbiology.

  4. Cephalosporium maydis is a distinct species in the Gaeumannomyces-Harpophora species complex.

    PubMed

    Saleh, Amgad A; Leslie, John F

    2004-01-01

    Cephalosporium maydis is an important plant pathogen whose phylogenetic position relative to other fungi has not been established clearly. We compared strains of C. maydis, strains from several other plant-pathogenic Cephalosporium spp. and several possible relatives within the Gaeumannomyces-Harpophora species complex, to which C. maydis has been suggested to belong based on previous preliminary DNA sequence analyses. DNA sequences of the nuclear genes encoding the rDNA ITS region, β-tubulin, histone H3, and MAT-2 support the hypothesis that C. maydis is a distinct taxon within the Gaeumannomyces-Harpophora species complex. Based on amplified fragment length polymorphism (AFLP) profiles, C. maydis also is distinct from the other tested species of Cephalosporium, Phialophora sensu lato and members of Gaeumannomyces-Harpophora species complex, which supports its classification as Harpophora maydis. Oligonucleotide primers for H. maydis were developed that can be used in a PCR diagnostic protocol to rapidly and reliably detect and identify this pathogen. These diagnostic PCR primers will aid the detection of H. maydis in diseased maize because this fungus can be difficult to detect and isolate, and the movement of authentic cultures may be limited by quarantine restrictions.

  5. SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.

    PubMed

    Xu, Wenxuan; Zhang, Li; Lu, Yaping

    2016-06-01

    The prediction and recognition of promoters in the human genome play an important role in DNA sequence analysis. Shannon entropy, from information theory, is widely used in bioinformatic analysis. Relative entropy estimators based on statistical divergence (SD) are used to extract meaningful features that distinguish different regions of DNA sequences. In this paper, we choose context features and use a set of SD methods to select the most effective n-mers for distinguishing promoter regions from other DNA regions in the human genome. From the total possible combinations of n-mers, we obtain four sparse distributions based on promoter and non-promoter training samples. The informative n-mers are selected by optimizing the degree to which these distributions differ. Specifically, we combine the advantages of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep features for promoter recognition. We then apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. The framework is flexible in that it can freely integrate new feature extraction or new classification models. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.
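The idea of selecting informative n-mers by relative entropy can be illustrated with a minimal Python sketch. It ranks n-mers by their individual contribution to the Kullback-Leibler divergence between a promoter distribution and a background distribution. This is only one plausible reading of the statistical-divergence step: the abstract does not give the exact estimator, and the auto-encoder and SVM stages are not reproduced. The function names, the pseudocount, and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def nmer_distribution(seqs, n, pseudocount=1.0):
    """Smoothed probability of every possible DNA n-mer in a set of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
    all_nmers = ["".join(p) for p in product("ACGT", repeat=n)]
    # The pseudocount keeps every probability positive so the log is defined.
    total = sum(counts[m] for m in all_nmers) + pseudocount * len(all_nmers)
    return {m: (counts[m] + pseudocount) / total for m in all_nmers}


def kl_contributions(p, q):
    """Per-n-mer terms p * log(p / q) of the relative entropy D(p || q)."""
    return {m: p[m] * math.log(p[m] / q[m]) for m in p}


def top_nmers(p, q, k):
    """The k n-mers contributing most to D(p || q)."""
    contrib = kl_contributions(p, q)
    return sorted(contrib, key=contrib.get, reverse=True)[:k]


# Toy sequences for illustration only.
promoters = ["TATATATA", "TATATAAA"]
background = ["GCGCGCGC", "ACGTACGT"]
p = nmer_distribution(promoters, n=2)
q = nmer_distribution(background, n=2)
print(top_nmers(p, q, 2))
```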

  6. Molecular Diagnostics of Gliomas Using Next Generation Sequencing of a Glioma-Tailored Gene Panel.

    PubMed

    Zacher, Angela; Kaulich, Kerstin; Stepanow, Stefanie; Wolter, Marietta; Köhrer, Karl; Felsberg, Jörg; Malzkorn, Bastian; Reifenberger, Guido

    2017-03-01

    Current classification of gliomas is based on histological criteria according to the World Health Organization (WHO) classification of tumors of the central nervous system. Over the past years, characteristic genetic profiles have been identified in various glioma types. These can refine tumor diagnostics and provide important prognostic and predictive information. We report on the establishment and validation of gene panel next generation sequencing (NGS) for the molecular diagnostics of gliomas. We designed a glioma-tailored gene panel covering 660 amplicons derived from 20 genes frequently aberrant in different glioma types. Sensitivity and specificity of glioma gene panel NGS for detection of DNA sequence variants and copy number changes were validated by single gene analyses. NGS-based mutation detection was optimized for application on formalin-fixed paraffin-embedded tissue specimens including small stereotactic biopsy samples. NGS data obtained in a retrospective analysis of 121 gliomas allowed for their molecular classification into distinct biological groups, including (i) isocitrate dehydrogenase gene (IDH) 1 or 2 mutant astrocytic gliomas with frequent α-thalassemia/mental retardation syndrome X-linked (ATRX) and tumor protein p53 (TP53) gene mutations, (ii) IDH mutant oligodendroglial tumors with 1p/19q codeletion, telomerase reverse transcriptase (TERT) promoter mutation and frequent Drosophila homolog of capicua (CIC) gene mutation, as well as (iii) IDH wildtype glioblastomas with frequent TERT promoter mutation, phosphatase and tensin homolog (PTEN) mutation and/or epidermal growth factor receptor (EGFR) amplification. Oligoastrocytic gliomas were genetically assigned to either of these groups. Our findings implicate gene panel NGS as a promising diagnostic technique that may facilitate integrated histological and molecular glioma classification. © 2016 International Society of Neuropathology.

  7. Estimating Diversity of Florida Keys Zooplankton Using New Environmental DNA Methods

    NASA Astrophysics Data System (ADS)

    Djurhuus, A.; Goldsmith, D. B.; Sawaya, N. A.; Breitbart, M.

    2016-02-01

    Zooplankton are of great importance in marine food webs, where they serve to link the phytoplankton and bacteria with higher trophic levels. Zooplankton are a diverse group containing molluscs, crustaceans, fish larvae and many other taxa. The sheer number of species and often minor morphological distinctions between species make it challenging and exceptionally time consuming to identify the species composition of marine zooplankton samples. As a part of the Marine Biodiversity Observation Network (MBON) project, we have developed and groundtruthed an alternative, relatively time-efficient method for zooplankton identification using environmental DNA (eDNA). Samples were collected from Molasses reef, Looe Key, and Western Sambo along the Florida Keys from five bi-monthly cruises on board the RV Walton Smith. Samples were collected for eDNA by filtering 1 L of water on to a 0.22 µm filter and zooplankton samples were collected using nets with three mesh sizes (64 µm, 200 µm, and 500 µm) to catch different size fractions. Half of the zooplankton samples were fixed in 70% ethanol and half in 10% formalin, for DNA extraction and morphological identification, respectively. Individuals representing visually abundant taxa were picked into individual wells for PCR with universal 18S rRNA gene primers and subsequent sequencing to build a reference barcode database for zooplankton species commonly found in the study region. PCR and Illumina MiSeq next generation sequencing were applied to the eDNA extracted from the 0.22 µm filters and sequences were compared to our local custom database as well as publicly available databases to determine zooplankton community composition. Finally, composition and diversity analyses were performed to compare results obtained with the new eDNA approach to standard morphological classification of zooplankton communities. Results show that the eDNA approach can enable the determination of zooplankton diversity through collection of a single water sample, which, when combined with bacterial and archaeal diversity analyses, will help us understand the coupling between different trophic levels and the drivers of plankton dynamics in the sub-tropical Florida Keys.
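The composition and diversity comparison described in this abstract can be sketched in Python. The abstract does not state which diversity measures were used, so the Shannon index and Jaccard similarity below are common choices picked for illustration. The function names, taxon labels, and counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' (natural log) from a mapping of taxon -> count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def jaccard_similarity(taxa_a, taxa_b):
    """Shared taxa divided by all taxa seen in either sample."""
    taxa_a, taxa_b = set(taxa_a), set(taxa_b)
    union = taxa_a | taxa_b
    if not union:
        return 1.0
    return len(taxa_a & taxa_b) / len(union)


# Hypothetical counts: eDNA read counts vs. individuals counted under a microscope.
edna_reads = {"copepod": 500, "mollusc": 300, "fish larva": 200}
morphology = {"copepod": 40, "mollusc": 10}
print(shannon_index(edna_reads), shannon_index(morphology))
print(jaccard_similarity(edna_reads, morphology))
```

Read counts and individual counts are not directly comparable abundances, so a presence/absence measure such as the Jaccard similarity is often the safer comparison between the two approaches.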

  8. A DNA-based pattern classifier with in vitro learning and associative recall for genomic characterization and biosensing without explicit sequence knowledge.

    PubMed

    Lee, Ju Seok; Chen, Junghuei; Deaton, Russell; Kim, Jin-Woo

    2014-01-01

    Genetic material extracted from in situ microbial communities has high promise as an indicator of biological system status. However, the challenge is to access genomic information from all organisms at the population or community scale to monitor the biosystem's state. Hence, there is a need for a better diagnostic tool that provides a holistic view of a biosystem's genomic status. Here, we introduce an in vitro methodology for genomic pattern classification of biological samples that taps large amounts of genetic information from all genes present and uses that information to detect changes in genomic patterns and classify them. We developed a biosensing protocol, termed Biological Memory, that has in vitro computational capabilities to "learn" and "store" genomic sequence information directly from genomic samples without knowledge of their explicit sequences, and that discovers differences in vitro between previously unknown inputs and learned memory molecules. The Memory protocol was designed and optimized based upon (1) common in vitro recombinant DNA operations using 20-base random probes, including polymerization, nuclease digestion, and magnetic bead separation, to capture a snapshot of the genomic state of a biological sample as a DNA memory and (2) the thermal stability of DNA duplexes between new input and the memory to detect similarities and differences. For efficient read out, a microarray was used as an output method. When the microarray-based Memory protocol was implemented to test its capability and sensitivity using genomic DNA from two model bacterial strains, i.e., Escherichia coli K12 and Bacillus subtilis, results indicate that the Memory protocol can "learn" input DNA, "recall" similar DNA, differentiate between dissimilar DNA, and detect relatively small concentration differences in samples. 
    This study demonstrated not only the in vitro information processing capabilities of DNA, but also its promise as a genomic pattern classifier that could access information from all organisms in a biological system without explicit genomic information. The Memory protocol has high potential for many applications, including in situ biomonitoring of ecosystems, screening for diseases, biosensing of pathological features in water and food supplies, and non-biological information processing of memory devices, among many others.

  9. Concurrent speciation in the eastern woodland salamanders (Genus Plethodon): DNA sequences of the complete albumin nuclear and partial mitochondrial 12s genes

    USGS Publications Warehouse

    Highton, Richard; Hastings, Amy Picard; Palmer, Catherine; Watts, Richard; Hass, Carla A.; Culver, Melanie; Arnold, Stevan

    2012-01-01

    Salamanders of the North American plethodontid genus Plethodon are important model organisms in a variety of studies that depend on a phylogenetic framework (e.g., chemical communication, ecological competition, life histories, hybridization, and speciation), and consequently their systematics has been intensively investigated over several decades. Nevertheless, we lack a synthesis of relationships among the species. In the analyses reported here we use new DNA sequence data from the complete nuclear albumin gene (1818 bp) and the 12s mitochondrial gene (355 bp), as well as published data for four other genes (Wiens et al., 2006), up to a total of 6989 bp, to infer relationships. We relate these results to past systematic work based on morphology, allozymes, and DNA sequences. Although basal relationships show a strong consensus across studies, many terminal relationships remain in flux despite substantial sequencing and other molecular and morphological studies. This systematic instability appears to be a consequence of contemporaneous bursts of speciation in the late Miocene and Pliocene, yielding many closely related extant species in each of the four eastern species groups. Therefore we conclude that many relationships are likely to remain poorly resolved in the face of additional sequencing efforts. On the other hand, the current classification of the 45 eastern species into four species groups is supported. The Plethodon cinereus group (10 species) is the sister group to the clade comprising the other three groups, but these latter groups (Plethodon glutinosus [28 species], Plethodon welleri [5 species], and Plethodon wehrlei [2 species]) probably diverged from each other at approximately the same time.

  10. Heterogeneity of three molecular data partition phylogenies of mints related to M. x piperita (Mentha; Lamiaceae).

    PubMed

    Gobert, V; Moja, S; Taberlet, P; Wink, M

    2006-07-01

    Phylogenetic reconstructions with molecular tools are now widely used, thanks to advances in PCR and sequencing technologies. The choice of the molecular target still remains a problem because too few comparative data are available. This is particularly true for hybrid taxa, where differential introgression of genome parts leads to incongruity between data sets. We have studied the potential of three data partitions to reconstruct the phylogeny of mints related to M. x piperita. These included nuclear DNA (ITS), chloroplast DNA (non-coding regions trnL intron, intergenic spacers trnL-trnF, and psbA-trnH), and AFLP and ISSR markers. The taxonomic sampling was composed of hybrids, diploid and polyploid genomes. Since the genealogy of cultivated mint hybrids is known, they represent a model group to compare the usefulness of various molecular markers for phylogeny inference. Incongruities between ITS, chloroplast DNA, and AFLP-ISSR phylogenetic trees were recorded, although DNA fingerprinting data were congruent with morphological classification. Evidence of chloroplast capture events was obtained for M. x piperita. Direct sequencing of ITS led to biased results because of the existence of pseudogenes. Sequencing of cloned ITS further failed to provide evidence of the existence of the two parental copy types for M. x piperita, a sterile hybrid that has had no opportunity for concerted evolution of ITS copies. AFLP-ISSR data clustered M. x piperita with the parent that had the largest genome. This study sheds light on differential introgression of different genome regions in mint hybrids.

  11. Biodiversity among Lactobacillus helveticus Strains Isolated from Different Natural Whey Starter Cultures as Revealed by Classification Trees

    PubMed Central

    Gatti, Monica; Trivisano, Carlo; Fabrizi, Enrico; Neviani, Erasmo; Gardini, Fausto

    2004-01-01

    Lactobacillus helveticus is a homofermentative thermophilic lactic acid bacterium used extensively for manufacturing Swiss type and aged Italian cheese. In this study, the phenotypic and genotypic diversity of strains isolated from different natural dairy starter cultures used for Grana Padano, Parmigiano Reggiano, and Provolone cheeses was investigated by a classification tree technique. A data set was used that consists of 119 L. helveticus strains, each of which was studied for its physiological characters, as well as surface protein profiles and hybridization with a species-specific DNA probe. The methodology employed in this work allowed the strains to be grouped into terminal nodes without difficult and subjective interpretation. In particular, good discrimination was obtained between L. helveticus strains isolated, respectively, from Grana Padano and from Provolone natural whey starter cultures. The method used in this work allowed identification of the main characteristics that permit discrimination of biotypes. In order to understand what kind of genes could code for phenotypes of technological relevance, evidence that specific DNA sequences are present only in particular biotypes may be of great interest. PMID:14711641

  12. Phylogeny of the Celastraceae inferred from 26S nuclear ribosomal DNA, phytochrome B, rbcL, atpB, and morphology.

    PubMed

    Simmons, M P; Savolainen, V; Clevinger, C C; Archer, R H; Davis, J I

    2001-06-01

    Phylogenetic relationships within Celastraceae (spindle-tree family) were inferred from nucleotide sequence characters from the 5' end of 26S nuclear ribosomal DNA (including expansion segments D1-D3; 84 species sampled), phytochrome B (58 species), rbcL (31 species), atpB (23 species), and morphology (94 species). Among taxa of questionable affinity, Forsellesia is a member of Crossosomataceae, and Goupia is excluded from Celastraceae. However, Brexia, Canotia, Lepuropetalon, Parnassia, Siphonodon, and Stackhousiaceae are supported as members of Celastraceae. Gymnosporia and Tricerma are distinct from Maytenus, Cassine is supported as distinct from Elaeodendron, and Dicarpellum is distinct from Salacia. Catha, Maytenus, and Pristimera are not resolved as natural genera. Hippocrateaceae (including Plagiopteron and Lophopetalum) are a clade nested within a paraphyletic Celastraceae. These data also suggest that Loesener's classification of Celastraceae sensu stricto and Hallé's classification of Hippocrateaceae are artificial. The diversification of the fruit and aril within Celastraceae appears to be complex, with multiple origins of most fruit and aril forms. Copyright 2001 Academic Press.

  13. A molecular phylogenetic investigation of Bakuella, Anteholosticha, and Caudiholosticha (Protista, Ciliophora, Hypotrichia) based on three-gene sequences.

    PubMed

    Lv, Zhao; Shao, Chen; Yi, Zhenzhen; Warren, Alan

    2015-01-01

    Traditionally, classifications of the Urostyloida have been mainly based on morphology and morphogenesis. Recent molecular phylogenetic analyses have been largely based on single-gene data for a limited number of taxa. Consequently, incongruence has arisen between the morphological/morphogenetic and the molecular data. In this study, the three phylogenetic markers (SSU rDNA, ITS1-5.8S-ITS2 region, and LSU-rDNA) of three urostyloid genera represented by four species (Bakuella granulifera, Anteholosticha monilata, Caudiholosticha sylvatica, and C. tetracirra) were sequenced to investigate their phylogeny. The results show that: (1) all three genera should be regarded as members of the order Urostyloida within the subclass Hypotrichia, as indicated by morphological characters; (2) phylogenetic analyses and sequence similarities both indicate that neither Anteholosticha nor Caudiholosticha is monophyletic and the systematic assignment of both genera awaits further evaluation; and (3) Bakuella has a closer relationship with Urostyla than with bakuellids (e.g. Apobakuella and Metaurostylopsis), suggesting Bakuella may belong to the family Urostylidae rather than the family Bakuellidae. © 2014 The Author(s) Journal of Eukaryotic Microbiology © 2014 International Society of Protistologists.

  14. A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy.

    PubMed

    Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng

    2017-05-10

    Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. 
    Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA.
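
    The idea of letting each database hit contribute to the assignment in proportion to its similarity to the query can be illustrated with a minimal sketch. This is not the BLCA implementation: the weights here are simply similarities normalized across hits (an assumption made for illustration, not the Bayesian posterior the abstract describes), and the function name and taxon labels are hypothetical.

```python
from collections import defaultdict


def weighted_taxon_vote(hits):
    """Sum normalized similarity weights per taxon.

    hits: list of (taxon_label, similarity) pairs, similarity in [0, 1].
    Assumption: weight = similarity / sum of similarities. This is a
    simplification for illustration, not BLCA's actual posterior.
    """
    total = sum(sim for _, sim in hits)
    if total == 0:
        raise ValueError("need at least one hit with non-zero similarity")
    support = defaultdict(float)
    for taxon, sim in hits:
        support[taxon] += sim / total
    return dict(support)


# Hypothetical alignment hits for one query sequence.
hits = [("Species A", 0.99), ("Species A", 0.98), ("Species B", 0.95)]
support = weighted_taxon_vote(hits)
best = max(support, key=support.get)
```

    In this toy case two close hits to the same taxon outweigh a single hit to another taxon; a real tool would additionally assess the reliability of the call, for example with the bootstrap confidence scores mentioned in the abstract.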

  15. Self-Organizing Hidden Markov Model Map (SOHMMM).

    PubMed

    Ferles, Christos; Stafylopatis, Andreas

    2013-12-01

    A hybrid approach combining the Self-Organizing Map (SOM) and the Hidden Markov Model (HMM) is presented. The Self-Organizing Hidden Markov Model Map (SOHMMM) establishes a cross-section between the theoretic foundations and algorithmic realizations of its constituents. The respective architectures and learning methodologies are fused in an attempt to meet the increasing requirements imposed by the properties of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and protein chain molecules. The fusion and synergy of the SOM unsupervised training and the HMM dynamic programming algorithms bring forth a novel on-line gradient descent unsupervised learning algorithm, which is fully integrated into the SOHMMM. Since the SOHMMM carries out probabilistic sequence analysis with little or no prior knowledge, it can have a variety of applications in clustering, dimensionality reduction and visualization of large-scale sequence spaces, and also, in sequence discrimination, search and classification. Two series of experiments based on artificial sequence data and splice junction gene sequences demonstrate the SOHMMM's characteristics and capabilities. Copyright © 2013 Elsevier Ltd. All rights reserved.

  16. Classification of "Pseudomonas azotocolligans" Anderson 1955, 132, in the genus Sphingomonas as Sphingomonas trueperi sp. nov.

    PubMed

    Kämpfer, P; Denner, E B; Meyer, S; Moore, E R; Busse, H J

    1997-04-01

    "Pseudomonas azotocolligans" ATCC 12417T (T = type strain), which was described as a diazotrophic bacterium, was reinvestigated to clarify its taxonomic position. 16S ribosomal DNA sequence comparisons demonstrated that this strain clusters phylogenetically with species of the genus Sphingomonas and represents a new species. The results of investigations of the fatty acid patterns, polar lipid profiles, and quinone system supported this placement. The substrate utilization profile and biochemical characteristics displayed no obvious similarity to the characteristics of any previously described species of the genus Sphingomonas. The new name Sphingomonas trueperi is proposed on the basis of these results and previously published data for the G + C content of the genomic DNA and the polyamine pattern.

  17. [Identification of original species of Mantidis Oötheca (Sangpiaoxiao) based on DNA barcoding].

    PubMed

    Wang, Xi; Hou, Fei-xia; Wang, Yi-xuan; Wang, Yu-xian; Li, Jun-de; Yuan, Yuan; Peng, Cheng; Guo, Jin-lin

    2015-10-01

    Both market research and literature reports found that the oothecae of all Mantodea are used as medicine. However, the Chinese Pharmacopoeia records the ootheca of only three mantis species. The clinical use of oothecae not recorded in the Chinese Pharmacopoeia poses potential risks to drug safety, so it is urgent to identify the origin of Mantidis Oötheca. Current research on the original animals of Mantidis Oötheca is based on morphology and is unanimous. DNA barcoding, which is widely used in animal classification studies, fills gaps in traditional morphological identification. This study is the first to use DNA barcoding to analyze genetic distances among different Mantidis Oötheca types, align COI sequences between mantises and Mantidis Oötheca, and construct a phylogenetic tree. The results confirmed that Tenodera sinensis and Hierodula patellifera were the origin insects of Tuanpiaoxiao and Heipiaoxiao, respectively, and that Statilia maculata and Mantis religiosa were the origin insects of Changpiaoxiao.

  18. Teaching artificial intelligence to read electropherograms.

    PubMed

    Taylor, Duncan; Powers, David

    2016-11-01

    Electropherograms are produced in great numbers in forensic DNA laboratories as part of everyday criminal casework. Before the results of these electropherograms can be used they must be scrutinised by analysts to determine what the identified data tells us about the underlying DNA sequences and what is purely an artefact of the DNA profiling process. A technique that lends itself well to such a task of classification in the face of vast amounts of data is the use of artificial neural networks. These networks, inspired by the workings of the human brain, have been increasingly successful in analysing large datasets, performing medical diagnoses, identifying handwriting, playing games, or recognising images. In this work we demonstrate the use of an artificial neural network which we train to 'read' electropherograms and show that it can generalise to unseen profiles. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  19. Rapid and accurate taxonomic classification of insect (class Insecta) cytochrome c oxidase subunit 1 (COI) DNA barcode sequences using a naïve Bayesian classifier

    PubMed Central

    Porter, Teresita M; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J; Golding, G Brian; Hajibabaei, Mehrdad

    2014-01-01

    Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the blast-based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates, incorrect taxonomic assignments with a high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.
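
    A naïve Bayesian classifier with bootstrap support, as described above, can be sketched in a few lines. The following is a toy illustration only, not the classifier of Wang et al. or the trained COI reference sets from this study: the word length, the smoothing constants, the subsample size and the two reference sequences are all assumptions chosen to keep the example small.

```python
import math
import random
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct overlapping words of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k):
    """references: list of (taxon, sequence) pairs.

    Counts, per taxon, how many reference sequences contain each word.
    """
    word_counts = defaultdict(lambda: defaultdict(int))
    n_seqs = defaultdict(int)
    for taxon, seq in references:
        n_seqs[taxon] += 1
        for word in kmers(seq, k):
            word_counts[taxon][word] += 1
    return word_counts, n_seqs


def classify(words, word_counts, n_seqs):
    """Pick the taxon with the highest summed log word probability."""
    def log_score(taxon):
        m = n_seqs[taxon]
        # Assumed smoothing: (count + 0.5) / (m + 1) avoids log(0).
        return sum(
            math.log((word_counts[taxon].get(w, 0) + 0.5) / (m + 1))
            for w in words
        )
    return max(n_seqs, key=log_score)


def bootstrap_support(query, word_counts, n_seqs, k, rounds=100, seed=0):
    """Classify the query, then re-classify random word subsets.

    Support is the fraction of rounds that agree with the full call.
    The query must be at least k bases long.
    """
    rng = random.Random(seed)
    words = sorted(kmers(query, k))
    call = classify(words, word_counts, n_seqs)
    subset_size = max(1, len(words) // k)
    agree = sum(
        classify(rng.sample(words, subset_size), word_counts, n_seqs) == call
        for _ in range(rounds)
    )
    return call, agree / rounds


# Hypothetical two-taxon reference set and query.
references = [("Taxon A", "ACGTACGTACGTACGT"), ("Taxon B", "TTGGCCAATTGGCCAA")]
word_counts, n_seqs = train(references, k=4)
taxon, support = bootstrap_support("ACGTACGTAC", word_counts, n_seqs, k=4)
```

    A cut-off on the support value, chosen according to query length as the abstract suggests, would then decide whether the assignment is kept at that rank or summarized to a more inclusive one.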

  20. Assessment of fetal sex chromosome aneuploidy using directed cell-free DNA analysis.

    PubMed

    Nicolaides, Kypros H; Musci, Thomas J; Struble, Craig A; Syngelaki, Argyro; Gil, M M

    2014-01-01

    To examine the performance of chromosome-selective sequencing of cell-free (cf) DNA in maternal blood for assessment of fetal sex chromosome aneuploidies. This was a case-control study of 177 stored maternal plasma samples, obtained before fetal karyotyping at 11-13 weeks of gestation, from 59 singleton pregnancies with fetal sex chromosome aneuploidies (45,X, n = 49; 47,XXX, n = 6; 47,XXY, n = 1; 47,XYY, n = 3) and 118 with euploid fetuses (46,XY, n = 59; 46,XX, n = 59). Digital analysis of selected regions (DANSR™) on chromosomes 21, 18, 13, X and Y was performed and the fetal-fraction optimized risk of trisomy evaluation (FORTE™) algorithm was used to estimate the risk for non-disomic genotypes. Performance was calculated at a risk cut-off of 1:100. Analysis of cfDNA provided risk scores for 172 (97.2%) samples; 4 samples (45,X, n = 2; 46,XY, n = 1; 46,XX, n = 1) had an insufficient fetal cfDNA fraction for reliable testing and 1 case (47,XXX) failed laboratory quality control metrics. The classification was correct in 43 (91.5%) of 47 cases of 45,X, all 5 of 47,XXX, 1 of 47,XXY and 3 of 47,XYY. There were no false-positive results for monosomy X. Analysis of cfDNA by chromosome-selective sequencing can correctly classify fetal sex chromosome aneuploidy with reasonably high sensitivity. © 2013 S. Karger AG, Basel.
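
    The percentages reported above follow directly from the stated counts, as this short check shows. Only numbers given in the abstract are used; the helper name is illustrative.

```python
def percent(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


total_samples = 177
no_result = 4 + 1                      # low fetal fraction + failed QC
scored = total_samples - no_result     # samples that received a risk score

cases_45x = 49 - 2                     # 45,X cases left after low-fraction exclusions
cases_47xxx = 6 - 1                    # 47,XXX cases left after the QC failure

scored_rate = percent(scored, total_samples)   # share of samples with a risk score
sensitivity_45x = percent(43, cases_45x)       # correctly classified 45,X cases
```

    The same denominators explain why the abstract reports "all 5" of the 47,XXX cases rather than 6: one of the six failed laboratory quality control.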

  1. DNA barcoding commercially important fish species of Turkey.

    PubMed

    Keskın, Emre; Atar, Hasan H

    2013-09-01

    DNA barcoding was used in the identification of 89 commercially important freshwater and marine fish species found in Turkish ichthyofauna. A total of 1765 DNA barcodes using a 654-bp-long fragment of the mitochondrial cytochrome c oxidase subunit I gene were generated for 89 commercially important freshwater and marine fish species found in Turkish ichthyofauna. These species belong to 70 genera, 40 families and 19 orders from class Actinopterygii, and all were associated with a distinct DNA barcode. Nine and 12 of the COI barcode clusters represent the first species records submitted to the BOLD and GenBank databases, respectively. All COI barcodes (except sequences of first species records) were matched with reference sequences of expected species, according to morphological identification. Average nucleotide frequencies of the data set were calculated as T = 29.7%, C = 28.2%, A = 23.6% and G = 18.6%. Average pairwise genetic distances among individuals were estimated as 0.32%, 9.62%, 17.90% and 22.40% for conspecific, congeneric, confamilial and within order, respectively. Kimura 2-parameter genetic distance values were found to increase with taxonomic level. For most of the species analysed in our data set, there is a barcoding gap, and an overlap in the barcoding gap exists for only two genera. Neighbour-joining trees were drawn based on DNA barcodes and all the specimens clustered in agreement with their taxonomic classification at species level. Results of this study supported DNA barcoding as an efficient molecular tool for a better monitoring, conservation and management of fisheries. © 2013 John Wiley & Sons Ltd.
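
    The Kimura 2-parameter distance mentioned above is computed from the proportions of transitions (P) and transversions (Q) between two aligned sequences as d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The sketch below applies that formula to a pair of aligned sequences; the function name and the handling of gaps (sites with anything other than A, C, G or T are skipped) are assumptions for illustration, and the study's own software settings are not stated in the abstract.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (assumed policy)
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError for saturated pairs where the argument <= 0.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical 10-bp fragments differing by a single A<->G transition.
d = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
```

    Averaging such pairwise values within species, genera, families and orders gives the kind of summary the abstract reports (0.32%, 9.62%, 17.90% and 22.40%).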

  2. Psychrobacter glaciei sp. nov., isolated from the ice core of an Arctic glacier.

    PubMed

    Zeng, Yin-Xin; Yu, Yong; Liu, Yang; Li, Hui-Rong

    2016-04-01

    A Gram-stain-negative, non-motile, non-spore-forming, non-pigmented, oxidase- and catalase-positive bacterial strain, designated BIc20019T, was isolated from the ice core of Austre Lovénbreen in Ny-Ålesund, Svalbard. The temperature and NaCl ranges for growth were 4-34 °C (optimum, 25-29 °C) and 0-8% (w/v) (optimum, 2-4%). Analysis of the 16S rRNA gene sequence indicated that strain BIc20019T belonged to the genus Psychrobacter and was closely related to Psychrobacter arcticus 273-4T, Psychrobacter cryohalolentis K5T, 'Psychrobacter fjordensis' BSw21516B, Psychrobacter fozii LMG 21280T, Psychrobacter luti LMG 21276T and Psychrobacter okhotskensis MD17T at greater than 99% similarity. Phylogenetic analysis based on gyrB gene sequences revealed highest similarity (93.6%) to P. okhotskensis MD17T. However, DNA hybridization experiments revealed a low level of DNA-DNA relatedness (<59%) between strain BIc20019T and its closest relatives. Strain BIc20019T contained ubiquinone-8 (Q-8) as the predominant respiratory quinone, and C18:1ω9c and summed feature 3 (C16:1ω7c and/or iso-C15:0 2-OH) as the major fatty acids. It had a DNA G+C content of 46.3 mol%. The polar lipid profile of strain BIc20019T was mainly composed of phosphatidylglycerol, phosphatidylethanolamine and diphosphatidylglycerol. Owing to the differences in phenotypic and chemotaxonomic characteristics, phylogenetic analysis based on 16S rRNA gene and gyrB gene sequences, and DNA-DNA relatedness data, the isolate merits classification within a novel species for which the name Psychrobacter glaciei sp. nov. is proposed. The type strain is BIc20019T (=KCTC 42280T = CCTCC AB 2014019T).

  3. Molecular Phylogeny of the Widely Distributed Marine Protists, Phaeodaria (Rhizaria, Cercozoa).

    PubMed

    Nakamura, Yasuhide; Imai, Ichiro; Yamaguchi, Atsushi; Tuji, Akihiro; Not, Fabrice; Suzuki, Noritoshi

    2015-07-01

    Phaeodarians are a group of widely distributed marine cercozoans. These plankton organisms can exhibit a large biomass in the environment and are supposed to play an important role in marine ecosystems and in material cycles in the ocean. Accurate knowledge of phaeodarian classification is thus necessary to better understand marine biology; however, phylogenetic information on Phaeodaria is limited. The present study analyzed 18S rDNA sequences encompassing all existing phaeodarian orders, to clarify their phylogenetic relationships and improve their taxonomic classification. The monophyly of Phaeodaria was confirmed and strongly supported by phylogenetic analysis with a larger data set than in previous studies. The phaeodarian clade contained 11 subclades which generally did not correspond to the families and orders of the current classification system. Two families (Challengeriidae and Aulosphaeridae) and two orders (Phaeogromida and Phaeocalpida) are possibly polyphyletic or paraphyletic, and consequently the classification needs to be revised at both the family and order levels by integrative taxonomy approaches. Two morphological criteria, 1) the scleracoma type and 2) its surface structure, could be useful markers at the family level. Copyright © 2015 Elsevier GmbH. All rights reserved.

  4. Molecular phylogeny of tribe Rhipsalideae (Cactaceae) and taxonomic implications for Schlumbergera and Hatiora.

    PubMed

    Calvente, Alice; Zappi, Daniela C; Forest, Félix; Lohmann, Lúcia G

    2011-03-01

    Tribe Rhipsalideae is composed of unusual epiphytic or lithophytic cacti that inhabit humid tropical and subtropical forests. Members of this tribe present a reduced vegetative body, a specialized adventitious root system, usually spineless areoles and flowers and fruits reduced in size. Despite the debate surrounding the classification of Rhipsalideae, no studies have ever attempted to reconstruct phylogenetic relationships among its members or to test the monophyly of its genera using DNA sequence data; all classifications formerly proposed for this tribe have only employed morphological data. In this study, we reconstruct the phylogeny of Rhipsalideae using plastid (trnQ-rps16, rpl32-trnL, psbA-trnH) and nuclear (ITS) markers to evaluate the classifications previously proposed for the group. We also examine morphological features traditionally used to delimit genera within Rhipsalideae in light of the resulting phylogenetic trees. In total, new sequences for 35 species of Rhipsalideae were produced (out of 55; 63%). The molecular phylogeny obtained comprises four main clades supporting the recognition of genera Lepismium, Rhipsalis, Hatiora and Schlumbergera. The evidence gathered indicates that a broader genus Schlumbergera, including Hatiora subg. Rhipsalidopsis, should be recognized. Consistent morphological characters rather than homoplastic features are used in order to establish a more coherent and practical classification for the group. Nomenclatural changes and a key for the identification of the genera currently included in Rhipsalideae are provided. Copyright © 2011 Elsevier Inc. All rights reserved.

  5. Identification of fungi in shotgun metagenomics datasets

    PubMed Central

    Donovan, Paul D.; Gonzalez, Gabriel; Higgins, Desmond G.

    2018-01-01

    Metagenomics uses nucleic acid sequencing to characterize species diversity in different niches such as environmental biomes or the human microbiome. Most studies have used 16S rRNA amplicon sequencing to identify bacteria. However, the decreasing cost of sequencing has resulted in a gradual shift away from amplicon analyses and towards shotgun metagenomic sequencing. Shotgun metagenomic data can be used to identify a wide range of species, but have rarely been applied to fungal identification. Here, we develop a sequence classification pipeline, FindFungi, and use it to identify fungal sequences in public metagenome datasets. We focus primarily on animal metagenomes, especially those from pig and mouse microbiomes. We identified fungi in 39 of 70 datasets comprising 71 fungal species. At least 11 pathogenic species with zoonotic potential were identified, including Candida tropicalis. We identified Pseudogymnoascus species from 13 Antarctic soil samples initially analyzed for the presence of bacteria capable of degrading diesel oil. We also show that Candida tropicalis and Candida loboi are likely the same species. In addition, we identify several examples where contaminating DNA was erroneously included in fungal genome assemblies. PMID:29444186

  6. One fungus, one name promotes progressive plant pathology.

    PubMed

    Wingfield, Michael J; De Beer, Z Wilhelm; Slippers, Bernard; Wingfield, Brenda D; Groenewald, Johannes Z; Lombard, Lorenzo; Crous, Pedro W

    2012-08-01

    The robust and reliable identification of fungi underpins virtually every element of plant pathology, from disease diagnosis to studies of biology, management/control, quarantine and, even more recently, comparative genomics. Most plant diseases are caused by fungi, typically pleomorphic organisms, for which the taxonomy and, in particular, a dual nomenclature system have frustrated and confused practitioners of plant pathology. The emergence of DNA sequencing has revealed cryptic taxa and revolutionized our understanding of relationships in the fungi. The impacts on plant pathology at every level are already immense and will continue to grow rapidly as new DNA sequencing technologies continue to emerge. DNA sequence comparisons, used to resolve a dual nomenclature problem for the first time only 19 years ago, have made it possible to approach a natural classification for the fungi and to abandon the confusing dual nomenclature system. The journey to a one fungus, one name taxonomic reality has been long and arduous, but its time has come. This will inevitably have a positive impact on plant pathology, plant pathologists and future students of this hugely important discipline on which the world depends for food security and plant health in general. This contemporary review highlights the problems of a dual nomenclature, especially its impact on plant pathogenic fungi, and charts the road to a one fungus, one name system that is rapidly drawing near. © 2011 The Authors. Molecular Plant Pathology © 2011 BSPP and Blackwell Publishing Ltd.

  7. Erwinia gerundensis sp. nov., a cosmopolitan epiphyte originally isolated from pome fruit trees.

    PubMed

    Rezzonico, Fabio; Smits, Theo H M; Born, Yannick; Blom, Jochen; Frey, Jürg E; Goesmann, Alexander; Cleenwerck, Ilse; de Vos, Paul; Bonaterra, Anna; Duffy, Brion; Montesinos, Emilio

    2016-03-01

    A survey to obtain potential antagonists of pome fruit tree diseases yielded two yellow epiphytic bacterial isolates morphologically similar to Pantoea agglomerans, but showing no biocontrol activity. Whole-cell MALDI-TOF mass spectrometry and analysis of 16S rRNA gene and gyrB sequences suggested the possibility of a novel species with a phylogenetic position in either the genus Pantoea or the genus Erwinia. Multi-locus sequence analysis (MLSA) placed the two strains in the genus Erwinia and supported their classification as a novel species. The strains showed general phenotypic characteristics typical of this genus and results of DNA-DNA hybridizations confirmed that they represent a single novel species. Both strains showed a DNA G+C content, as determined by HPLC, of 54.5 mol% and could be discriminated from phylogenetically related species of the genus Erwinia by their ability to utilize potassium gluconate, potassium 2-ketogluconate, maltose, melibiose and raffinose. Whole-genome sequencing of strain EM595T revealed the presence of a chromosomal carotenoid biosynthesis gene cluster similar to those found in species of the genera Cronobacter and Pantoea that explains the pigmentation of the strain, which is atypical for the genus Erwinia. Additional strains belonging to the same species were recovered from different plant hosts in three different continents, revealing the cosmopolitan nature of this epiphyte. The name Erwinia gerundensis sp. nov. is proposed, with EM595T (= LMG 28990T = CCOS 903T) as the designated type strain.

  8. Xylella fastidiosa: Host Range and Advance in Molecular Identification Techniques

    PubMed Central

    Baldi, Paolo; La Porta, Nicola

    2017-01-01

    In the never-ending struggle against plant pathogenic bacteria, a major goal is the early identification and classification of infecting microorganisms. Xylella fastidiosa, a Gram-negative bacterium belonging to the family Xanthomonadaceae, is no exception, as this pathogen has a broad range of vectors and host plants, many of which may carry the pathogen for a long time without showing any symptoms. Until recent years, most of the diseases caused by X. fastidiosa had been reported from North and South America, but recently a widespread infection of olive quick decline syndrome caused by this fastidious pathogen appeared in Apulia (south-eastern Italy), and several cases of X. fastidiosa infection have been reported in other European countries. At least five different subspecies of X. fastidiosa have been reported and classified: fastidiosa, multiplex, pauca, sandyi, and tashke. A sixth subspecies (morus) has been recently proposed. Therefore, it is vital to develop fast and reliable methods that allow pathogen detection during the very early stages of infection, in order to prevent further spreading of this dangerous bacterium. For this purpose, the classical immunological methods such as ELISA and immunofluorescence are not always sensitive enough. However, PCR-based methods exploiting specific primers for the amplification of target regions of genomic DNA have been developed and are becoming a powerful tool for the detection and identification of many species of bacteria. The aim of this review is to illustrate the application of the most commonly used PCR approaches to X. fastidiosa study, ranging from classical PCR, to several PCR-based detection methods: random amplified polymorphic DNA (RAPD), quantitative real-time PCR (qRT-PCR), nested-PCR (N-PCR), immunocapture PCR (IC-PCR), short sequence repeats (SSRs, also called VNTR), single nucleotide polymorphisms (SNPs) and multilocus sequence typing (MLST). 
    Amplification and sequence analysis of specific targets is also mentioned. The rapid progress achieved in recent years in the DNA-based classification of this pathogen is described and discussed, and specific primers designed for the different methods are listed, in order to provide a concise and useful tool to all the researchers working in the field. PMID:28642764

  9. Molecular phylogeny and systematics of the banana family (Musaceae) inferred from multiple nuclear and chloroplast DNA fragments, with a special reference to the genus Musa.

    PubMed

    Li, Lin-Feng; Häkkinen, Markku; Yuan, Yong-Ming; Hao, Gang; Ge, Xue-Jun

    2010-10-01

    Musaceae is a small paleotropical family. Three genera have been recognised within this family although the generic delimitations remain controversial. Most species of the family (around 65 species) have been placed under the genus Musa and its infrageneric classification has long been disputed. In this study, we obtained nuclear ribosomal ITS and chloroplast (atpB-rbcL, rps16, and trnL-F) DNA sequences of 36 species (42 accessions of ingroups representing three genera) together with 10 accessions of ingroups retrieved from the GenBank database and 4 accessions of outgroups, to construct the phylogeny of the family, with a special reference to the infrageneric classification of the genus Musa. Our phylogenetic analyses elaborated previous results in supporting the monophyly of the family and suggested that Musella and Ensete may be congeneric or at least closely related, but refuted the previous infrageneric classification of Musa. None of the five sections of Musa previously defined based on morphology was recovered as a monophyletic group in the molecular phylogeny. Two infrageneric clades were identified, which corresponded well to the basic chromosome numbers of x=11 and 10/9/7, respectively: the former clade comprises species from the sections Musa and Rhodochlamys while the latter contains sections of Callimusa, Australimusa, and Ingentimusa. Copyright 2010 Elsevier Inc. All rights reserved.

  10. Chromatin accessibility prediction via a hybrid deep convolutional neural network.

    PubMed

    Liu, Qiao; Xia, Fei; Yin, Qijin; Jiang, Rui

    2018-03-01

    A majority of known genetic variants associated with human-inherited diseases lie in non-coding regions that lack adequate interpretation, making it indispensable to systematically discover functional sites at the whole genome level and precisely decipher their implications in a comprehensive manner. Although computational approaches have been complementing high-throughput biological experiments towards the annotation of the human genome, it still remains a big challenge to accurately annotate regulatory elements in the context of a specific cell type via automatic learning of the DNA sequence code from large-scale sequencing data. Indeed, the development of an accurate and interpretable model to learn the DNA sequence signature and further enable the identification of causative genetic variants has become essential in both genomic and genetic studies. We proposed Deopen, a hybrid framework mainly based on a deep convolutional neural network, to automatically learn the regulatory code of DNA sequences and predict chromatin accessibility. In a series of comparisons with existing methods, we show the superior performance of our model in not only the classification of accessible regions against background sequences sampled at random, but also the regression of DNase-seq signals. Besides, we further visualize the convolutional kernels and show the match of identified sequence signatures and known motifs. We finally demonstrate the sensitivity of our model in finding causative noncoding variants in the analysis of a breast cancer dataset. We expect to see wide applications of Deopen with either public or in-house chromatin accessibility data in the annotation of the human genome and the identification of non-coding variants associated with diseases. Deopen is freely available at https://github.com/kimmo1019/Deopen. Contact: ruijiang@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.

  11. Feature selection using a one dimensional naïve Bayes’ classifier increases the accuracy of support vector machine classification of CDR3 repertoires

    PubMed Central

    Cinelli, Mattia; Sun, Yuxin; Best, Katharine; Heather, James M.; Reich-Zeliger, Shlomit; Shifrut, Eric; Friedman, Nir; Shawe-Taylor, John; Chain, Benny

    2017-01-01

    Abstract Motivation: Somatic DNA recombination, the hallmark of vertebrate adaptive immunity, has the potential to generate a vast diversity of antigen receptor sequences. How this diversity captures antigen specificity remains incompletely understood. In this study we use high throughput sequencing to compare the global changes in T cell receptor β chain complementarity determining region 3 (CDR3β) sequences following immunization with ovalbumin administered with complete Freund’s adjuvant (CFA) or CFA alone. Results: The CDR3β sequences were deconstructed into short stretches of overlapping contiguous amino acids. The motifs were ranked according to a one-dimensional Bayesian classifier score comparing their frequency in the repertoires of the two immunization classes. The top ranking motifs were selected and used to create feature vectors which were used to train a support vector machine. The support vector machine achieved high classification scores in a leave-one-out validation test reaching >90% in some cases. Summary: The study describes a novel two-stage classification strategy combining a one-dimensional Bayesian classifier with a support vector machine. Using this approach we demonstrate that the frequency of a small number of linear motifs three amino acids in length can accurately identify a CD4 T cell response to ovalbumin against a background response to the complex mixture of antigens which characterize Complete Freund’s Adjuvant. Availability and implementation: The sequence data is available at www.ncbi.nlm.nih.gov/sra/?term=SRP075893. The Decombinator package is available at github.com/innate2adaptive/Decombinator. The R package e1071 is available at the CRAN repository https://cran.r-project.org/web/packages/e1071/index.html. Contact: b.chain@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28073756
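
    The first stage described in this abstract (splitting CDR3β sequences into overlapping motifs, scoring each motif by its frequency in the two classes, and building feature vectors from the top motifs) can be sketched roughly as follows. This is an illustration only, not the authors' code: the function names, the pseudocount smoothing, and the use of a log frequency ratio as the one-dimensional score are all assumptions, and the toy sequences are invented. The resulting vectors would then be passed to a support vector machine, which is not shown here.

```python
from collections import Counter
import math


def motifs(cdr3, k=3):
    """Return all overlapping contiguous k-mers of one CDR3 amino acid sequence."""
    return [cdr3[i:i + k] for i in range(len(cdr3) - k + 1)]


def rank_motifs(class_a, class_b, k=3, pseudocount=1.0):
    """Rank motifs by the absolute log ratio of their smoothed frequencies.

    The log ratio is an assumed stand-in for the one-dimensional Bayesian
    score named in the abstract; the exact score is not given there.
    """
    counts_a = Counter(m for s in class_a for m in motifs(s, k))
    counts_b = Counter(m for s in class_b for m in motifs(s, k))
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + pseudocount * len(vocab)
    total_b = sum(counts_b.values()) + pseudocount * len(vocab)
    scores = {}
    for m in vocab:
        p_a = (counts_a[m] + pseudocount) / total_a
        p_b = (counts_b[m] + pseudocount) / total_b
        scores[m] = math.log(p_a / p_b)
    # Most discriminative motifs (in either direction) come first.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


def feature_vector(repertoire, selected, k=3):
    """Relative frequency of each selected motif within one repertoire."""
    counts = Counter(m for s in repertoire for m in motifs(s, k))
    total = sum(counts.values()) or 1
    return [counts[m] / total for m in selected]


# Toy, made-up repertoires for the two immunization classes.
ranked = rank_motifs(["CASSL", "CASSL"], ["CAWWL"])
top = [m for m, _ in ranked[:3]]
vector = feature_vector(["CASSL"], top)
```

    Positive scores mark motifs enriched in the first class and negative scores motifs enriched in the second; sorting by absolute value keeps both kinds as candidate features.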

  12. Scar-less multi-part DNA assembly design automation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hillson, Nathan J.

    The present invention provides a method of designing an implementation of a DNA assembly. In an exemplary embodiment, the method includes (1) receiving a list of DNA sequence fragments to be assembled together and an order in which to assemble the DNA sequence fragments, (2) designing DNA oligonucleotides (oligos) for each of the DNA sequence fragments, and (3) creating a plan for adding flanking homology sequences to each of the DNA oligos. In an exemplary embodiment, the method includes (1) receiving a list of DNA sequence fragments to be assembled together and an order in which to assemble the DNA sequence fragments, (2) designing DNA oligonucleotides (oligos) for each of the DNA sequence fragments, and (3) creating a plan for adding optimized overhang sequences to each of the DNA oligos.

  13. Evidence for a Pneumocystis carinii Flo8-like transcription factor: insights into organism adhesion.

    PubMed

    Kottom, Theodore J; Limper, Andrew H

    2016-02-01

    Pneumocystis carinii (Pc) adhesion to alveolar epithelial cells is well established and is thought to be a prerequisite for the initiation of Pneumocystis pneumonia. Pc binding events occur in part through the major Pc surface glycoprotein Msg, as well as an integrin-like molecule termed PcInt1. Recent data from the Pc sequencing project also demonstrate DNA sequences homologous to other genes important in Candida spp. binding to mammalian host cells, as well as organism binding to polystyrene surfaces and in biofilm formation. One of these genes, flo8, a transcription factor needed for downstream cAMP/PKA-pathway-mediated activation of the major adhesin/flocculin Flo11 in yeast, was cloned from a Pc cDNA library utilizing a partial sequence available in the Pc genome database. A CHEF blot of Pc genomic DNA yielded a single band providing evidence this gene is present in the organism. BLASTP analysis of the predicted protein demonstrated 41% homology to the Saccharomyces cerevisiae Flo8. Northern blotting demonstrated greatest expression at pH 6.0-8.0, pH comparable to reported fungal biofilm milieu. Western blot and immunoprecipitation assays of PcFlo8 protein in isolated cyst and tropic life forms confirmed the presence of the cognate protein in these Pc life forms. Heterologous expression of Pcflo8 cDNA in flo8Δ-deficient yeast strains demonstrated that the Pcflo8 was able to restore yeast binding to polystyrene and invasive growth of yeast flo8Δ cells. Furthermore, Pcflo8 promoted yeast binding to HEK293 human epithelial cells, strengthening its functional classification as a Flo8 transcription factor. Taken together, these data suggest that PcFlo8 is expressed by Pc and may exert activity in organism adhesion and biofilm formation.

  14. Evidence for a Pneumocystis carinii Flo8-like Transcription Factor: Insights into Organism Adhesion

    PubMed Central

    Kottom, Theodore J.; Limper, Andrew H.

    2015-01-01

    Pneumocystis carinii (Pc) adhesion to alveolar epithelial cells is well established and is thought to be a prerequisite for initiation of Pneumocystis pneumonia. Pc binding events occur in part through the major Pc surface glycoprotein Msg, as well as an integrin-like molecule termed PcInt1. Recent data from the Pc sequencing project also demonstrate DNA sequences homologous to other genes important in Candida spp. binding to mammalian host cells, as well as organism binding to polystyrene surfaces and in biofilm formation. One of these genes, flo8, a transcription factor needed for downstream cAMP/PKA-pathway-mediated activation of the major adhesin/flocculin Flo11 in yeast, was cloned from a Pc cDNA library utilizing a partial sequence available in the Pc genome database. A CHEF blot of Pc genomic DNA yielded a single band providing evidence this gene is present in the organism. BLASTP analysis of the predicted protein demonstrated 41% homology to the Saccharomyces cerevisiae Flo8. Northern blotting demonstrated greatest expression at pH 6.0–8.0, pH comparable to reported fungal biofilm milieu. Western blot and immunoprecipitation assays of PcFlo8 protein in isolated cyst and tropic life forms confirmed the presence of the cognate protein in these Pc life forms. Heterologous expression of Pcflo8 cDNA in flo8Δ (deficient) yeast strains demonstrated that the Pcflo8 was able to restore yeast binding to polystyrene and invasive growth of yeast flo8Δ cells. Furthermore, Pcflo8 promoted yeast binding to HEK293 human epithelial cells, strengthening its functional classification as a Flo8 transcription factor. Taken together, these data suggest that PcFlo8 is expressed by Pc and may exert activity in organism adhesion and biofilm formation. PMID:26215665

  15. Phylogenomic Analyses and Reclassification of Species within the Genus Tsukamurella: Insights to Species Definition in the Post-genomic Era.

    PubMed

    Teng, Jade L L; Tang, Ying; Huang, Yi; Guo, Feng-Biao; Wei, Wen; Chen, Jonathan H K; Wong, Samson S Y; Lau, Susanna K P; Woo, Patrick C Y

    2016-01-01

    Owing to the highly similar phenotypic profiles, protein spectra and 16S rRNA gene sequences observed between three pairs of Tsukamurella species (Tsukamurella pulmonis/Tsukamurella spongiae, Tsukamurella tyrosinosolvens/Tsukamurella carboxydivorans, and Tsukamurella pseudospumae/Tsukamurella sunchonensis), we hypothesize that the six Tsukamurella species may have been misclassified and that there may only be three Tsukamurella species. In this study, we characterized the type strains of these six Tsukamurella species by traditional DNA-DNA hybridization (DDH) and "digital DDH" after genome sequencing to determine their exact taxonomic positions. Traditional DDH showed 81.2 ± 0.6% to 99.7 ± 1.0% DNA-DNA relatedness between the two Tsukamurella species in each of the three pairs, which was above the threshold for same species designation. "Digital DDH" based on Genome-To-Genome Distance Calculator and Average Nucleotide Identity for the three pairs also showed similarity results in the range of 82.3-92.9 and 98.1-99.1%, respectively, in line with results of traditional DDH. Based on this evidence and according to Rules 23a and 42 of the Bacteriological Code, we propose that T. spongiae Olson et al. 2007, should be reclassified as a later heterotypic synonym of T. pulmonis Yassin et al. 1996, T. carboxydivorans Park et al. 2009, as a later heterotypic synonym of T. tyrosinosolvens Yassin et al. 1997, and T. sunchonensis Seong et al. 2008 as a later heterotypic synonym of T. pseudospumae Nam et al. 2004. With the advancement of genome sequencing technologies, classification of bacterial species can be more readily achieved by "digital DDH" than by traditional DDH.

  16. AnnoTALE: bioinformatics tools for identification, annotation, and nomenclature of TALEs from Xanthomonas genomic sequences

    PubMed Central

    Grau, Jan; Reschke, Maik; Erkes, Annett; Streubel, Jana; Morgan, Richard D.; Wilson, Geoffrey G.; Koebnik, Ralf; Boch, Jens

    2016-01-01

    Transcription activator-like effectors (TALEs) are virulence factors, produced by the bacterial plant-pathogen Xanthomonas, that function as gene activators inside plant cells. Although the contribution of individual TALEs to infectivity has been shown, the specific roles of most TALEs, and the overall TALE diversity in Xanthomonas spp., are not known. TALEs possess a highly repetitive DNA-binding domain, which is notoriously difficult to sequence. Here, we describe an improved method for characterizing TALE genes by the use of PacBio sequencing. We present ‘AnnoTALE’, a suite of applications for the analysis and annotation of TALE genes from Xanthomonas genomes, and for grouping similar TALEs into classes. Based on these classes, we propose a unified nomenclature for Xanthomonas TALEs that reveals similarities pointing to related functionalities. This new classification enables us to compare related TALEs and to identify base substitutions responsible for the evolution of TALE specificities. PMID:26876161

  17. Analysis of complete mitochondrial genomes from extinct and extant rhinoceroses reveals lack of phylogenetic resolution

    PubMed Central

    Willerslev, Eske; Gilbert, M Thomas P; Binladen, Jonas; Ho, Simon YW; Campos, Paula F; Ratan, Aakrosh; Tomsho, Lynn P; da Fonseca, Rute R; Sher, Andrei; Kuznetsova, Tatanya V; Nowak-Kemp, Malgosia; Roth, Terri L; Miller, Webb; Schuster, Stephan C

    2009-01-01

    Background The scientific literature contains many examples where DNA sequence analyses have been used to provide definitive answers to phylogenetic problems that traditional (non-DNA based) approaches alone have failed to resolve. One notable example concerns the rhinoceroses, a group for which several contradictory phylogenies were proposed on the basis of morphology, then apparently resolved using mitochondrial DNA fragments. Results In this study we report the first complete mitochondrial genome sequences of the extinct ice-age woolly rhinoceros (Coelodonta antiquitatis), and the threatened Javan (Rhinoceros sondaicus), Sumatran (Dicerorhinus sumatrensis), and black (Diceros bicornis) rhinoceroses. In combination with the previously published mitochondrial genomes of the white (Ceratotherium simum) and Indian (Rhinoceros unicornis) rhinoceroses, this data set putatively enables reconstruction of the rhinoceros phylogeny. While the six species cluster into three strongly supported sister-pairings: (i) The black/white, (ii) the woolly/Sumatran, and (iii) the Javan/Indian, resolution of the higher-level relationships has no statistical support. The phylogenetic signal from individual genes is highly diffuse, with mixed topological support from different genes. Furthermore, the choice of outgroup (horse vs tapir) has considerable effect on reconstruction of the phylogeny. The lack of resolution is suggestive of a hard polytomy at the base of crown-group Rhinocerotidae, and this is supported by an investigation of the relative branch lengths. Conclusion Satisfactory resolution of the rhinoceros phylogeny may not be achievable without additional analyses of substantial amounts of nuclear DNA. This study provides a compelling demonstration that, in spite of substantial sequence length, there are significant limitations with single-locus phylogenetics. 
We expect further examples of this to appear as next-generation, large-scale sequencing of complete mitochondrial genomes becomes commonplace in evolutionary studies. "The human factor in classification is nowhere more evident than in dealing with this superfamily (Rhinocerotoidea)." G. G. Simpson (1945) PMID:19432984

  18. Nuclear and cpDNA sequences combined provide strong inference of higher phylogenetic relationships in the phlox family (Polemoniaceae).

    PubMed

    Johnson, Leigh A; Chan, Lauren M; Weese, Terri L; Busby, Lisa D; McMurry, Samuel

    2008-09-01

    Members of the phlox family (Polemoniaceae) serve as useful models for studying various evolutionary and biological processes. Despite its biological importance, no family-wide phylogenetic estimate based on multiple DNA regions with complete generic sampling is available. Here, we analyze one nuclear and five chloroplast DNA sequence regions (nuclear ITS, chloroplast matK, trnL intron plus trnL-trnF intergenic spacer, and the trnS-trnG, trnD-trnT, and psbM-trnD intergenic spacers) using parsimony and Bayesian methods, as well as assessments of congruence and long branch attraction, to explore phylogenetic relationships among 84 ingroup species representing all currently recognized Polemoniaceae genera. Relationships inferred from the ITS and concatenated chloroplast regions are similar overall. A combined analysis provides strong support for the monophyly of Polemoniaceae and subfamilies Acanthogilioideae, Cobaeoideae, and Polemonioideae. Relationships among subfamilies, and thus for the precise root of Polemoniaceae, remain poorly supported. Within the largest subfamily, Polemonioideae, four clades corresponding to tribes Polemonieae, Phlocideae, Gilieae, and Loeselieae receive strong support. The monogeneric Polemonieae appears sister to Phlocideae. Relationships within Polemonieae, Phlocideae, and Gilieae are mostly consistent between analyses and data permutations. Many relationships within Loeselieae remain uncertain. Overall, inferred phylogenetic relationships support a higher-level classification for Polemoniaceae proposed in 2000.

  19. Persistent, widespread papilloma formation on the penis of a horse: a novel presentation of equine papillomavirus type 2 infection.

    PubMed

    Knight, Cameron G; Munday, John S; Rosa, Brielle V; Kiupel, Matti

    2011-12-01

    A 9-year-old gelding presented with approximately 100 papillomas that covered about 75% of the distal penis. Biopsy was performed, and histology showed evidence of viral cytopathic change and koilocytosis. Polymerase chain reaction using DNA extracted from biopsied tissue amplified equine papillomavirus type 2 (EcPV-2) DNA sequences. Sixteen months later, the horse was re-examined and the appearance of the papillomas was unchanged. Equine papillomavirus type 2 DNA sequences were again amplified from both biopsied tissue and swabs of the penis. Papillomavirus was localized to the lesions by immunohistochemistry and in situ hybridization. An examination 2 years after the initial presentation revealed no detectable change in the appearance of the penis. The large number of papillomas and their failure to regress over an extended period support a clinical classification of papillomatosis. To the authors' knowledge, this is the first report of papillomatosis of the equine penis. This novel clinical manifestation suggests that persistent EcPV-2 infection is possible in horses. As there is evidence that EcPV-2 may promote development of equine penile squamous cell carcinoma, understanding the natural history of EcPV-2 infections may be important in preventing equine penile neoplasia. © 2011 The Authors. Veterinary Dermatology. © 2011 ESVD and ACVD.

  20. Paenibacillus methanolicus sp. nov., a xylanolytic, methanol-utilizing bacterium isolated from the phyllosphere of bamboo (Pseudosasa japonica).

    PubMed

    Madhaiyan, Munusamy; Poonguzhali, Selvaraj; Saravanan, Venkatakrishnan Sivaraj; Pragatheswari, Dhandapani; Duraipandiyan, Veeramuthu; Al-Dhabi, Naif Abdullah; Santhanakrishnan, Palani

    2016-11-01

    Strain BL24T, isolated from bamboo phyllosphere collected in Coimbatore, India, was studied for taxonomic classification. Cells of the strain were aerobic, Gram-stain-positive, motile, catalase- and oxidase-positive rods and grew on media containing methanol. In 16S rRNA gene sequence analysis, strain BL24T showed the highest sequence similarities with Paenibacillus phyllosphaerae KACC 11473T (97.8 %) and Paenibacillus sacheonensis SY01 (95.1 %). DNA-DNA hybridization with P. phyllosphaerae KACC 11473T, phylogenetically the most closely related species, was 21.6 %; this value showed that strain BL24T belonged to a different species. The cell-wall peptidoglycan was found to possess meso-diaminopimelic acid and the G+C content of genomic DNA was 52.1 mol %. It contained menaquinone (MK)-7 as the predominant respiratory quinone and the major cellular fatty acids are C16 : 0, anteiso-C15 : 0, iso-C16 : 0, and anteiso-C17 : 0. Based on the molecular and chemotaxonomic markers and physiological properties, strain BL24T (=NRRL B-51698T=CCM 7577T) is considered to represent a novel species of the genus Paenibacillus, for which the name Paenibacillus methanolicus is proposed.

  1. Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions

    DOEpatents

    Gardner, Shea N; Mariella, Jr., Raymond P; Christian, Allen T; Young, Jennifer A; Clague, David S

    2013-06-25

    A method of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths.

  2. Molecular Evolution and Functional Diversification of Replication Protein A1 in Plants

    PubMed Central

    Aklilu, Behailu B.; Culligan, Kevin M.

    2016-01-01

    Replication protein A (RPA) is a heterotrimeric, single-stranded DNA binding complex required for eukaryotic DNA replication, repair, and recombination. RPA is composed of three subunits, RPA1, RPA2, and RPA3. In contrast to single RPA subunit genes generally found in animals and yeast, plants encode multiple paralogs of RPA subunits, suggesting subfunctionalization. Genetic analysis demonstrates that five Arabidopsis thaliana RPA1 paralogs (RPA1A to RPA1E) have unique and overlapping functions in DNA replication, repair, and meiosis. We hypothesize here that RPA1 subfunctionalities will be reflected in major structural and sequence differences among the paralogs. To address this, we analyzed amino acid and nucleotide sequences of RPA1 paralogs from 25 complete genomes representing a wide spectrum of plants and unicellular green algae. We find here that the plant RPA1 gene family is divided into three general groups termed RPA1A, RPA1B, and RPA1C, which likely arose from two progenitor groups in unicellular green algae. In the family Brassicaceae the RPA1B and RPA1C groups have further expanded to include two unique sub-functional paralogs RPA1D and RPA1E, respectively. In addition, RPA1 groups have unique domains, motifs, cis-elements, gene expression profiles, and pattern of conservation that are consistent with proposed functions in monocot and dicot species, including a novel C-terminal zinc-finger domain found only in plant RPA1C-like sequences. These results allow for improved prediction of RPA1 subunit functions in newly sequenced plant genomes, and potentially provide a unique molecular tool to improve classification of Brassicaceae species. PMID:26858742

  3. Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions

    DOEpatents

    Gardner, Shea N [San Leandro, CA]; Mariella, Jr., Raymond P.; Christian, Allen T [Tracy, CA]; Young, Jennifer A [Berkeley, CA]; Clague, David S [Livermore, CA]

    2011-01-18

    A method of fabricating a DNA molecule of user-defined sequence. The method comprises the steps of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an even or odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths. In one embodiment starting sequence fragments are of different lengths, n, n+1, n+2, etc.
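
    The idea of covering a user-defined sequence with segments of length n that share a chosen hybridizing overlap can be illustrated with a small sketch. It is a simplification made for illustration, not the patented protocol: it works on a single strand only, ignores polymerase chemistry and the temporal separation of segments, and the function names and example sequence are hypothetical.

```python
def overlapping_segments(target, n, overlap):
    """Split target into segments of length n; neighbours share `overlap` bases.

    The last segment may be shorter than n when the lengths do not divide evenly.
    """
    if not 0 <= overlap < n:
        raise ValueError("overlap must satisfy 0 <= overlap < n")
    step = n - overlap
    segments = []
    start = 0
    while True:
        segments.append(target[start:start + n])
        if start + n >= len(target):
            break
        start += step
    return segments


def reassemble(segments, overlap):
    """Join segments back together, checking that each overlap matches."""
    out = segments[0]
    for seg in segments[1:]:
        if overlap and not out.endswith(seg[:overlap]):
            raise ValueError("segments do not overlap as expected")
        out += seg[overlap:]
    return out


# Made-up 10-base target, segment length 4, overlap 2.
parts = overlapping_segments("ACGTACGTAC", n=4, overlap=2)
rebuilt = reassemble(parts, overlap=2)
```

    Varying `n` and `overlap` in such a sketch mirrors, very loosely, the abstract's point that segment length and overlap length are user-specified design parameters.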

  4. Gluconacetobacter medellinensis sp. nov., cellulose- and non-cellulose-producing acetic acid bacteria isolated from vinegar.

    PubMed

    Castro, Cristina; Cleenwerck, Ilse; Trcek, Janja; Zuluaga, Robin; De Vos, Paul; Caro, Gloria; Aguirre, Ricardo; Putaux, Jean-Luc; Gañán, Piedad

    2013-03-01

    The phylogenetic position of a cellulose-producing acetic acid bacterium, strain ID13488, isolated from commercially available Colombian homemade fruit vinegar, was investigated. Analyses using nearly complete 16S rRNA gene sequences, nearly complete 16S-23S rRNA gene internal transcribed spacer (ITS) sequences, as well as concatenated partial sequences of the housekeeping genes dnaK, groEL and rpoB, allocated the micro-organism to the genus Gluconacetobacter, and more precisely to the Gluconacetobacter xylinus group. Moreover, the data suggested that the micro-organism belongs to a novel species in this genus, together with LMG 1693(T), a non-cellulose-producing strain isolated from vinegar by Kondo and previously classified as a strain of Gluconacetobacter xylinus. DNA-DNA hybridizations confirmed this finding, revealing a DNA-DNA relatedness value of 81 % between strains ID13488 and LMG 1693(T), and values <70 % between strain LMG 1693(T) and the type strains of the closest phylogenetic neighbours. Additionally, the classification of strains ID13488 and LMG 1693(T) into a single novel species was supported by amplified fragment length polymorphism (AFLP) and (GTG)5-PCR DNA fingerprinting data, as well as by phenotypic data. Strains ID13488 and LMG 1693(T) could be differentiated from closely related species of the genus Gluconacetobacter by their ability to produce 2- and 5-keto-d-gluconic acid from d-glucose, their ability to produce acid from sucrose, but not from 1-propanol, and their ability to grow on 3 % ethanol in the absence of acetic acid and on ethanol, d-ribose, d-xylose, sucrose, sorbitol, d-mannitol and d-gluconate as carbon sources. The DNA G+C content of strains ID13488 and LMG 1693(T) was 58.0 and 60.7 mol%, respectively. The major ubiquinone of LMG 1693(T) was Q-10. 
Taken together these data indicate that strains ID13488 and LMG 1693(T) represent a novel species of the genus Gluconacetobacter for which the name Gluconacetobacter medellinensis sp. nov. is proposed. The type strain is LMG 1693(T) ( = NBRC 3288(T) = Kondo 51(T)).

  5. Fourteen coprophilous species of Psathyrella identified in the Nordic countries using morphology and nuclear rDNA sequence data.

    PubMed

    Larsson, Ellen; Orstadius, Leif

    2008-10-01

    Psathyrella species growing on dung or occasionally on dung in the Nordic countries were studied using morphological characters and nu-rDNA sequence data and type collections were examined when available. Fourteen species capable of growing on dung were identified. Descriptions are given of all dung-inhabiting species and to a lesser extent of the species occasionally growing on dung. Three new species are described: Psathyrella fimiseda, P. merdicola, and P. scatophila. P. stercoraria is described as a new species in order to validate the name. A key to the coprophilous species in Europe including the species described by Peck & Smith from North America is provided. The phylogenetic analyses recovered four major supported clades within Psathyrellaceae corresponding to Parasola, Coprinopsis, Lacrymaria/Spadiceae pro parte, and Psathyrella. The status of Coprinellus was ambiguous. The current morphology-based infrageneric classification of Psathyrella was not supported by the phylogenetic analyses and a coprophilous habit has apparently evolved on multiple occasions. Three new combinations are proposed: Parasola conopilus, Coprinopsis marcescibilis, and Coprinopsis pannucioides.

  6. A Java-based tool for the design of classification microarrays.

    PubMed

    Meng, Da; Broschat, Shira L; Call, Douglas R

    2008-08-04

    Classification microarrays are used for purposes such as identifying strains of bacteria and determining genetic relationships to understand the epidemiology of an infectious disease. For these cases, mixed microarrays, which are composed of DNA from more than one organism, are more effective than conventional microarrays composed of DNA from a single organism. Selection of probes is a key factor in designing successful mixed microarrays because redundant sequences are inefficient and limited representation of diversity can restrict application of the microarray. We have developed a Java-based software tool, called PLASMID, for use in selecting the minimum set of probe sequences needed to classify different groups of plasmids or bacteria. The software program was successfully applied to several different sets of data. The utility of PLASMID was illustrated using existing mixed-plasmid microarray data as well as data from a virtual mixed-genome microarray constructed from different strains of Streptococcus. Moreover, use of data from expression microarray experiments demonstrated the generality of PLASMID. In this paper we describe a new software tool for selecting a set of probes for a classification microarray. While the tool was developed for the design of mixed microarrays-and mixed-plasmid microarrays in particular-it can also be used to design expression arrays. The user can choose from several clustering methods (including hierarchical, non-hierarchical, and a model-based genetic algorithm), several probe ranking methods, and several different display methods. A novel approach is used for probe redundancy reduction, and probe selection is accomplished via stepwise discriminant analysis. Data can be entered in different formats (including Excel and comma-delimited text), and dendrogram, heat map, and scatter plot images can be saved in several different formats (including jpeg and tiff). 
Weights generated using stepwise discriminant analysis can be stored for analysis of subsequent experimental data. Additionally, PLASMID can be used to construct virtual microarrays with genomes from public databases, which can then be used to identify an optimal set of probes.

  7. Selection and Validation of a Multilocus Variable-Number Tandem-Repeat Analysis Panel for Typing Shigella spp.

    PubMed Central

    Gorgé, Olivier; Lopez, Stéphanie; Hilaire, Valérie; Lisanti, Olivier; Ramisse, Vincent; Vergnaud, Gilles

    2008-01-01

    The Shigella genus has historically been separated into four species, based on biochemical assays. The classification within each species relies on serotyping. Recently, genome sequencing and DNA assays, in particular the multilocus sequence typing (MLST) approach, greatly improved the current knowledge of the origin and phylogenetic evolution of Shigella spp. The Shigella and Escherichia genera are now considered to belong to a unique genomospecies. Multilocus variable-number tandem-repeat (VNTR) analysis (MLVA) provides valuable polymorphic markers for genotyping and performing phylogenetic analyses of highly homogeneous bacterial pathogens. Here, we assess the capability of MLVA for Shigella typing. Thirty-two potentially polymorphic VNTRs were selected by analyzing in silico five Shigella genomic sequences and subsequently evaluated. Eventually, a panel of 15 VNTRs was selected (i.e., MLVA15 analysis). MLVA15 analysis of 78 strains or genome sequences of Shigella spp. and 11 strains or genome sequences of Escherichia coli distinguished 83 genotypes. Shigella population cluster analysis gave consistent results compared to MLST. MLVA15 analysis showed capabilities for E. coli typing, providing classification among pathogenic and nonpathogenic E. coli strains included in the study. The resulting data can be queried on our genotyping webpage (http://mlva.u-psud.fr). The MLVA15 assay is rapid, highly discriminatory, and reproducible for Shigella and Escherichia strains, suggesting that it could significantly contribute to epidemiological trace-back analysis of Shigella infections and pathogenic Escherichia outbreaks. Typing was performed on strains obtained mostly from collections. Further studies should include strains of much more diverse origins, including all pathogenic E. coli types. PMID:18216214

  8. A multi-gene phylogeny of Chlorophyllum (Agaricaceae, Basidiomycota): new species, new combination and infrageneric classification

    PubMed Central

    Ge, Zai-Wei; Jacobs, Adriaana; Vellinga, Else C.; Sysouphanthong, Phongeun; van der Walt, Retha; Lavorato, Carmine; An, Yi-Feng; Yang, Zhu L.

    2018-01-01

    Abstract Taxonomic and phylogenetic studies of Chlorophyllum were carried out on the basis of morphological differences and molecular phylogenetic analyses. Based on the phylogeny inferred from the internal transcribed spacer (ITS), the partial large subunit nuclear ribosomal DNA (nrLSU), the second largest subunit of RNA polymerase II (rpb2) and translation elongation factor 1-α (tef1) sequences, six well-supported clades and 17 phylogenetic species are recognised. Within this phylogenetic framework and considering the diagnostic morphological characters, two new species, C. africanum and C. palaeotropicum, are described. In addition, a new infrageneric classification of Chlorophyllum is proposed, in which the genus is divided into six sections. One new combination is also made. This study provides a robust basis for a more detailed investigation of diversity and biogeography of Chlorophyllum. PMID:29681738

  9. Characterization of Black and Dichothrix Cyanobacteria Based on the 16S Ribosomal RNA Gene Sequence

    NASA Technical Reports Server (NTRS)

    Ortega, Maya

    2010-01-01

    My project focuses on characterizing different cyanobacteria in thrombolitic mats found on the island of Highborn Cay, Bahamas. Thrombolites are interesting ecosystems because of the ability of bacteria in these mats to remove carbon dioxide from the atmosphere and mineralize it as calcium carbonate. In the future they may be used as models to develop carbon sequestration technologies, which could be used as part of regenerative life systems in space. These thrombolitic communities are also significant because of their similarities to early communities of life on Earth. I targeted two cyanobacteria in my research, Dichothrix spp. and a cyanobacterium referred to only as "black", since they are believed to be important to carbon sequestration in these thrombolitic mats. The goal of my summer research project was to molecularly identify these two cyanobacteria. DNA was isolated from each organism through mat dissections and DNA extractions. I ran polymerase chain reactions (PCR) to amplify the 16S ribosomal RNA (rRNA) gene in each cyanobacterium. This specific gene is found in almost all bacteria and is highly conserved, meaning any changes in the sequence are most likely due to evolution. As a result, the 16S rRNA gene sequence can be used to identify different bacterial species. Since the exact sequence of the Dichothrix gene was unknown, I designed different primers that flanked the gene based on the known sequences from other taxonomically similar cyanobacteria. Once the 16S rRNA gene was amplified, I cloned the gene into specialized Escherichia coli cells and sent the gene products for sequencing. Once the sequence is obtained, it will be added to a genetic database for future reference to and classification of other Dichothrix sp.

  10. Comparative Analysis of RF Emission Based Fingerprinting Techniques for ZigBee Device Classification

    DTIC Science & Technology

    quantify the differences in various RF fingerprinting techniques via comparative analysis of MDA/ML classification results. The findings herein demonstrate...correct classification rates followed by COR-DNA and then RF-DNA in most test cases and especially in low Eb/N0 ranges, where ZigBee is designed to operate.

  11. Classification of community types, successional sequences, and landscapes of the Copper River Delta, Alaska.

    Treesearch

    Boggs, Keith

    2000-01-01

    A classification of community types, successional sequences, and landscapes is presented for the piedmont of the Copper River Delta. The classification was based on a sampling of 471 sites. A total of 75 community types, 42 successional sequences, and 6 landscapes are described. The classification of community types reflects the existing vegetation communities on the...

  12. An updated version of NPIDB includes new classifications of DNA–protein complexes and their families

    PubMed Central

    Zanegina, Olga; Kirsanov, Dmitriy; Baulin, Eugene; Karyagina, Anna; Alexeevski, Andrei; Spirin, Sergey

    2016-01-01

    The recent upgrade of nucleic acid–protein interaction database (NPIDB, http://npidb.belozersky.msu.ru/) includes a newly elaborated classification of complexes of protein domains with double-stranded DNA and a classification of families of related complexes. Our classifications are based on contacting structural elements of both DNA: the major groove, the minor groove and the backbone; and protein: helices, beta-strands and unstructured segments. We took into account both hydrogen bonds and hydrophobic interaction. The analyzed material contains 1942 structures of protein domains from 748 PDB entries. We have identified 97 interaction modes of individual protein domain–DNA complexes and 17 DNA–protein interaction classes of protein domain families. We analyzed the sources of diversity of DNA–protein interaction modes in different complexes of one protein domain family. The observed interaction mode is sometimes influenced by artifacts of crystallization or diversity in secondary structure assignment. The interaction classes of domain families are more stable and thus possess more biological sense than a classification of single complexes. Integration of the classification into NPIDB allows the user to browse the database according to the interacting structural elements of DNA and protein molecules. For each family, we present average DNA shape parameters in contact zones with domains of the family. PMID:26656949

  13. Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm.

    PubMed

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.

  14. Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm

    PubMed Central

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106

  15. Next-generation systematics: An innovative approach to resolve the structure of complex prokaryotic taxa

    NASA Astrophysics Data System (ADS)

    Sangal, Vartul; Goodfellow, Michael; Jones, Amanda L.; Schwalbe, Edward C.; Blom, Jochen; Hoskisson, Paul A.; Sutcliffe, Iain C.

    2016-12-01

    Prokaryotic systematics provides the fundamental framework for microbiological research but remains a discipline that relies on a labour- and time-intensive polyphasic taxonomic approach, including DNA-DNA hybridization, variation in 16S rRNA gene sequence and phenotypic characteristics. These techniques suffer from poor resolution in distinguishing between closely related species and often result in misclassification and misidentification of strains. Moreover, guidelines are unclear for the delineation of bacterial genera. Here, we have applied an innovative phylogenetic and taxogenomic approach to a heterogeneous actinobacterial taxon, Rhodococcus, to identify boundaries for intrageneric and supraspecific classification. Seven species-groups were identified within the genus Rhodococcus that are as distantly related to one another as they are to representatives of other mycolic acid containing actinobacteria and can thus be equated with the rank of genus. It was also evident that strains assigned to rhodococcal species-groups are underspeciated with many misclassified using conventional taxonomic criteria. The phylogenetic and taxogenomic methods used in this study provide data of theoretical value for the circumscription of generic and species boundaries and are also of practical significance as they provide a robust basis for the classification and identification of rhodococci of agricultural, industrial and medical/veterinary significance.

  16. Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

    NASA Astrophysics Data System (ADS)

    Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

    2017-07-01

    A DNA sequence can be defined as a succession of letters representing the order of nucleotides within DNA, written with the four base codes adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of a sequence is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have now become advanced, high-throughput technologies. DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing produces ambiguous results, which make it difficult to determine whether a given code is A, T, G, or C. To solve this problem, in this study we introduce another representation of DNA codes, namely the Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probabilities that the bases A, T, G, C appear in Q and PA + PT + PG + PC = 1. Using Quaternion representations, we are able to construct an improved scoring matrix for global sequence alignment by applying a dot product method. This scoring matrix produces better, higher-quality match and mismatch scores between two DNA base codes. In the implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are DNA sequences of Streptococcus pneumoniae families obtained from GenBank, while the target DNA sequence was received from our collaborator's database. We found that the Quaternion representation improves the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae.
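
    The idea in the abstract above can be sketched in a few lines. This is only one plausible reading, not the authors' Octave implementation: each base becomes a probability vector (PA, PT, PG, PC), the dot product of two vectors is taken as the probability that the two positions match, and that probability weights a match and a mismatch score inside a standard Needleman-Wunsch recurrence. The function names, the uniform treatment of any non-ATGC symbol, and the match, mismatch and gap values are all assumptions made for illustration.

```python
# Illustrative sketch; names and scoring parameters are assumptions,
# not taken from the paper.

def base_vector(base):
    """Return the probability vector (PA, PT, PG, PC) for one symbol."""
    order = "ATGC"
    if base in order:
        return tuple(1.0 if b == base else 0.0 for b in order)
    # Assumption: any other symbol (e.g. 'N') is uniformly ambiguous.
    return (0.25, 0.25, 0.25, 0.25)


def dot_score(p, q, match=1.0, mismatch=-1.0):
    """Expected score: the dot product is read as the match probability."""
    p_match = sum(a * b for a, b in zip(p, q))
    return match * p_match + mismatch * (1.0 - p_match)


def needleman_wunsch_score(seq_a, seq_b, gap=-1.0):
    """Global alignment score using the dot-product substitution score."""
    va = [base_vector(b) for b in seq_a.upper()]
    vb = [base_vector(b) for b in seq_b.upper()]
    # prev holds the previous row of the dynamic-programming table.
    prev = [j * gap for j in range(len(vb) + 1)]
    for i in range(1, len(va) + 1):
        curr = [i * gap]
        for j in range(1, len(vb) + 1):
            curr.append(max(
                prev[j - 1] + dot_score(va[i - 1], vb[j - 1]),  # align pair
                prev[j] + gap,                                   # gap in seq_b
                curr[j - 1] + gap,                               # gap in seq_a
            ))
        prev = curr
    return prev[-1]
```

    With these assumed parameters, an ambiguous position costs less than a definite mismatch, so a read containing an unresolved base is penalised only in proportion to the uncertainty.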

  17. Accurate phylogenetic classification of DNA fragments based on sequence composition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McHardy, Alice C.; Garcia Martin, Hector; Tsirigos, Aristotelis

    2006-05-01

    Metagenome studies have retrieved vast amounts of sequence out of a variety of environments, leading to novel discoveries and great insights into the uncultured microbial world. Except for very simple communities, diversity makes sequence assembly and analysis a very challenging problem. To understand the structure and function of microbial communities, a taxonomic characterization of the obtained sequence fragments is highly desirable, yet currently limited mostly to those sequences that contain phylogenetic marker genes. We show that for clades at the rank of domain down to genus, sequence composition allows the very accurate phylogenetic characterization of genomic sequence. We developed a composition-based classifier, PhyloPythia, for de novo phylogenetic sequence characterization and have trained it on a data set of 340 genomes. By extensive evaluation experiments we show that the method is accurate across all taxonomic ranks considered, even for sequences that originate from novel organisms and are as short as 1 kb. Application to two metagenome datasets obtained from samples of phosphorus-removing sludge showed that the method allows the accurate classification at genus level of most sequence fragments from the dominant populations, while at the same time correctly characterizing even larger parts of the samples at higher taxonomic levels.
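
    "Sequence composition" in the abstract above is commonly represented as a vector of k-mer (oligonucleotide) frequencies. The abstract does not state which features or learning algorithm are used, so the following is only a minimal sketch of how such a composition vector might be computed before being passed to any classifier. The function name and the default k = 4 are assumptions.

```python
from itertools import product


def kmer_composition(seq, k=4):
    """Return relative frequencies of all 4**k k-mers in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows with ambiguous symbols are skipped
            counts[word] += 1
    total = sum(counts.values())
    # An all-zero vector is returned when no valid window exists.
    return [counts[w] / total if total else 0.0 for w in kmers]
```

    Because the vector has a fixed length of 4**k regardless of fragment length, fragments of different sizes can be compared directly, which is what makes composition features usable on short reads.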

  18. Proposal of Xanthomonas translucens pv. pistaciae pv. nov., pathogenic to pistachio (Pistacia vera).

    PubMed

    Giblot-Ducray, Danièle; Marefat, Alireza; Gillings, Michael R; Parkinson, Neil M; Bowman, John P; Ophel-Keller, Kathy; Taylor, Cathy; Facelli, Evelina; Scott, Eileen S

    2009-12-01

    Strains of Xanthomonas translucens have caused dieback in the Australian pistachio industry for the last 15 years. Such pathogenicity to a dicotyledonous woody host contrasts with that of other pathovars of X. translucens, which are characterized by their pathogenicity to monocotyledonous plant families. Further investigations, using DNA-DNA hybridization, gyrB gene sequencing and integron screening, were conducted to confirm the taxonomic status of the X. translucens pathogenic to pistachio. DNA-DNA hybridization provided a clear classification, at the species level, of the pistachio pathogen as a X. translucens. In the gyrB-based phylogeny, strains of the pistachio pathogen clustered among the X. translucens pathovars as two distinct lineages. Integron screening revealed that the cassette arrays of strains of the pistachio pathogen were different from those of other Xanthomonas species, and again distinguished two groups. Together with previously reported pathogenicity data, these results confirm that the pistachio pathogen is a new pathovar of X. translucens and allow hypotheses about its origin. The proposed name is Xanthomonas translucens pv. pistaciae pv. nov.

  19. Imagining Sisyphus happy: DNA barcoding and the unnamed majority.

    PubMed

    Blaxter, Mark

    2016-09-05

    The vast majority of life on the Earth is physically small, and is classifiable as micro- or meiobiota. These organisms are numerically dominant and it is likely that they are also abundantly speciose. By contrast, the vast majority of taxonomic effort has been expended on 'charismatic megabionts': larger organisms where a wealth of morphology has facilitated Linnaean species definition. The hugely successful Linnaean project is unlikely to be extensible to the totality of approximately 10 million species in a reasonable time frame and thus alternative toolkits and methodologies need to be developed. One such toolkit is DNA barcoding, particularly in its metabarcoding or metagenetics mode, where organisms are identified purely by the presence of a diagnostic DNA sequence in samples that are not processed for morphological identification. Building on secure Linnaean foundations, classification of unknown (and unseen) organisms to molecular operational taxonomic units (MOTUs) and deployment of these MOTUs in biodiversity science promises a rewarding resolution to the Sisyphean task of naming all the world's species. This article is part of the themed issue 'From DNA barcodes to biomes'. © 2016 The Authors.

  20. Phylogenetic sequence analysis, recombinant expression, and tissue distribution of a channel catfish estrogen receptor beta

    USGS Publications Warehouse

    Xia, Zhenfang; Gale, William L.; Chang, Xiaotian; Langenau, David; Patino, Reynaldo; Maule, Alec G.; Densmore, Llewellyn D.

    2000-01-01

    An estrogen receptor β (ERβ) cDNA fragment was amplified by RT-PCR of total RNA extracted from liver and ovary of immature channel catfish. This cDNA fragment was used to screen an ovarian cDNA library made from an immature female fish. A clone was obtained that contained an open reading frame encoding a 575-amino-acid protein with a deduced molecular weight of 63.9 kDa. Maximum parsimony and Neighbor Joining analyses were used to generate a phylogenetic classification of channel catfish ERβ on the basis of 25 full-length teleost and tetrapod ER sequences. The consensus tree obtained indicated the existence of two major vertebrate ER subtypes, α and β. Within each subtype, and in accordance with established phylogenetic relationships, teleost and tetrapod ER were monophyletic, confirming the results of a previous analysis (Z. Xia et al., 1999, Gen. Comp. Endocrinol. 113, 360–368). Extracts of COS-7 cells transfected with channel catfish ERβ cDNA bound estrogen with high affinity (Kd = 0.21 nM) and specificity. The affinity of channel catfish ERβ for estrogen was higher than previously reported for channel catfish ERα. As determined by qualitative RT-PCR, the tissue distributions of ERα and ERβ were similar but not identical. Both ER subtypes were present in ovary and testis. ERα was found in all other tissues examined from juvenile and mature fish of both sexes. ERβ was also found in most tissues except, in most cases, whole blood and head kidney. Interestingly, the pattern of expression of ER subtypes in head kidney always corresponded to the pattern in whole blood. In conclusion, we isolated a channel catfish ERβ with ligand-binding affinity and tissue expression patterns different from ERα. Also, we confirmed the validity of our previously proposed general classification scheme for vertebrate ER into α and β subtypes and within each subtype, into teleost and tetrapod clades.

  1. Large-Scale Concatenation cDNA Sequencing

    PubMed Central

    Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

    1997-01-01

    A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174

  2. Morphological and molecular evidence supporting an arbutoid mycorrhizal relationship in the Costa Rican páramo.

    PubMed

    Osmundson, Todd W; Halling, Roy E; den Bakker, Henk C

    2007-05-01

    This study examines evidence for a particular arbutoid mycorrhizal interaction in páramo, a high-altitude neotropical ecosystem important in hydrological regulation but poorly known in terms of its fungal communities. Comarostaphylis arbutoides Lindley (Ericaceae) often forms dense thickets in Central American páramo habitats. Based on phylogenetic classification, it has been suggested that C. arbutoides forms arbutoid mycorrhizae with diverse Basidiomycetes and Ascomycetes; however, this assumption has not previously been confirmed. Based on field data, we hypothesized an arbutoid mycorrhizal association between C. arbutoides and the recently described bolete Leccinum monticola Halling & G.M. Mueller; in this study, we applied a rigorous approach using anatomical and molecular data to examine evidence for such an association. We examined root samples collected beneath L. monticola basidiomes for mycorrhizal structures, and we also compared rDNA internal transcribed spacer (ITS) sequences between mycorrhizal root tips and leaf or basidiome material of the suspected symbionts. Root cross sections showed a thin hyphal sheath and intracellular hyphal coils typical of arbutoid mycorrhizae. DNA sequence comparisons confirmed the identity of C. arbutoides and L. monticola as the mycorrhizal symbionts. In addition, this paper provides additional evidence for the widespread presence of minisatellite-like inserts in the ITS1 spacer in Leccinum species (including a characterization of the insert in L. monticola) and reports the use of an angiosperm-specific ITS primer pair useful for amplifying plant DNA from mycorrhizal roots without co-amplifying fungal DNA.

  3. Comparative analysis of the genomes of intestinal spirochetes of human and animal origin.

    PubMed Central

    Coene, M; Agliano, A M; Paques, A T; Cattani, P; Dettori, G; Sanna, A; Cocito, C

    1989-01-01

    The aim of the present work was to compare the genomes of 21 strains of intestinal spirochetes, which were isolated from patients suffering intestinal disorders, with those of Treponema hyodysenteriae (strain P18), the known etiological agent of swine dysentery (bloody scours), and of a nonpathogenic strain (M1) of Treponema innocens. The percent guanine-plus-cytosine value of the 23 DNAs was found to be 25.5 to 30.1, as determined by a double-labeling procedure based on nick-translation by DNA polymerase I. The genome size of two spirochetal strains, of human and porcine origin, was found to be similar (4 × 10^6 base pairs) and close to that of the reference bacterium Escherichia coli (4.2 × 10^6 base pairs). Restriction analysis showed the presence of two modified bases in spirochetal DNA. Methyladenine was present in the GATC sequence of DNA from 15 spirochetes of human origin, and methylcytosine was present in several sequences occurring in all strains. The DNA of T. hyodysenteriae displayed a 30 to 100% homology with respect to that of 21 spirochetes from humans, thus suggesting the occurrence of a genetic heterogeneity in the latter group. These data indicate that the intestinal spirochetes analyzed in the present work are related; hence there is a possibility of domestic animals being reservoirs of microorganisms pathogenic for humans. A classification of intestinal treponemes into subgroups has been proposed on the basis of restriction analysis and hybridization experiments. PMID:2535832

  4. Synthesis of DNA

    DOEpatents

    Mariella, Jr., Raymond P.

    2008-11-18

    A method of synthesizing a desired double-stranded DNA of a predetermined length and of a predetermined sequence. Preselected sequence segments that will complete the desired double-stranded DNA are determined. Preselected segment sequences of DNA that will be used to complete the desired double-stranded DNA are provided. The preselected segment sequences of DNA are assembled to produce the desired double-stranded DNA.

  5. Nanopore Technology: A Simple, Inexpensive, Futuristic Technology for DNA Sequencing.

    PubMed

    Gupta, P D

    2016-10-01

    In health care, the importance of DNA sequencing has been fully established. Sanger's capillary electrophoresis DNA sequencing methodology is time-consuming and cumbersome, and hence has become more expensive. Lately, because of its versatility, DNA sequencing has become a household name, and there is therefore an urgent need for a simple, fast, inexpensive DNA sequencing technology. At the beginning of this century, efforts were made and nanopore DNA sequencing technology was developed; it is still in its infancy, but it is nevertheless the futuristic technology.

  6. The genome-wide DNA sequence specificity of the anti-tumour drug bleomycin in human cells.

    PubMed

    Murray, Vincent; Chen, Jon K; Tanaka, Mark M

    2016-07-01

    The cancer chemotherapeutic agent, bleomycin, cleaves DNA at specific sites. For the first time, the genome-wide DNA sequence specificity of bleomycin breakage was determined in human cells. Utilising Illumina next-generation DNA sequencing techniques, over 200 million bleomycin cleavage sites were examined to elucidate the bleomycin genome-wide DNA selectivity. The genome-wide bleomycin cleavage data were analysed by four different methods to determine the cellular DNA sequence specificity of bleomycin strand breakage. For the most highly cleaved DNA sequences, the preferred site of bleomycin breakage was at 5'-GT* dinucleotide sequences (where the asterisk indicates the bleomycin cleavage site), with lesser cleavage at 5'-GC* dinucleotides. This investigation also determined longer bleomycin cleavage sequences, with preferred cleavage at 5'-GT*A and 5'-TGT* trinucleotide sequences, and 5'-TGT*A tetranucleotides. For cellular DNA, the hexanucleotide DNA sequence 5'-RTGT*AY (where R is a purine and Y is a pyrimidine) was the most highly cleaved DNA sequence. It was striking that alternating purine-pyrimidine sequences were highly cleaved by bleomycin. The highest intensity cleavage sites in cellular and purified DNA were very similar although there were some minor differences. Statistical nucleotide frequency analysis indicated a G nucleotide was present at the -3 position (relative to the cleavage site) in cellular DNA but was absent in purified DNA.
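
    The hexanucleotide 5'-RTGT*AY reported above can be located in a sequence with a simple pattern search, expanding R to A or G and Y to C or T. The sketch below is purely illustrative and is not the analysis pipeline used in the study; the function name is hypothetical, and it only reports where the motif occurs on the given strand, not whether cleavage happens there.

```python
import re

# R = purine (A or G), Y = pyrimidine (C or T).
# The lookahead lets finditer report every start position, even if hits overlap.
RTGTAY = re.compile(r"(?=([AG]TGTA[CT]))")


def find_rtgtay_sites(seq):
    """Return (start, hexamer) pairs for each 5'-RTGTAY match, 0-based."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in RTGTAY.finditer(seq)]
```

    The cleavage position marked by the asterisk falls after the fourth base of each hit, that is, at start + 4 in 0-based coordinates.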

  7. DNA methylation-based reclassification of olfactory neuroblastoma.

    PubMed

    Capper, David; Engel, Nils W; Stichel, Damian; Lechner, Matt; Glöss, Stefanie; Schmid, Simone; Koelsche, Christian; Schrimpf, Daniel; Niesen, Judith; Wefers, Annika K; Jones, David T W; Sill, Martin; Weigert, Oliver; Ligon, Keith L; Olar, Adriana; Koch, Arend; Forster, Martin; Moran, Sebastian; Tirado, Oscar M; Sáinz-Japeado, Miguel; Mora, Jaume; Esteller, Manel; Alonso, Javier; Del Muro, Xavier Garcia; Paulus, Werner; Felsberg, Jörg; Reifenberger, Guido; Glatzel, Markus; Frank, Stephan; Monoranu, Camelia M; Lund, Valerie J; von Deimling, Andreas; Pfister, Stefan; Buslei, Rolf; Ribbat-Idel, Julika; Perner, Sven; Gudziol, Volker; Meinhardt, Matthias; Schüller, Ulrich

    2018-05-05

    Olfactory neuroblastoma/esthesioneuroblastoma (ONB) is an uncommon neuroectodermal neoplasm thought to arise from the olfactory epithelium. Little is known about its molecular pathogenesis. For this study, a retrospective cohort of n = 66 tumor samples with the institutional diagnosis of ONB was analyzed by immunohistochemistry, genome-wide DNA methylation profiling, copy number analysis, and in a subset, next-generation panel sequencing of 560 tumor-associated genes. DNA methylation profiles were compared to those of relevant differential diagnoses of ONB. Unsupervised hierarchical clustering analysis of DNA methylation data revealed four subgroups among institutionally diagnosed ONB. The largest group (n = 42, 64%, Core ONB) presented with classical ONB histology and no overlap with other classes upon methylation profiling-based t-distributed stochastic neighbor embedding (t-SNE) analysis. A second DNA methylation group (n = 7, 11%) with CpG island methylator phenotype (CIMP) consisted of cases with strong expression of cytokeratin, no or scarce chromogranin A expression and IDH2 hotspot mutation in all cases. T-SNE analysis clustered these cases together with sinonasal carcinoma with IDH2 mutation. Four cases (6%) formed a small group characterized by an overall high level of DNA methylation, but without CIMP. The fourth group consisted of 13 cases that had heterogeneous DNA methylation profiles and strong cytokeratin expression in most cases. In t-SNE analysis, these cases mostly grouped among sinonasal adenocarcinoma, squamous cell carcinoma, and undifferentiated carcinoma. Copy number analysis indicated highly recurrent chromosomal changes among Core ONB with a high frequency of combined loss of chromosome 1-4, 8-10, and 12. NGS sequencing did not reveal highly recurrent mutations in ONB, with the only recurrently mutated genes being TP53 and DNMT3A. In conclusion, we demonstrate that institutionally diagnosed ONB are a heterogeneous group of tumors. Expression of cytokeratin, chromogranin A, the mutational status of IDH2 as well as DNA methylation patterns may greatly aid in the precise classification of ONB.

  8. Mycofier: a new machine learning-based classifier for fungal ITS sequences.

    PubMed

    Delgado-Serrano, Luisa; Restrepo, Silvia; Bustos, Jose Ricardo; Zambrano, Maria Mercedes; Anzola, Juan Manuel

    2016-08-11

    The taxonomic and phylogenetic classification based on sequence analysis of the ITS1 genomic region has become a crucial component of fungal ecology and diversity studies. Nowadays, there is no accurate alignment-free classification tool for fungal ITS1 sequences for large environmental surveys. This study describes the development of a machine learning-based classifier for the taxonomical assignment of fungal ITS1 sequences at the genus level. A fungal ITS1 sequence database was built using curated data. Training and test sets were generated from it. A Naïve Bayesian classifier was built using features from the primary sequence with an accuracy of 87 % in the classification at the genus level. The final model was based on a Naïve Bayes algorithm using ITS1 sequences from 510 fungal genera. This classifier, denoted as Mycofier, provides similar classification accuracy compared to BLASTN, but the database used for the classification contains curated data and the tool, independent of alignment, is more efficient and contributes to the field, given the lack of an accurate classification tool for large data from fungal ITS1 sequences. The software and source code for Mycofier are freely available at https://github.com/ldelgado-serrano/mycofier.git .
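
    A naive Bayes classifier over sequence features, as described above, can be sketched with k-mer counts as the features. This is not Mycofier's code: the class name, the choice of k = 3, the Laplace smoothing constant and the toy labels in the usage note are assumptions chosen to keep the example small.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k):
    """Split a sequence into its overlapping k-mers."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts with Laplace smoothing."""

    def __init__(self, k=3, alpha=1.0):
        self.k = k
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> k-mer counts
        self.label_counts = Counter()            # label -> number of sequences
        self.vocab = set()

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            words = kmers(seq, self.k)
            self.word_counts[label].update(words)
            self.label_counts[label] += 1
            self.vocab.update(words)
        return self

    def predict(self, seq):
        n_seqs = sum(self.label_counts.values())
        v = len(self.vocab)
        # k-mers never seen in training carry no information and are skipped.
        words = [w for w in kmers(seq, self.k) if w in self.vocab]
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total = sum(counts.values())
            score = math.log(self.label_counts[label] / n_seqs)  # log prior
            for w in words:
                score += math.log((counts[w] + self.alpha) /
                                  (total + self.alpha * v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

    For example, after fitting on two toy groups of sequences labelled "genus_a" and "genus_b", predict returns the label whose k-mer profile makes the query most probable. No alignment is involved at any step, which is the property the abstract emphasises.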

  9. Cloud-scale genomic signals processing classification analysis for gene expression microarray data.

    PubMed

    Harvey, Benjamin; Ji, Soo-Yeon

    2014-01-01

    As the microarray data available to scientists continue to increase in size and complexity, it has become increasingly important to find multiple ways to draw useful inferences through analysis of DNA/mRNA sequence data. Though there have been many attempts to draw biological inference by means of wavelet preprocessing and classification, there has not been a research effort that focuses on a cloud-scale classification analysis of microarray data using wavelet thresholding to identify significantly expressed features. This paper proposes a novel methodology that uses wavelet-based denoising to initialize a threshold for determining significantly expressed genes for classification. Additionally, this research was implemented within a cloud-based distributed processing environment. Cloud computing and wavelet thresholding were used for the classification of 14 tumor classes from the Global Cancer Map (GCM). The results proved to be more accurate than using a predefined p-value for differential expression classification. This methodology analyzed wavelet-based threshold features of gene expression in a cloud environment and classified the expression of samples by analyzing gene patterns, which inform us of biological processes. It also enables researchers to face the present and forthcoming challenges that may arise in the functional genomics analysis of large microarray datasets.
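
    Wavelet thresholding, mentioned above, generally means transforming a signal into wavelet coefficients, shrinking the small detail coefficients toward zero, and transforming back. The abstract does not specify the wavelet, the number of levels or the threshold rule, so the sketch below assumes a single-level Haar transform with soft thresholding and a caller-supplied threshold. It illustrates the general technique only and is not the paper's pipeline.

```python
import math

_SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """Single-level Haar transform of an even-length list."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    approx = [(x[i] + x[i + 1]) / _SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / _SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / _SQRT2, (a - d) / _SQRT2])
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def haar_denoise(x, t):
    """Denoise by soft-thresholding the Haar detail coefficients."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, t))
```

    Small differences between neighbouring values are treated as noise and averaged out, while differences larger than the threshold survive in reduced form.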

  10. Advances in the molecular genetics of gliomas - implications for classification and therapy.

    PubMed

    Reifenberger, Guido; Wirsching, Hans-Georg; Knobbe-Thomsen, Christiane B; Weller, Michael

    2017-07-01

    Genome-wide molecular-profiling studies have revealed the characteristic genetic alterations and epigenetic profiles associated with different types of gliomas. These molecular characteristics can be used to refine glioma classification, to improve prediction of patient outcomes, and to guide individualized treatment. Thus, the WHO Classification of Tumours of the Central Nervous System was revised in 2016 to incorporate molecular biomarkers - together with classic histological features - in an integrated diagnosis, in order to define distinct glioma entities as precisely as possible. This paradigm shift is markedly changing how glioma is diagnosed, and has important implications for future clinical trials and patient management in daily practice. Herein, we highlight the developments in our understanding of the molecular genetics of gliomas, and review the current landscape of clinically relevant molecular biomarkers for use in classification of the disease subtypes. Novel approaches to the genetic characterization of gliomas based on large-scale DNA-methylation profiling and next-generation sequencing are also discussed. In addition, we illustrate how advances in the molecular genetics of gliomas can promote the development and clinical translation of novel pathogenesis-based therapeutic approaches, thereby paving the way towards precision medicine in neuro-oncology.

  11. Elongation Factor-Tu (EF-Tu) proteins structural stability and bioinformatics in ancestral gene reconstruction

    NASA Astrophysics Data System (ADS)

    Dehipawala, Sunil; Nguyen, A.; Tremberger, G.; Cheung, E.; Schneider, P.; Lieberman, D.; Holden, T.; Cheung, T.

    2013-09-01

    A paleo-experimental evolution report on elongation factor EF-Tu structural stability results has provided an opportunity to rewind the tape of life using the ancestral protein sequence reconstruction modeling approach; consistent with the book of life dogma in current biology and being an important component in the astrobiology community. Fractal dimension via the Higuchi fractal method and Shannon entropy of the DNA sequence classification could be used in a diagram that serves as a simple summary. Results from biomedical gene research provide examples on the diagram methodology. Comparisons between biomedical genes such as EEF2 (elongation factor 2 human, mouse, etc), WDR85 in epigenetics, HAR1 in human specificity, DLG1 in cognitive skill, and HLA-C in mosquito bite immunology with EF Tu DNA sequences have accounted for the reported circular dichroism thermo-stability data systematically; the results also infer a relatively less volatility geologic time period from 2 to 3 Gyr from adaptation viewpoint. Comparison to Thermotoga maritima MSB8 and Psychrobacter shows that Thermus thermophilus HB8 EF-Tu calibration sequence could be an outlier, consistent with free energy calculation by NUPACK. Diagram methodology allows computer simulation studies and HAR1 shows about 0.5% probability from chimp to human in terms of diagram location, and SNP simulation results such as amoebic meningoencephalitis NAF1 suggest correlation. Extensions to the studies of the translation and transcription elongation factor sequences in Megavirus Chiliensis, Megavirus Lba and Pandoravirus show that the studied Pandoravirus sequence could be an outlier with the highest fractal dimension and lowest entropy, as compared to chicken as a deviant in the DNMT3A DNA methylation gene sequences from zebrafish to human and to the less than one percent probability in computer simulation using the HAR1 0.5% probability as reference. 
    The diagram methodology would be useful in ancestral gene reconstruction studies in astrobiology and also be applicable to the study of point mutation in conformational thermostabilization research with Synchrotron based X-ray data for drug applications such as Parkinson's disease.
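    One of the two axes of the diagram described above is the Shannon entropy of a DNA sequence. The abstract does not say how the sequence is encoded before the entropy is taken, so the sketch below assumes the simplest reading, entropy of the single-nucleotide composition in bits. The function name is hypothetical, and the Higuchi fractal dimension is not covered here.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per symbol) of the symbol composition of seq."""
    counts = Counter(seq.upper())
    n = sum(counts.values())
    # H = -sum(p * log2 p) over the observed symbols
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

    With four equally frequent nucleotides the value reaches its maximum of 2 bits, while a homopolymer gives 0.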

  12. PANGEA: pipeline for analysis of next generation amplicons

    PubMed Central

    Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz FW; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W

    2010-01-01

    High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including preprocessing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ2 step, are joined into one program called the ‘backbone’. PMID:20182525

  13. PANGEA: pipeline for analysis of next generation amplicons.

    PubMed

    Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz F W; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W

    2010-07-01

    High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the chi(2) step, are joined into one program called the 'backbone'.

  14. Basal jawed vertebrate phylogeny inferred from multiple nuclear DNA-coded genes

    PubMed Central

    Kikugawa, Kanae; Katoh, Kazutaka; Kuraku, Shigehiro; Sakurai, Hiroshi; Ishida, Osamu; Iwabe, Naoyuki; Miyata, Takashi

    2004-01-01

    Background Phylogenetic analyses of jawed vertebrates based on mitochondrial sequences often result in confusing inferences which are obviously inconsistent with generally accepted trees. In particular, in a hypothesis by Rasmussen and Arnason based on mitochondrial trees, cartilaginous fishes have a terminal position in a paraphyletic cluster of bony fishes. No previous analysis based on nuclear DNA-coded genes could significantly reject the mitochondrial trees of jawed vertebrates. Results We have cloned and sequenced seven nuclear DNA-coded genes from 13 vertebrate species. These sequences, together with sequences available from databases including 13 jawed vertebrates from eight major groups (cartilaginous fishes, bichir, chondrosteans, gar, bowfin, teleost fishes, lungfishes and tetrapods) and an outgroup (a cyclostome and a lancelet), have been subjected to phylogenetic analyses based on the maximum likelihood method. Conclusion Cartilaginous fishes have been inferred to be basal to other jawed vertebrates, which is consistent with the generally accepted view. The minimum log-likelihood difference between the maximum likelihood tree and trees not supporting the basal position of cartilaginous fishes is 18.3 ± 13.1. The hypothesis by Rasmussen and Arnason has been significantly rejected with the minimum log-likelihood difference of 123 ± 23.3. Our tree has also shown that living holosteans, comprising bowfin and gar, form a monophyletic group which is the sister group to teleost fishes. This is consistent with a formerly prevalent view of vertebrate classification, although inconsistent with both of the current morphology-based and mitochondrial sequence-based trees. Furthermore, the bichir has been shown to be the basal ray-finned fish. Tetrapods and lungfish have formed a monophyletic cluster in the tree inferred from the concatenated alignment, being consistent with the currently prevalent view. 
    It also remains possible that tetrapods are more closely related to ray-finned fishes than to lungfishes. PMID:15070407

  15. Sequence and Structure Dependent DNA-DNA Interactions

    NASA Astrophysics Data System (ADS)

    Kopchick, Benjamin; Qiu, Xiangyun

    Molecular forces between dsDNA strands are largely dominated by electrostatics and have been extensively studied. Quantitative knowledge has been accumulated on how DNA-DNA interactions are modulated by varied biological constituents such as ions, cationic ligands, and proteins. Despite its central role in biology, the sequence of DNA has not received substantial attention and "random" DNA sequences are typically used in biophysical studies. However, ~50% of the human genome is composed of non-random-sequence DNAs, particularly repetitive sequences. Furthermore, covalent modifications of DNA such as methylation play key roles in gene functions. Such DNAs with specific sequences or modifications often take on structures other than the canonical B-form. Here we present a series of quantitative measurements of the DNA-DNA forces with the osmotic stress method on different DNA sequences, from short repeats to the most frequent sequences in the genome, and to modifications such as bromination and methylation. We observe peculiar behaviors that appear to be strongly correlated with the incurred structural changes. We speculate on the causes in terms of differences in the hydration shell and DNA surface structures.

  16. DNA Barcoding the Medusozoa using mtCOI

    NASA Astrophysics Data System (ADS)

    Ortman, Brian D.; Bucklin, Ann; Pagès, Francesc; Youngbluth, Marsh

    2010-12-01

    The Medusozoa are a clade within the Cnidaria comprising the classes Hydrozoa, Scyphozoa, and Cubozoa. Identification of medusozoan species is challenging, even for taxonomic experts, due to their fragile forms and complex, morphologically-distinct life history stages. In this study 231 sequences for a portion of the mitochondrial Cytochrome Oxidase I (mtCOI) gene were obtained from 95 species of medusozoans, including 84 hydrozoans (61 siphonophores, eight anthomedusae, four leptomedusae, seven trachymedusae, and four narcomedusae), 10 scyphozoans (three coronatae, four semaeostomae, two rhizostomae, and one stauromedusae), and one cubozoan. This region of mtCOI has been used as a DNA barcode (i.e., a molecular character for species recognition and discrimination) for a diverse array of taxa, including some Cnidaria. Kimura 2-parameter (K2P) genetic distances between sequence variants within species ranged from 0 to 0.057 (mean 0.013). Within the 13 genera for which multiple species were available, K2P distance between congeneric species ranged from 0.056 to 0.381. A cluster diagram generated by Neighbor Joining (NJ) using K2P distances reliably clustered all barcodes of the same species with ≥99% bootstrap support, ensuring accurate identification of species. Intra- and inter-specific variation of the mtCOI gene for the Medusozoa is appropriate for this gene to be used as a DNA barcode for species-level identification, but not for phylogenetic analysis or taxonomic classification of unknown sequences at higher taxonomic levels. This study provides a set of molecular tools that can be used to address questions of speciation, biodiversity, life-history, and population boundaries in the Medusozoa.
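    The K2P distances reported above follow the standard Kimura 2-parameter formula, d = -(1/2) ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P is the proportion of sites that differ by a transition and Q the proportion that differ by a transversion. A minimal sketch for two pre-aligned sequences follows. The function name and the choice to skip any column that is not A/C/G/T in both sequences are assumptions, not details from the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    n = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes (assumed policy)
        n += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / n, transversions / n
    # math.log / math.sqrt raise ValueError when the pair is too divergent
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

    Identical sequences give a distance of 0, and the distance grows faster than the raw proportion of differing sites because the formula corrects for multiple substitutions at the same position.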

  17. The Evolution of Tyrosine-Recombinase Elements in Nematoda

    PubMed Central

    Szitenberg, Amir; Koutsovoulos, Georgios; Blaxter, Mark L.; Lunt, David H.

    2014-01-01

    Transposable elements can be categorised into DNA and RNA elements based on their mechanism of transposition. Tyrosine recombinase elements (YREs) are relatively rare and poorly understood, despite sharing characteristics with both DNA and RNA elements. Previously, the Nematoda have been reported to have a substantially different diversity of YREs compared to other animal phyla: the Dirs1-like YRE retrotransposon was encountered in most animal phyla but not in Nematoda, and a unique Pat1-like YRE retrotransposon has only been recorded from Nematoda. We explored the diversity of YREs in Nematoda by sampling broadly across the phylum and including 34 genomes representing the three classes within Nematoda. We developed a method to isolate and classify YREs based on both feature organization and phylogenetic relationships in an open and reproducible workflow. We also ensured that our phylogenetic approach to YRE classification identified truncated and degenerate elements, informatively increasing the number of elements sampled. We identified Dirs1-like elements (thought to be absent from Nematoda) in the nematode classes Enoplia and Dorylaimia indicating that nematode model species do not adequately represent the diversity of transposable elements in the phylum. Nematode Pat1-like elements were found to be a derived form of another Pat1-like element that is present more widely in animals. Several sequence features used widely for the classification of YREs were found to be homoplasious, highlighting the need for a phylogenetically-based classification scheme. Nematode model species do not represent the diversity of transposable elements in the phylum. PMID:25197791

  18. The evolution of tyrosine-recombinase elements in Nematoda.

    PubMed

    Szitenberg, Amir; Koutsovoulos, Georgios; Blaxter, Mark L; Lunt, David H

    2014-01-01

    Transposable elements can be categorised into DNA and RNA elements based on their mechanism of transposition. Tyrosine recombinase elements (YREs) are relatively rare and poorly understood, despite sharing characteristics with both DNA and RNA elements. Previously, the Nematoda have been reported to have a substantially different diversity of YREs compared to other animal phyla: the Dirs1-like YRE retrotransposon was encountered in most animal phyla but not in Nematoda, and a unique Pat1-like YRE retrotransposon has only been recorded from Nematoda. We explored the diversity of YREs in Nematoda by sampling broadly across the phylum and including 34 genomes representing the three classes within Nematoda. We developed a method to isolate and classify YREs based on both feature organization and phylogenetic relationships in an open and reproducible workflow. We also ensured that our phylogenetic approach to YRE classification identified truncated and degenerate elements, informatively increasing the number of elements sampled. We identified Dirs1-like elements (thought to be absent from Nematoda) in the nematode classes Enoplia and Dorylaimia indicating that nematode model species do not adequately represent the diversity of transposable elements in the phylum. Nematode Pat1-like elements were found to be a derived form of another Pat1-like element that is present more widely in animals. Several sequence features used widely for the classification of YREs were found to be homoplasious, highlighting the need for a phylogenetically-based classification scheme. Nematode model species do not represent the diversity of transposable elements in the phylum.

  19. Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods.

    PubMed

    Qu, Kaiyang; Han, Ke; Wu, Song; Wang, Guohua; Wei, Leyi

    2017-09-22

    DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the features extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors, which are obtained from a combination of three feature extractions, show the best performance in 10-fold cross-validation both under non-dimensional reduction and dimensional reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors, which are a combination of SSF and K-Skip-N-Grams, show the best performance in the test set. Among these methods, mixed features exhibit superiority over the single features.
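    The abstract names K-Skip-N-Grams as one of the three feature sets but does not define it. As a simplified illustration for the case N = 2, the sketch below counts every ordered pair of residues separated by at most k skipped positions and normalizes the counts to frequencies. This reading, the function name, and the default `k=2` are assumptions, and the paper's exact definition may differ.

```python
from collections import Counter


def skip_bigram_features(seq, k=2):
    """Normalized frequencies of residue pairs with 0..k skipped positions."""
    counts = Counter()
    for i in range(len(seq)):
        for gap in range(k + 1):
            j = i + gap + 1          # partner position after skipping `gap` residues
            if j < len(seq):
                counts[seq[i] + seq[j]] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()} if total else {}
```

    The resulting dictionary can be expanded into a fixed-length vector over all 400 amino-acid pairs before being passed to a classifier such as a support vector machine.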

  20. Genetic variation and relationships of four species of medically important echinostomes (Trematoda: Echinostomatidae) in South-East Asia.

    PubMed

    Saijuntha, Weerachai; Sithithaworn, Paiboon; Duenngai, Kunyarat; Kiatsopit, Nadda; Andrews, Ross H; Petney, Trevor N

    2011-03-01

    Multilocus enzyme electrophoresis (MEE) and DNA sequencing of the mitochondrial cytochrome c oxidase subunit 1 (CO1) gene were used to genetically compare four species of echinostomes of human health importance. Fixed genetic differences among adults of Echinostoma revolutum, Echinostoma malayanum, Echinoparyphium recurvatum and Hypoderaeum conoideum were detected at 51-75% of the enzyme loci examined, while interspecific differences in CO1 sequence were detected at 16-32 (8-16%) of the 205 alignment positions. The results of the MEE analyses also revealed fixed genetic differences between E. revolutum from Thailand and Lao PDR at five (19%) of 27 loci, which could either represent genetic variation between geographically separated populations of a single species, or the existence of a cryptic (i.e. genetically distinct but morphologically similar) species. However, there was no support for the existence of cryptic species within E. revolutum based on the CO1 sequence between the two geographical areas sampled. Genetic variation in CO1 sequence was also detected among E. malayanum from three different species of snail intermediate host. Separate phylogenetic analyses of the MEE and DNA sequence data revealed that the two species of Echinostoma (E. revolutum and E. malayanum) did not form a monophyletic clade. These results, together with the large number of morphologically similar species with inadequate descriptions, poor specific diagnoses and extensive synonymy, suggest that the morphological characters used for species taxonomy of echinostomes in South-East Asia should be reconsidered according to the concordance of biology, morphology and molecular classification. Copyright © 2010 Elsevier B.V. All rights reserved.

  1. MULTILOCUS SEQUENCE TYPING OF BRUCELLA ISOLATES FROM THAILAND.

    PubMed

    Chawjiraphan, Wireeya; Sonthayanon, Piengchan; Chanket, Phanita; Benjathummarak, Surachet; Kerdsin, Anusak; Kalambhaheti, Thareerat

    2016-11-01

    Although brucellosis outbreaks in Thailand are rare, they cause abortions and infertility in animals, resulting in significant economic loss. Because Brucella spp display > 90% DNA homology, multilocus sequence typing (MLST) was employed to categorize local Brucella isolates into sequence types (STs) and to determine their genetic relatedness. Brucella samples were isolated from vaginal secretion of cows and goats, and from blood cultures of infected individuals. Brucella species were determined by multiplex PCR of eight loci, in addition to MLST based on partial DNA sequences of nine house-keeping genes. MLST analysis of 36 isolates revealed 78 distinct novel allele types and 34 novel STs, while two isolates possessed the known ST8. Sequence alignments identified polymorphic sites in each allele, ranging from 2-6%, while overall genetic diversity was 3.6%. MLST analysis of the 36 Brucella isolates classified them into three species, namely, B. melitensis, B. abortus and B. suis, in agreement with multiplex PCR results. Genetic relatedness among ST members of B. melitensis and B. abortus determined by eBURST program revealed ST2 as founder of B. abortus isolates and ST8 the founder of B. melitensis isolates. ST 36, 41 and 50 of Thai Brucella isolates were identified as single locus variants of clonal cluster (CC) 8, while the majority of STs were diverse. The genetic diversity and relatedness identified using MLST revealed hitherto unexpected diversity among Thai Brucella isolates. Genetic classification of isolates could reveal the route of brucellosis transmission among humans and farm animals and also reveal their relationship with other isolates in the region and other parts of the world.
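    In MLST, each isolate is reduced to an allelic profile (one allele number per house-keeping gene), and a single locus variant differs from a founder profile at exactly one locus. A minimal sketch of that comparison follows. The function names are hypothetical and any profiles passed in for illustration would be made up, not data from the study; eBURST itself does considerably more than this.

```python
def locus_differences(profile_a, profile_b):
    """Number of loci at which two allelic profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def single_locus_variants(founder, profiles):
    """Names of sequence types that differ from the founder at exactly one locus.

    `profiles` maps a sequence-type name to its tuple of allele numbers.
    """
    return sorted(
        name for name, profile in profiles.items()
        if locus_differences(founder, profile) == 1
    )
```

    Sequence types that share all alleles with the founder, or differ at two or more loci, are excluded from the result.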

  2. A High-Throughput Process for the Solid-Phase Purification of Synthetic DNA Sequences

    PubMed Central

    Grajkowski, Andrzej; Cieślak, Jacek; Beaucage, Serge L.

    2017-01-01

    An efficient process for the purification of synthetic phosphorothioate and native DNA sequences is presented. The process is based on the use of an aminopropylated silica gel support functionalized with aminooxyalkyl functions to enable capture of DNA sequences through an oximation reaction with the keto function of a linker conjugated to the 5′-terminus of DNA sequences. Deoxyribonucleoside phosphoramidites carrying this linker, as a 5′-hydroxyl protecting group, have been synthesized for incorporation into DNA sequences during the last coupling step of a standard solid-phase synthesis protocol executed on a controlled pore glass (CPG) support. Solid-phase capture of the nucleobase- and phosphate-deprotected DNA sequences released from the CPG support is demonstrated to proceed near quantitatively. Shorter than full-length DNA sequences are first washed away from the capture support; the solid-phase purified DNA sequences are then released from this support upon reaction with tetra-n-butylammonium fluoride in dry dimethylsulfoxide (DMSO) and precipitated in tetrahydrofuran (THF). The purity of solid-phase-purified DNA sequences exceeds 98%. The simulated high-throughput and scalability features of the solid-phase purification process are demonstrated without sacrificing purity of the DNA sequences. PMID:28628204

  3. Molecular and Cytogenetic Characterization of Wild Musa Species

    PubMed Central

    Čížková, Jana; Hřibová, Eva; Christelová, Pavla; Van den Houwe, Ines; Häkkinen, Markku; Roux, Nicolas; Swennen, Rony; Doležel, Jaroslav

    2015-01-01

    The production of bananas is threatened by rapid spreading of various diseases and adverse environmental conditions. The preservation and characterization of banana diversity is essential for the purposes of crop improvement. The world's largest banana germplasm collection maintained at the Bioversity International Transit Centre (ITC) in Belgium is continuously expanded by new accessions of edible cultivars and wild species. Detailed morphological and molecular characterization of the accessions is necessary for efficient management of the collection and utilization of banana diversity. In this work, nuclear DNA content and genomic distribution of 45S and 5S rDNA were examined in 21 diploid accessions recently added to ITC collection, representing both sections of the genus Musa. 2C DNA content in the section Musa ranged from 1.217 to 1.315 pg. Species belonging to section Callimusa had 2C DNA contents ranging from 1.390 to 1.772 pg. While the number of 45S rDNA loci was conserved in the section Musa, it was highly variable in Callimusa species. 5S rRNA gene clusters were found on two to eight chromosomes per diploid cell. The accessions were genotyped using a set of 19 microsatellite markers to establish their relationships with the remaining accessions held at ITC. Genetic diversity done by SSR genotyping platform was extended by phylogenetic analysis of ITS region. ITS sequence data supported the clustering obtained by SSR analysis for most of the accessions. High level of nucleotide diversity and presence of more than two types of ITS sequences in eight wild diploids pointed to their origin by hybridization of different genotypes. This study significantly expands the number of wild Musa species where nuclear genome size and genomic distribution of rDNA loci is known. SSR genotyping identified Musa species that are closely related to the previously characterized accessions and provided data to aid in their classification. 
    Sequence analysis of ITS region provided further information about evolutionary relationships between individual accessions and suggested that some of analyzed accessions were interspecific hybrids and/or backcross progeny. PMID:26252482

  4. An improved model for whole genome phylogenetic analysis by Fourier transform.

    PubMed

    Yin, Changchuan; Yau, Stephen S-T

    2015-10-07

    DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information contents of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into the frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor and the 2D mapping reduces the nucleotide composition bias in the distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra have the same dimensionality in the Fourier frequency space, and the Euclidean distances of the full Fourier power spectra of the DNA sequences are then used as the dissimilarity metrics. The improved DFT method, with increased computational performance by 2D numerical representation, is applicable to DNA sequences of any length range. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences including simulated and real datasets. 
    The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied on phylogenetic analysis for individual genes and large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
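    The pipeline described in this abstract (numerical mapping, DFT power spectrum, scaling to a common length, Euclidean distance) can be sketched roughly as follows. The abstract does not give the actual 2D mapping or the even scaling formula, so both are replaced here by labelled stand-ins: nucleotides are mapped to points on the unit circle in the complex plane, and shorter spectra are stretched by plain linear interpolation. All names are hypothetical, and the naive O(n^2) DFT is only suitable for short sequences.

```python
import cmath
import math

# Assumed stand-in for the paper's 2D mapping: one complex number per base.
MAPPING = {"A": 1 + 0j, "T": -1 + 0j, "C": 1j, "G": -1j}


def power_spectrum(seq):
    """Power spectrum |X_k|^2 of the mapped sequence via a naive DFT."""
    x = [MAPPING[base] for base in seq.upper()]
    n = len(x)
    if n == 0:
        raise ValueError("empty sequence")
    spectrum = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2)
    return spectrum


def stretch(spectrum, m):
    """Stretch a spectrum to length m by linear interpolation (stand-in scaling)."""
    n = len(spectrum)
    if m < n:
        raise ValueError("target length must not be shorter than the spectrum")
    if m == n:
        return list(spectrum)
    out = []
    for k in range(m):
        pos = k * (n - 1) / (m - 1)   # fractional index into the original spectrum
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(spectrum[lo] * (1 - frac) + spectrum[hi] * frac)
    return out


def dft_distance(a, b):
    """Euclidean distance between length-matched power spectra."""
    pa, pb = power_spectrum(a), power_spectrum(b)
    m = max(len(pa), len(pb))
    pa, pb = stretch(pa, m), stretch(pb, m)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(pa, pb)))
```

    A matrix of such pairwise distances could then be passed to a hierarchical clustering routine, which is the evaluation setting the abstract describes.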

  5. Redescription and phylogenetic position of Myxobolus aeglefini and Myxobolus platessae n. comb. (Myxosporea), parasites in the cartilage of some North Atlantic marine fishes, with notes on the phylogeny and classification of the Platysporina.

    PubMed

    Karlsbakk, Egil; Kristmundsson, Árni; Albano, Marco; Brown, Paul; Freeman, Mark A

    2017-02-01

    Myxobolus 'aeglefini' Auerbach, 1906 was originally described from cranial cartilage of North sea haddock (Melanogrammus aeglefinus), but has subsequently been recorded from cartilaginous tissues of a range of other gadoid hosts, from pleuronectids and from lumpsucker (Cyclopterus lumpus) in the North Atlantic and from a zoarcid fish in the Japan Sea (Pacific). We obtained partial small-subunit rDNA sequences of Myxobolus 'aeglefini' from gadoids and pleuronectids from Norway and Iceland. The sequences from gadoids and pleuronectids represented two different genotypes, showing 98.2% identity. Morphometric studies on the spores from selected gadids and pleuronectids revealed slight but statistically significant differences in spore dimensions associated with the genotypes, the spores from pleuronectids were thicker and with larger polar capsules. We identify the morpho- and genotype from gadoids with Myxobolus 'aeglefini' sensu Auerbach, and the one from pleuronectids with Sphaerospora platessae Woodcock, 1904 as Myxobolus platessae n. comb. The latter species was originally described from Irish Sea plaice (Pleuronectes platessa). Myxobolus albi Picon et al., 2009 described from the common goby Pomatoschistus microps in Scotland is a synonym of M. 'aeglefini'. The Pacific Myxobolus 'aeglefini' represents a separate species, showing only 97.4-97.6% identity to the Atlantic species. In phylogenetic analyses based on SSU rDNA sequences, these and some related marine chondrotropic Myxobolus spp. form a distinct well supported group. This clusters with freshwater and marine myxobolids and Triangula and Cardimyxobolus species, in a basal clade in the phylogeny of the Platysporina. Members of family Myxobilatidae, Ortholinea spp. (currently Ortholineidae) and sequences of some other urinary system infecting myxosporeans form a well supported clade among members of the suborder Platysporina. 
    Based on phylogenetic analyses, we propose the following changes to the classification of Myxosporea: i) Ortholineidae is dismantled and Ortholinea spp. transferred to Myxobilatidae, and ii) Myxobilatidae is transferred from suborder Variisporina to Platysporina. Copyright © 2016 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  6. [Comparative genomics and evolutionary analysis of CRISPR loci in acetic acid bacteria].

    PubMed

    Xia, Kai; Liang, Xin-le; Li, Yu-dong

    2015-12-01

    The clustered regularly interspaced short palindromic repeat (CRISPR) is a widespread adaptive immunity system that exists in most archaea and many bacteria against foreign DNA, such as phages, viruses and plasmids. In general, a CRISPR system consists of direct repeat, leader, spacer and CRISPR-associated sequences. Acetic acid bacteria (AAB) play an important role in industrial fermentation of vinegar and bioelectrochemistry. To investigate the polymorphism and evolution pattern of CRISPR loci in acetic acid bacteria, bioinformatic analyses were performed on 48 species from three main genera (Acetobacter, Gluconacetobacter and Gluconobacter) with whole genome sequences available from the NCBI database. The results showed that the CRISPR system existed in 32 species of the 48 strains studied. Most of the CRISPR-Cas systems in AAB belonged to the type I CRISPR-Cas system (subtypes E and C), whereas the type II CRISPR-Cas system, which contains the cas9 gene, was found only in the genera Acetobacter and Gluconacetobacter. The repeat sequences of some CRISPR loci were highly conserved among species from different genera, and the leader sequences of some CRISPR loci possessed conserved motifs, which were associated with regulated promoters. Moreover, phylogenetic analysis of cas1 demonstrated that they were suitable for classification of species. The conservation of cas1 genes was associated with that of repeat sequences among different strains, suggesting they were subjected to similar functional constraints. Moreover, the number of spacers was positively correlated with the number of prophages and insertion sequences, indicating the acetic acid bacteria were continually invaded by new foreign DNA. The comparative analysis of CRISPR loci in acetic acid bacteria provides a basis for investigating the molecular mechanism of different acetic acid tolerance and genome stability in acetic acid bacteria.

  7. Ribosomal RNA Genes Contribute to the Formation of Pseudogenes and Junk DNA in the Human Genome.

    PubMed

    Robicheau, Brent M; Susko, Edward; Harrigan, Amye M; Snyder, Marlene

    2017-02-01

    Approximately 35% of the human genome can be identified as sequence devoid of a selected-effect function, and not derived from transposable elements or repeated sequences. We provide evidence supporting a known origin for a fraction of this sequence. We show that: 1) highly degraded, but near full length, ribosomal DNA (rDNA) units, including both 45S and Intergenic Spacer (IGS), can be found at multiple sites in the human genome on chromosomes without rDNA arrays, 2) that these rDNA sequences have a propensity for being centromere proximal, and 3) that sequence at all human functional rDNA array ends is divergent from canonical rDNA to the point that it is pseudogenic. We also show that small sequence strings of rDNA (from 45S + IGS) can be found distributed throughout the genome and are identifiable as an "rDNA-like signal", representing 0.26% of the q-arm of HSA21 and ∼2% of the total sequence of other regions tested. The size of sequence strings found in the rDNA-like signal intergrade into the size of sequence strings that make up the full-length degrading rDNA units found scattered throughout the genome. We conclude that the displaced and degrading rDNA sequences are likely of a similar origin but represent different stages in their evolution towards random sequence. Collectively, our data suggests that over vast evolutionary time, rDNA arrays contribute to the production of junk DNA. The concept that the production of rDNA pseudogenes is a by-product of concerted evolution represents a previously under-appreciated process; we demonstrate here its importance. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  8. Representation of DNA sequences in genetic codon context with applications in exon and intron prediction.

    PubMed

    Yin, Changchuan

    2015-04-01

    To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be specially mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC) in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal to noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction by a Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
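
    As a rough illustration of the baseline this abstract compares against, the sketch below maps a DNA string to the 4D binary (indicator) representation and measures the period-3 signal with a plain discrete Fourier transform. It is not the GCC method itself. The function names and the SNR definition used here (power at frequency index N/3 divided by the mean power over the non-zero frequencies) are assumptions made for the example, and the whole sequence is transformed at once rather than through sliding STFT windows.

```python
import cmath


def indicator_sequences(dna):
    """Map a DNA string to four binary indicator sequences, one per base."""
    dna = dna.upper()
    return {base: [1.0 if ch == base else 0.0 for ch in dna] for base in "ACGT"}


def power_at(signal, k):
    """Squared magnitude of the DFT coefficient of `signal` at frequency index k."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(signal))
    return abs(coeff) ** 2


def period3_snr(dna):
    """Power at k = N/3 divided by the mean power over k = 1..N-1.

    This ratio is an assumed, simplified SNR definition for illustration.
    """
    n = len(dna)
    if n < 6 or n % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3 and at least 6")
    signals = list(indicator_sequences(dna).values())
    # Total power spectrum: sum of the four indicator spectra at each frequency.
    spectrum = [sum(power_at(s, k) for s in signals) for k in range(1, n)]
    mean_power = sum(spectrum) / len(spectrum)
    peak = spectrum[n // 3 - 1]  # spectrum[0] corresponds to k = 1
    return peak / mean_power if mean_power > 0 else 0.0
```

    A perfectly codon-periodic toy string such as "ATG" repeated ten times gives a large ratio, while a string with period 4 gives a ratio near zero, which is the contrast that exon predictors of this kind exploit.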

  9. Single-cell genomic sequencing using Multiple Displacement Amplification.

    PubMed

    Lasken, Roger S

    2007-10-01

    Single microbial cells can now be sequenced using DNA amplified by the Multiple Displacement Amplification (MDA) reaction. The few femtograms of DNA in a bacterium are amplified into micrograms of high molecular weight DNA suitable for DNA library construction and Sanger sequencing. The MDA-generated DNA also performs well when used directly as template for pyrosequencing by the 454 Life Sciences method. While MDA from single cells loses some of the genomic sequence, this approach will greatly accelerate the pace of sequencing from uncultured microbes. The genetically linked sequences from single cells are also a powerful tool to be used in guiding genomic assembly of shotgun sequences of multiple organisms from environmental DNA extracts (metagenomic sequences).

  10. A cricket Gene Index: a genomic resource for studying neurobiology, speciation, and molecular evolution

    PubMed Central

    Danley, Patrick D; Mullen, Sean P; Liu, Fenglong; Nene, Vishvanath; Quackenbush, John; Shaw, Kerry L

    2007-01-01

    Background: As the developmental costs of genomic tools decline, genomic approaches to non-model systems are becoming more feasible. Many of these systems may lack advanced genetic tools but are extremely valuable models in other biological fields. Here we report the development of expressed sequence tags (ESTs) in an orthopteroid insect, a model for the study of neurobiology, speciation, and evolution. Results: We report the sequencing of 14,502 ESTs from clones derived from a nerve cord cDNA library, and the subsequent construction of a Gene Index from these sequences, from the Hawaiian trigonidiine cricket Laupala kohalensis. The Gene Index contains 8607 unique sequences comprised of 2575 tentative consensus (TC) sequences and 6032 singletons. For each of the unique sequences, an attempt was made to assign a provisional annotation and to categorize its function using a Gene Ontology-based classification through a sequence-based comparison to known proteins. In addition, a set of unique 70 base pair oligomers that can be used for DNA microarrays was developed. All Gene Index information is posted at the DFCI Gene Indices web page. Conclusion: Orthopterans are models used to understand the neurophysiological basis of complex motor patterns such as flight and stridulation. The sequences presented in the cricket Gene Index will provide neurophysiologists with many genetic tools that have been largely absent in this field. The cricket Gene Index is one of only two gene indices to be developed in an evolutionary model system. Species within the genus Laupala have speciated recently, rapidly, and extensively. Therefore, the genes identified in the cricket Gene Index can be used to study the genomics of speciation. Furthermore, this gene index represents a significant EST resource for basal insects. As such, this resource is a valuable comparative tool for the understanding of invertebrate molecular evolution. The sequences presented here will provide much needed genomic resources for three distinct but overlapping fields of inquiry: neurobiology, speciation, and molecular evolution. PMID:17459168

  11. Functional classification of rice flanking sequence tagged genes using MapMan terms and global understanding on metabolic and regulatory pathways affected by dxr mutant having defects in light response.

    PubMed

    Chandran, Anil Kumar Nalini; Lee, Gang-Seob; Yoo, Yo-Han; Yoon, Ung-Han; Ahn, Byung-Ohg; Yun, Doh-Won; Kim, Jin-Hyun; Choi, Hong-Kyu; An, GynHeung; Kim, Tae-Ho; Jung, Ki-Hong

    2016-12-01

    Rice is one of the most important food crops for humans. To improve the agronomical traits of rice, the functions of more than 1,000 rice genes have been recently characterized and summarized. The completed, map-based sequence of the rice genome has significantly accelerated the functional characterization of rice genes, but progress remains limited in assigning functions to all predicted non-transposable element (non-TE) genes, estimated to number 37,000-41,000. The International Rice Functional Genomics Consortium (IRFGC) has generated a huge number of gene-indexed mutants by using mutagens such as T-DNA, Tos17 and Ds/dSpm. These mutants have been identified by 246,566 flanking sequence tags (FSTs) and cover 65 % (25,275 of 38,869) of the non-TE genes in rice, while the mutation ratio of TE genes is 25.7 %. In addition, almost 80 % of highly expressed non-TE genes have insertion mutations, indicating that highly expressed genes in rice chromosomes are more likely to have mutations by mutagens such as T-DNA, Ds, dSpm and Tos17. The functions of around 2.5 % of rice genes have been characterized, and studies have mainly focused on transcriptional and post-transcriptional regulation. Slow progress in characterizing the function of rice genes is mainly due to a lack of clues to guide functional studies or functional redundancy. These limitations can be partially solved by a well-categorized functional classification of FST genes. To create this classification, we used the diverse overviews installed in the MapMan toolkit. Gene Ontology (GO) assignment to FST genes compensated for the limitations of the MapMan overviews. The functions of 863 of 1,022 known genes can be evaluated by current FST lines, indicating that FST genes are useful resources for functional genomic studies. We assigned 16,169 out of 29,624 FST genes to 34 MapMan classes, including three major categories such as DNA, RNA and protein. To demonstrate the MapMan application on FST genes, transcriptome analysis was performed on a rice mutant of the 1-deoxy-D-xylulose 5-phosphate reductoisomerase (DXR) gene with an FST. Mapping of 756 down-regulated genes in dxr mutants and their annotation in terms of various MapMan overviews revealed candidate genes downstream of the DXR-mediated light signaling pathway in diverse functional classes such as the methyl-D-erythritol 4-phosphate (MEP) pathway overview, photosynthesis, secondary metabolism and regulatory overview. This report provides a useful guide for systematic phenomics and further applications to enhance the key agronomic traits of rice.

  12. Genomic landscape of gastric cancer: molecular classification and potential targets.

    PubMed

    Guo, Jiawei; Yu, Weiwei; Su, Hui; Pang, Xiufeng

    2017-02-01

    Gastric cancer imposes a considerable health burden worldwide, and its mortality ranks as the second highest for all types of cancers. The limited knowledge of the molecular mechanisms underlying gastric cancer tumorigenesis hinders the development of therapeutic strategies. However, ongoing collaborative sequencing efforts facilitate molecular classification and unveil the genomic landscape of gastric cancer. Several new drivers and tumorigenic pathways in gastric cancer, including chromatin remodeling genes, RhoA-related pathways, TP53 dysregulation, activation of receptor tyrosine kinases, stem cell pathways and abnormal DNA methylation, have been revealed. These newly identified genomic alterations await translation into clinical diagnosis and targeted therapies. Considering that loss-of-function mutations are intractable, synthetic lethality could be employed when discussing feasible therapeutic strategies. Although many challenges remain to be tackled, we are optimistic regarding improvements in the prognosis and treatment of gastric cancer in the near future.

  13. Acquisition of New DNA Sequences After Infection of Chicken Cells with Avian Myeloblastosis Virus

    PubMed Central

    Shoyab, M.; Baluda, M. A.; Evans, R.

    1974-01-01

    DNA-RNA hybridization studies between 70S RNA from avian myeloblastosis virus (AMV) and an excess of DNA from (i) AMV-induced leukemic chicken myeloblasts or (ii) a mixture of normal and of congenitally infected K-137 chicken embryos producing avian leukosis viruses revealed the presence of fast- and slow-hybridizing virus-specific DNA sequences. However, the leukemic cells contained twice the level of AMV-specific DNA sequences observed in normal chicken embryonic cells. The fast-reacting sequences were two to three times more numerous in leukemic DNA than in DNA from the mixed embryos. The slow-reacting sequences had a reiteration frequency of approximately 9 and 6, in the two respective systems. Both the fast- and the slow-reacting DNA sequences in leukemic cells exhibited a higher Tm (2 C) than the respective DNA sequences in normal cells. In normal and leukemic cells the slow hybrid sequences appeared to have a Tm which was 2 C higher than that of the fast hybrid sequences. Individual non-virus-producing chicken embryos, either group-specific antigen positive or negative, contained 40 to 100 copies of the fast sequences and 2 to 6 copies of the slowly hybridizing sequences per cell genome. Normal rat cells did not contain DNA that hybridized with AMV RNA, whereas non-virus-producing rat cells transformed by B-77 avian sarcoma virus contained only the slowly reacting sequences. The results demonstrate that leukemic cells transformed by AMV contain new AMV-specific DNA sequences which were not present before infection. PMID:16789139

  14. Polyphasic taxonomy of the genus Shewanella and description of Shewanella oneidensis sp. nov

    NASA Technical Reports Server (NTRS)

    Venkateswaran, K.; Moser, D. P.; Dollhopf, M. E.; Lies, D. P.; Saffarini, D. A.; MacGregor, B. J.; Ringelberg, D. B.; White, D. C.; Nishijima, M.; Sano, H.

    1999-01-01

    The genus Shewanella has been studied since 1931 with regard to a variety of topics of relevance to both applied and environmental microbiology. Recent years have seen the introduction of a large number of new Shewanella-like isolates, necessitating a coordinated review of the genus. In this work, the phylogenetic relationships among known shewanellae were examined using a battery of morphological, physiological, molecular and chemotaxonomic characterizations. This polyphasic taxonomy takes into account all available phenotypic and genotypic data and integrates them into a consensus classification. Based on information generated from this study and obtained from the literature, a scheme for the identification of Shewanella species has been compiled. Key phenotypic characteristics were sulfur reduction and halophilicity. Fatty acid and quinone profiling were used to impart an additional layer of information. Molecular characterizations employing small-subunit 16S rDNA sequences were at the limits of resolution for the differentiation of species in some cases. As a result, DNA-DNA hybridization and sequence analyses of a more rapidly evolving molecule (gyrB gene) were performed. Species-specific PCR probes were designed for the gyrB gene and used for the rapid screening of closely related strains. With this polyphasic approach, in addition to the ten described Shewanella species, two new species, Shewanella oneidensis and 'Shewanella pealeana', were recognized; Shewanella oneidensis sp. nov. is described here for the first time.

  15. Homogeneity of the 16S rDNA sequence among geographically disparate isolates of Taylorella equigenitalis

    PubMed Central

    Matsuda, M; Tazumi, A; Kagawa, S; Sekizuka, T; Murayama, O; Moore, JE; Millar, BC

    2006-01-01

    Background: At present, six accessible sequences of 16S rDNA from Taylorella equigenitalis (T. equigenitalis) are available, whose sequence differences occur at a few nucleotide positions. Thus it is important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximate full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Results: Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. When the approximate full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France was cloned, sequenced and compared, nucleotide sequence differences were demonstrated at six loci in the 1,469 nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion: High sequence similarity (99.5% or more) was observed throughout, except from nucleotide positions 138 to 501 where substitutions and deletions were noted. PMID:16398935
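
    The two quantities reported in this abstract, the number of polymorphic sites and the percent sequence similarity, can be computed from an alignment along the following lines. This is only a minimal sketch: it assumes the sequences are already aligned to equal length, it counts a gap character as a difference, and the function names and toy sequences are hypothetical rather than taken from the study.

```python
def polymorphic_sites(aligned):
    """Return 0-based column positions where the aligned sequences disagree."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("need a non-empty set of sequences of equal length")
    # zip(*aligned) walks the alignment column by column.
    return [pos for pos, column in enumerate(zip(*aligned))
            if len(set(column)) > 1]


def percent_identity(a, b):
    """Percentage of aligned positions at which two sequences are identical."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical toy alignment; '-' marks a deletion relative to the others.
toy_alignment = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTA-"]
sites = polymorphic_sites(toy_alignment)
identity = percent_identity(toy_alignment[0], toy_alignment[2])
```

    In the toy alignment the substitution in the third sequence and the deletion in the fourth each produce one polymorphic column, mirroring the substitutions and deletions the abstract mentions.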

  16. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera which gives a decrease in signal.

  17. Classification of Sharks in the Egyptian Mediterranean Waters Using Morphological and DNA Barcoding Approaches

    PubMed Central

    Moftah, Marie; Abdel Aziz, Sayeda H.; Elramah, Sara; Favereaux, Alexandre

    2011-01-01

    The identification of species constitutes the first basic step in phylogenetic studies, biodiversity monitoring and conservation. DNA barcoding, i.e. the sequencing of a short standardized region of DNA, has been proposed as a new tool for animal species identification. The present study provides an update on the composition of sharks in the Egyptian Mediterranean waters off Alexandria, since the latest study to date was performed 30 years ago. DNA barcoding was used in addition to classical taxonomical methodologies. Thus, 51 specimens were DNA barcoded for a 667 bp region of the mitochondrial COI gene. Although DNA barcoding aims at developing species identification systems, some phylogenetic signals were apparent in the data. In the neighbor-joining tree, 8 major clusters were apparent, each of them containing individuals belonging to the same species, and most with 100% bootstrap value. This study is the first to our knowledge to use DNA barcoding of the mitochondrial COI gene in order to confirm the presence of species Squalus acanthias, Oxynotus centrina, Squatina squatina, Scyliorhinus canicula, Scyliorhinus stellaris, Mustelus mustelus, Mustelus punctulatus and Carcharhinus altimus in the Egyptian Mediterranean waters. Finally, our study is the starting point of a new barcoding database concerning shark composition in the Egyptian Mediterranean waters (Barcoding of Egyptian Mediterranean Sharks [BEMS], http://www.boldsystems.org/views/projectlist.php?&#Barcoding%20Fish%20%28FishBOL%29). PMID:22087242

  18. Comparative study of classification algorithms for immunosignaturing data

    PubMed Central

    2012-01-01

    Background: High-throughput technologies such as DNA, RNA, protein, antibody and peptide microarrays are often used to examine differences across drug treatments, diseases, transgenic animals, and others. Typically one trains a classification system by gathering large amounts of probe-level data, selecting informative features, and classifies test samples using a small number of features. As new microarrays are invented, classification systems that worked well for other array types may not be ideal. Expression microarrays, arguably one of the most prevalent array types, have been used for years to help develop classification algorithms. Many biological assumptions are built into classifiers that were designed for these types of data. One of the more problematic is the assumption of independence, both at the probe level and again at the biological level. Probes for RNA transcripts are designed to bind single transcripts. At the biological level, many genes have dependencies across transcriptional pathways where co-regulation of transcriptional units may make many genes appear as being completely dependent. Thus, algorithms that perform well for gene expression data may not be suitable when other technologies with different binding characteristics exist. The immunosignaturing microarray is based on complex mixtures of antibodies binding to arrays of random sequence peptides. It relies on many-to-many binding of antibodies to the random sequence peptides. Each peptide can bind multiple antibodies and each antibody can bind multiple peptides. This technology has been shown to be highly reproducible and appears promising for diagnosing a variety of disease states. However, it is not clear what the optimal classification algorithm is for analyzing this new type of data. Results: We characterized several classification algorithms to analyze immunosignaturing data. We selected several datasets that range from easy to difficult to classify, from simple monoclonal binding to complex binding patterns in asthma patients. We then classified the biological samples using 17 different classification algorithms. Using a wide variety of assessment criteria, we found ‘Naïve Bayes’ far more useful than other widely used methods due to its simplicity, robustness, speed and accuracy. Conclusions: The ‘Naïve Bayes’ algorithm appears to accommodate the complex patterns hidden within multilayered immunosignaturing microarray data due to its fundamental mathematical properties. PMID:22720696

  19. Methylation patterns of repetitive DNA sequences in germ cells of Mus musculus.

    PubMed

    Sanford, J; Forrester, L; Chapman, V; Chandley, A; Hastie, N

    1984-03-26

    The major and the minor satellite sequences of Mus musculus were undermethylated in both sperm and oocyte DNAs relative to the amount of undermethylation observed in adult somatic tissue DNA. This hypomethylation was specific for satellite sequences in sperm DNA. Dispersed repetitive and low copy sequences show a high degree of methylation in sperm DNA; however, a dispersed repetitive sequence was undermethylated in oocyte DNA. This finding suggests a difference in the amount of total genomic DNA methylation between sperm and oocyte DNA. The methylation levels of the minor satellite sequences did not change during spermiogenesis, and were not associated with the onset of meiosis or a specific stage in sperm development.

  20. Process of labeling specific chromosomes using recombinant repetitive DNA

    DOEpatents

    Moyzis, R.K.; Meyne, J.

    1988-02-12

    Chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members and consensus sequences of the repetitive DNA families for the chromosome preferential sequences. The selected low homology regions are then hybridized with chromosomes to determine those low homology regions hybridized with a specific chromosome under normal stringency conditions.

  1. Highly Accurate Classification of Watson-Crick Basepairs on Termini of Single DNA Molecules

    PubMed Central

    Winters-Hilt, Stephen; Vercoutere, Wenonah; DeGuzman, Veronica S.; Deamer, David; Akeson, Mark; Haussler, David

    2003-01-01

    We introduce a computational method for classification of individual DNA molecules measured by an α-hemolysin channel detector. We show classification with better than 99% accuracy for DNA hairpin molecules that differ only in their terminal Watson-Crick basepairs. Signal classification was done in silico to establish performance metrics (i.e., where train and test data were of known type, via single-species data files). It was then performed in solution to assay real mixtures of DNA hairpins. Hidden Markov Models (HMMs) were used with Expectation/Maximization for denoising and for associating a feature vector with the ionic current blockade of the DNA molecule. Support Vector Machines (SVMs) were used as discriminators, and were the focus of off-line training. A multiclass SVM architecture was designed to place less discriminatory load on weaker discriminators, and novel SVM kernels were used to boost discrimination strength. The tuning on HMMs and SVMs enabled biophysical analysis of the captured molecule states and state transitions; structure revealed in the biophysical analysis was used for better feature selection. PMID:12547778

  2. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.

    PubMed

    Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook

    2014-11-01

    As many structures of protein-DNA complexes have become known in recent years, several computational methods have been developed to predict DNA-binding sites in proteins. However, its inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure of 66.3% and an MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and an MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and an MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data. To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
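
    For readers less familiar with the three scores quoted in this abstract, the sketch below computes accuracy, F-measure and the Matthews correlation coefficient from the four cells of a binary confusion matrix using their standard definitions. The function name and the example counts are hypothetical; they are not the counts behind the reported results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, F-measure and MCC from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # MCC ranges from -1 to 1; 0 means no better than chance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f_measure": f_measure, "mcc": mcc}


# Hypothetical counts: 40 binding nucleotides found, 20 missed,
# 10 false alarms, 30 non-binding nucleotides correctly rejected.
scores = binary_metrics(tp=40, fp=10, tn=30, fn=20)
```

    Because MCC uses all four cells, a model can post an accuracy near 67% while its MCC stays well below 0.5, which is the pattern seen in the figures above.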

  3. Enlightenment of Yeast Mitochondrial Homoplasmy: Diversified Roles of Gene Conversion

    PubMed Central

    Ling, Feng; Mikawa, Tsutomu; Shibata, Takehiko

    2011-01-01

    Mitochondria have their own genomic DNA. Unlike the nuclear genome, each cell contains hundreds to thousands of copies of mitochondrial DNA (mtDNA). The copies of mtDNA tend to have heterogeneous sequences, due to the high frequency of mutagenesis, but are quickly homogenized within a cell (“homoplasmy”) during vegetative cell growth or through a few sexual generations. Heteroplasmy is strongly associated with mitochondrial diseases, diabetes and aging. Recent studies revealed that the yeast cell has the machinery to homogenize mtDNA, using a common DNA processing pathway with gene conversion; i.e., both genetic events are initiated by a double-stranded break, which is processed into 3′ single-stranded tails. One of the tails is base-paired with the complementary sequence of the recipient double-stranded DNA to form a D-loop (homologous pairing), in which repair DNA synthesis is initiated to restore the sequence lost by the breakage. Gene conversion generates sequence diversity, depending on the divergence between the donor and recipient sequences, especially when it occurs among a number of copies of a DNA sequence family with some sequence variations, such as in immunoglobulin diversification in chicken. MtDNA can be regarded as a sequence family, in which the members tend to be diversified by a high frequency of spontaneous mutagenesis. Thus, it would be interesting to determine why and how double-stranded breakage and D-loop formation induce sequence homogenization in mitochondria and sequence diversification in nuclear DNA. We will review the mechanisms and roles of mtDNA homoplasmy, in contrast to nuclear gene conversion, which diversifies gene and genome sequences, to provide clues toward understanding how the common DNA processing pathway results in such divergent outcomes. PMID:24710143

  4. DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations.

    PubMed

    Yuan, Yuchen; Shi, Yi; Li, Changyang; Kim, Jinman; Cai, Weidong; Han, Zeguang; Feng, David Dagan

    2016-12-23

    With the development of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However, in existing SMCC methods, issues like high data sparsity, small sample size, and the application of simple linear classifiers are major obstacles in improving the classification performance. To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier, that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute to improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and cancer types.
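
    The indexed sparsity reduction step is described only as converting the gene data into indexes of its non-zero elements. A minimal reading of that sentence is sketched below. The function names, the 0-based indexing, and the padding to a fixed length with a fill value of -1 are all assumptions made for the example; the abstract does not say how DeepGene handles any of them.

```python
def nonzero_indices(mutation_vector):
    """Return the 0-based positions of the non-zero entries of a sparse vector."""
    return [i for i, value in enumerate(mutation_vector) if value != 0]


def pad_to_length(indices, length, fill=-1):
    """Truncate or right-pad an index list to a fixed length (assumed step).

    A fixed length is convenient when every sample must feed the same
    number of inputs to a classifier.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return (indices + [fill] * length)[:length]


# Hypothetical per-sample vector: 1 marks a gene carrying a point mutation.
sample = [0, 0, 1, 0, 1, 0, 0, 0, 1]
dense = pad_to_length(nonzero_indices(sample), 5)
```

    Here a nine-element vector with three mutated genes shrinks to a short index list, which illustrates why such a step reduces the share of zeros the classifier has to process.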

  5. "First generation" automated DNA sequencing technology.

    PubMed

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  6. Phylogenetic relationships among Lactuca (Asteraceae) species and related genera based on ITS-1 DNA sequences.

    PubMed

    Koopman, W J; Guetta, E; van de Wiel, C C; Vosman, B; van den Berg, R G

    1998-11-01

    Internal transcribed spacer (ITS-1) sequences from 97 accessions representing 23 species of Lactuca and related genera were determined and used to evaluate species relationships of Lactuca sensu lato (s.l.). The ITS-1 phylogenies, calculated using PAUP and PHYLIP, correspond better to the classification of Feráková than to other classifications evaluated, although the inclusion of sect. Lactuca subsect. Cyanicae is not supported. Therefore, exclusion of subsect. Cyanicae from Lactuca sensu Feráková is proposed. The amended genus contains the entire gene pool (sensu Harlan and De Wet) of cultivated lettuce (Lactuca sativa). The position of the species in the amended classification corresponds to their position in the lettuce gene pool. In the ITS-1 phylogenies, a clade with L. sativa, L. serriola, L. dregeana, L. altaica, and L. aculeata represents the primary gene pool. L. virosa and L. saligna, branching off closest to this clade, encompass the secondary gene pool. L. virosa is possibly of hybrid origin. The primary and secondary gene pool species are classified in sect. Lactuca subsect. Lactuca. The species L. quercina, L. viminea, L. sibirica, and L. tatarica, branching off next, represent the tertiary gene pool. They are classified in Lactuca sect. Lactucopsis, sect. Phaenixopus, and sect. Mulgedium, respectively. L. perennis and L. tenerrima, classified in sect. Lactuca subsect. Cyanicae, form clades with species from related genera and are not part of the lettuce gene pool.

  7. Automated Gene Ontology annotation for anonymous sequence data.

    PubMed

    Hennig, Steffen; Groth, Detlef; Lehrach, Hans

    2003-07-01

    Gene Ontology (GO) is the most widely accepted attempt to construct a unified and structured vocabulary for the description of genes and their products in any organism. Annotation by GO terms is performed in most of the current genome projects, which besides generality has the advantage of being very convenient for computer based classification methods. However, direct use of GO in small sequencing projects is not easy, especially for species not commonly represented in public databases. We present a software package (GOblet), which performs annotation based on GO terms for anonymous cDNA or protein sequences. It uses the species independent GO structure and vocabulary together with a series of protein databases collected from various sites, to perform a detailed GO annotation by sequence similarity searches. The sensitivity and the reference protein sets can be selected by the user. GOblet runs automatically and is available as a public service on our web server. The paper also addresses the reliability of automated GO annotations by using a reference set of more than 6000 human proteins. The GOblet server is accessible at http://goblet.molgen.mpg.de.

  8. Influence of DNA sequence on the structure of minicircles under torsional stress

    PubMed Central

    Wang, Qian; Irobalieva, Rossitza N.; Chiu, Wah; Schmid, Michael F.; Fogg, Jonathan M.; Zechiedrich, Lynn

    2017-01-01

    The sequence dependence of the conformational distribution of DNA under various levels of torsional stress is an important unsolved problem. Combining theory and coarse-grained simulations shows that the DNA sequence and a structural correlation due to topology constraints of a circle are the main factors that dictate the 3D structure of a 336 bp DNA minicircle under torsional stress. We found that DNA minicircle topoisomers can have multiple bend locations under high torsional stress and that the positions of these sharp bends are determined by the sequence, and by a positive mechanical correlation along the sequence. We showed that simulations and theory are able to provide sequence-specific information about individual DNA minicircles observed by cryo-electron tomography (cryo-ET). We provided a sequence-specific cryo-ET tomogram fitting of DNA minicircles, registering the sequence within the geometric features. Our results indicate that the conformational distribution of minicircles under torsional stress can be designed, which has important implications for using minicircle DNA for gene therapy. PMID:28609782

  9. Analysis of DNA Sequences by an Optical Time-Integrating Correlator: Proof-of-Concept Experiments.

    DTIC Science & Technology

    1992-05-01

    DNA ANALYSIS STRATEGY 4 2.1 Representation of DNA Bases 4 2.2 DNA Analysis Strategy 6 3.0 CUSTOM GENERATORS FOR DNA SEQUENCES 10 3.1 Hardware Design 10...of the DNA bases where each base is represented by a 7-bits long pseudorandom sequence. 5 Figure 4: Coarse analysis of a DNA sequence. 7 Figure 5: Fine...a 20-bases long database. 32 xiii LIST OF TABLES PAGE Table 1: Short representations of the DNA bases where each base is represented by 7-bits long

  10. Construction and analysis of an SSH cDNA library of early heat-induced genes of Vigna aconitifolia variety RMO-40.

    PubMed

    Rampuria, Sakshi; Joshi, Uma; Palit, Paramita; Deokar, Amit A; Meghwal, Raju R; Mohapatra, T; Srinivasan, R; Bhatt, K V; Sharma, Ramavtar

    2012-11-01

    Moth bean (Vigna aconitifolia (Jacq.) Marechal) is an important grain legume crop grown in rain fed areas of hot desert regions of Thar, India, under scorching sun rays with very little supplementation of water. An SSH cDNA library was generated from leaf tissues of V. aconitifolia var. RMO-40 exposed to an elevated temperature of 42 °C for 5 min to identify early-induced genes. A total of 488 unigenes (114 contigs and 374 singletons) were derived by cluster assembly and sequence alignment of 738 ESTs; out of 206 ESTs (28%) of unknown proteins, 160 ESTs (14%) were found to be novel to moth bean. Only 578 ESTs (78%) showed significant BLASTX similarity (<1 × 10^-6) in the NCBI non-redundant database. Gene ontology functional classification terms were retrieved for 479 (65%) sequences, and 339 sequences were annotated with 165 EC codes and mapped to 68 different KEGG pathways. Four hundred and fifty-two ESTs were further annotated with InterProScan (IPS), and no IPS was assigned to 153 ESTs. In addition, the expression level of 27 ESTs in response to heat stress was evaluated through semiquantitative RT-PCR assay. Approximately 20 different signaling genes and 16 different transcription factors have been shown to be associated with heat stress in moth bean for the first time.

  11. Impact of Soil Texture on Soil Ciliate Communities

    NASA Astrophysics Data System (ADS)

    Chau, J. F.; Brown, S.; Habtom, E.; Brinson, F.; Epps, M.; Scott, R.

    2014-12-01

    Soil water content and connectivity strongly influence microbial activities in soil, controlling access to nutrients and electron acceptors, and mediating interactions between microbes within and between trophic levels. These interactions occur at or below the pore scale, and are influenced by soil texture and structure, which determine the microscale architecture of soil pores. Soil protozoa are relatively understudied, especially given the strong control they exert on bacterial communities through predation. Here, ciliate communities in soils of contrasting textures were investigated. Two ciliate-specific primer sets targeting the 18S rRNA gene were used to amplify DNA extracted from eight soil samples collected from Sumter National Forest in western South Carolina. Primer sets 121F-384F-1147R (semi-nested) and 315F-959R were used to amplify soil ciliate DNA via polymerase chain reaction (PCR), and the resulting PCR products were analyzed by gel electrophoresis to obtain quantity and band size. Approximately two hundred ciliate 18S rRNA sequences were obtained from each of two contrasting soils. Sequences were aligned against the NCBI GenBank database for identification, and the taxonomic classification of best-matched sequences was determined. The ultimate goal of the work is to quantify changes in the ciliate community under short-timescale changes in hydrologic conditions for varying soil textures, elucidating dynamic responses to desiccation stress in major soil ciliate taxa.

  12. What Is Peromyscus? Evidence from nuclear and mitochondrial DNA sequences suggests the need for a new classification

    PubMed Central

    Platt, Roy N.; Amman, Brian R.; Keith, Megan S.; Thompson, Cody W.; Bradley, Robert D.

    2015-01-01

    The evolutionary relationships between Peromyscus, Habromys, Isthmomys, Megadontomys, Neotomodon, Osgoodomys, and Podomys are poorly understood. In order to further explore the evolutionary boundaries of Peromyscus and compare potential taxonomic solutions for this diverse group and its relatives, we conducted phylogenetic analyses of DNA sequence data from alcohol dehydrogenase (Adh1-I2), beta fibrinogen (Fgb-I7), interphotoreceptor retinoid-binding protein (Rbp3), and cytochrome-b (Cytb). Phylogenetic analyses of mitochondrial and nuclear genes produced similar topologies although levels of nodal support varied. The best-supported topology was obtained by combining nuclear and mitochondrial sequences. No monophyletic Peromyscus clade was supported. Instead, support was found for a clade containing Habromys, Megadontomys, Neotomodon, Osgoodomys, Podomys, and Peromyscus suggesting paraphyly of Peromyscus and confirming previous observations. Our analyses indicated an early divergence of Isthmomys from Peromyscus (approximately 8 million years ago), whereas most other peromyscine taxa emerged within the last 6 million years. To recover a monophyletic taxonomy from Peromyscus and affiliated lineages, we detail 3 taxonomic options in which Habromys, Megadontomys, Neotomodon, Osgoodomys, and Podomys are retained as genera, subsumed as subgenera, or subsumed as species groups within Peromyscus. Each option presents distinct taxonomic challenges, and the appropriate taxonomy must reflect the substantial levels of morphological divergence that characterize this group while maintaining the monophyletic relationships obtained from genetic data. PMID:26937047

  13. What Is Peromyscus? Evidence from nuclear and mitochondrial DNA sequences suggests the need for a new classification.

    PubMed

    Platt, Roy N; Amman, Brian R; Keith, Megan S; Thompson, Cody W; Bradley, Robert D

    2015-08-03

    The evolutionary relationships between Peromyscus, Habromys, Isthmomys, Megadontomys, Neotomodon, Osgoodomys, and Podomys are poorly understood. In order to further explore the evolutionary boundaries of Peromyscus and compare potential taxonomic solutions for this diverse group and its relatives, we conducted phylogenetic analyses of DNA sequence data from alcohol dehydrogenase (Adh1-I2), beta fibrinogen (Fgb-I7), interphotoreceptor retinoid-binding protein (Rbp3), and cytochrome-b (Cytb). Phylogenetic analyses of mitochondrial and nuclear genes produced similar topologies although levels of nodal support varied. The best-supported topology was obtained by combining nuclear and mitochondrial sequences. No monophyletic Peromyscus clade was supported. Instead, support was found for a clade containing Habromys, Megadontomys, Neotomodon, Osgoodomys, Podomys, and Peromyscus suggesting paraphyly of Peromyscus and confirming previous observations. Our analyses indicated an early divergence of Isthmomys from Peromyscus (approximately 8 million years ago), whereas most other peromyscine taxa emerged within the last 6 million years. To recover a monophyletic taxonomy from Peromyscus and affiliated lineages, we detail 3 taxonomic options in which Habromys, Megadontomys, Neotomodon, Osgoodomys, and Podomys are retained as genera, subsumed as subgenera, or subsumed as species groups within Peromyscus. Each option presents distinct taxonomic challenges, and the appropriate taxonomy must reflect the substantial levels of morphological divergence that characterize this group while maintaining the monophyletic relationships obtained from genetic data.

  14. Identification, Classification, and Phylogeny of the Pathogenic Species Exophiala jeanselmei and Related Species by Mitochondrial Cytochrome b Gene Analysis

    PubMed Central

    Wang, Li; Yokoyama, Koji; Miyaji, Makoto; Nishimura, Kazuko

    2001-01-01

    We analyzed a 402-bp sequence of the mitochondrial cytochrome b gene of 34 strains of Exophiala jeanselmei and 16 strains representing 12 related species. The strains of E. jeanselmei were classified into 20 DNA types and 17 amino acid types. The differences between these strains were found in 1 to 60 nucleotides and 1 to 17 amino acids. On the basis of the identities and similarities of nucleotide and amino acid sequences, some strains were reidentified: i.e., two strains of E. jeanselmei var. heteromorpha and one strain of E. castellanii as E. dermatitidis (including the type strain), three strains of E. jeanselmei as E. jeanselmei var. lecanii-corni (including the type strain), three strains of E. jeanselmei as E. bergeri (including the type strain), seven strains of E. jeanselmei as E. pisciphila (including the type strain), seven strains of E. jeanselmei as E. jeanselmei var. jeanselmei (including the type strain), one strain of E. jeanselmei as Fonsecaea pedrosoi (including the type strain), and one strain of E. jeanselmei as E. spinifera (including the type strain). Some E. jeanselmei strains showed distinct nucleotide and amino acid sequences. The amino-acid-based UPGMA (unweighted pair group method with the arithmetic mean) tree exhibited nearly the same topology as those of the DNA-based trees obtained by neighbor joining, maximum parsimony, and maximum likelihood methods. PMID:11724862

  15. A molecular phylogenetic appraisal of the acanthostomines Acanthostomum and Timoniella and their position within Cryptogonimidae (Trematoda: Opisthorchioidea)

    PubMed Central

    Martínez-Aquino, Andrés; Vidal-Martínez, Victor M.; Aguirre-Macedo, M. Leopoldina

    2017-01-01

    The phylogenetic position of three taxa from two trematode genera, belonging to the subfamily Acanthostominae (Opisthorchioidea: Cryptogonimidae), were analysed using partial 28S ribosomal DNA (Domains 1–2) and internal transcribed spacers (ITS1–5.8S–ITS2). Bayesian inference and Maximum likelihood analyses of combined 28S rDNA and ITS1 + 5.8S + ITS2 sequences indicated the monophyly of the genus Acanthostomum (A. cf. americanum and A. burminis) and paraphyly of the Acanthostominae. These phylogenetic relationships were consistent in analyses of 28S alone and concatenated 28S + ITS1 + 5.8S + ITS2 sequences analyses. Based on molecular phylogenetic analyses, the subfamily Acanthostominae is therefore a paraphyletic taxon, in contrast with previous classifications based on morphological data. Phylogenetic patterns of host specificity inferred from adult stages of other cryptogonimid taxa are also well supported. However, analyses using additional genera and species are necessary to support the phylogenetic inferences from this study. Our molecular phylogenetic reconstruction linked two larval stages of A. cf. americanum cercariae and metacercariae. Here, we present the evolutionary and ecological implications of parasitic infections in freshwater and brackish environments. PMID:29250471

  16. A molecular phylogenetic appraisal of the acanthostomines Acanthostomum and Timoniella and their position within Cryptogonimidae (Trematoda: Opisthorchioidea).

    PubMed

    Martínez-Aquino, Andrés; Vidal-Martínez, Victor M; Aguirre-Macedo, M Leopoldina

    2017-01-01

    The phylogenetic position of three taxa from two trematode genera, belonging to the subfamily Acanthostominae (Opisthorchioidea: Cryptogonimidae), were analysed using partial 28S ribosomal DNA (Domains 1-2) and internal transcribed spacers (ITS1-5.8S-ITS2). Bayesian inference and Maximum likelihood analyses of combined 28S rDNA and ITS1 + 5.8S + ITS2 sequences indicated the monophyly of the genus Acanthostomum (A. cf. americanum and A. burminis) and paraphyly of the Acanthostominae. These phylogenetic relationships were consistent in analyses of 28S alone and concatenated 28S + ITS1 + 5.8S + ITS2 sequences analyses. Based on molecular phylogenetic analyses, the subfamily Acanthostominae is therefore a paraphyletic taxon, in contrast with previous classifications based on morphological data. Phylogenetic patterns of host specificity inferred from adult stages of other cryptogonimid taxa are also well supported. However, analyses using additional genera and species are necessary to support the phylogenetic inferences from this study. Our molecular phylogenetic reconstruction linked two larval stages of A. cf. americanum cercariae and metacercariae. Here, we present the evolutionary and ecological implications of parasitic infections in freshwater and brackish environments.

  17. Phylogenetics and diversification of Cotyledon (Crassulaceae) inferred from nuclear and chloroplast DNA sequence data.

    PubMed

    Mort, Mark E; Levsen, Nicholas; Randle, Christopher P; Van Jaarsveld, Ernst; Palmer, Annie

    2005-07-01

    Crassulaceae includes approximately 35 genera and 1500 species of leaf and stem succulent flowering plants. The family is nearly cosmopolitan in distribution, but is particularly diverse in southern Africa, where five genera comprising approximately 325 species are found. One of these genera, Cotyledon, includes 10 species that are largely confined to South Africa, where they are commonly found on rocky hillsides, coastal flats, and cliff faces. Species of Cotyledon are characterized by five-parted, pendulous, sympetalous flowers, but the genus is highly diverse in growth form, flower color and size, and leaf morphology. One particularly variable species, C. orbiculata, has been divided into five varieties based on leaf morphology and biogeography; however, the monophyly of this species as well as the relationships among the varieties have not previously been investigated. Parsimony analyses of a combined data set of DNA sequences from chloroplast and nuclear genome provided the first estimate of phylogeny for Cotyledon, and resulted in two minimum-length trees and a fully resolved phylogeny for the genus. Results indicate that C. orbiculata is not monophyletic and suggest the need for additional studies and a revised classification within the genus.

  18. Laser mass spectrometry for DNA sequencing, disease diagnosis, and fingerprinting

    NASA Astrophysics Data System (ADS)

    Chen, C. H. Winston; Taranenko, N. I.; Zhu, Y. F.; Chung, C. N.; Allman, S. L.

    1997-05-01

    Since laser mass spectrometry has the potential for achieving very fast DNA analysis, we recently applied it to DNA sequencing, DNA typing for fingerprinting, and DNA screening for disease diagnosis. Two different approaches for sequencing DNA have been successfully demonstrated. One is to sequence DNA with DNA ladders produced from Sanger's enzymatic method. The other is to do direct sequencing without DNA ladders. The need for quick DNA typing for identification purposes is critical for forensic application. Our preliminary results indicate laser mass spectrometry can possibly be used for rapid DNA fingerprinting applications at a much lower cost than gel electrophoresis. Population screening for certain genetic diseases can be a very efficient step to reducing medical costs through prevention. Since laser mass spectrometry can provide very fast DNA analysis, we applied laser mass spectrometry to disease diagnosis. Clinical samples with both base deletion and point mutation have been tested with complete success.

  19. Micronuclear DNA of Oxytricha nova contains sequences with autonomously replicating activity in Saccharomyces cerevisiae.

    PubMed Central

    Colombo, M M; Swanton, M T; Donini, P; Prescott, D M

    1984-01-01

    Oxytricha nova is a hypotrichous ciliate with micronuclei and macronuclei. Micronuclei, which contain large, chromosomal-sized DNA, are genetically inert but undergo meiosis and exchange during cell mating. Macronuclei, which contain only small, gene-sized DNA molecules, provide all of the nuclear RNA needed to run the cell. After cell mating the macronucleus is derived from a micronucleus, a derivation that includes excision of the genes from chromosomes and elimination of the remaining DNA. The eliminated DNA includes all of the repetitious sequences and approximately 95% of the unique sequences. We cloned large restriction fragments from the micronucleus that confer replication ability on a replication-deficient plasmid in Saccharomyces cerevisiae. Sequences that confer replication ability are called autonomously replicating sequences. The frequency and effectiveness of autonomously replicating sequences in micronuclear DNA are similar to those reported for DNAs of other organisms introduced into yeast cells. Of the 12 micronuclear fragments with autonomously replicating sequence activity, 9 also showed homology to macronuclear DNA, indicating that they contain a macronuclear gene sequence. We conclude from this that autonomously replicating sequence activity is nonrandomly distributed throughout micronuclear DNA and is preferentially associated with those regions of micronuclear DNA that contain genes. PMID:6092934

  20. DNA sequence-dependent mechanics and protein-assisted bending in repressor-mediated loop formation

    PubMed Central

    Boedicker, James Q.; Garcia, Hernan G.; Johnson, Stephanie; Phillips, Rob

    2014-01-01

    As the chief informational molecule of life, DNA is subject to extensive physical manipulations. The energy required to deform double-helical DNA depends on sequence, and this mechanical code of DNA influences gene regulation, such as through nucleosome positioning. Here we examine the sequence-dependent flexibility of DNA in bacterial transcription factor-mediated looping, a context for which the role of sequence remains poorly understood. Using a suite of synthetic constructs repressed by the Lac repressor and two well-known sequences that show large flexibility differences in vitro, we make precise statistical mechanical predictions as to how DNA sequence influences loop formation and test these predictions using in vivo transcription and in vitro single-molecule assays. Surprisingly, sequence-dependent flexibility does not affect in vivo gene regulation. By theoretically and experimentally quantifying the relative contributions of sequence and the DNA-bending protein HU to DNA mechanical properties, we reveal that bending by HU dominates DNA mechanics and masks intrinsic sequence-dependent flexibility. Such a quantitative understanding of how mechanical regulatory information is encoded in the genome will be a key step towards a predictive understanding of gene regulation at single-base pair resolution. PMID:24231252

  1. Mouse Vk gene classification by nucleic acid sequence similarity.

    PubMed

    Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

    1989-01-01

    Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.

  2. Divergent nuclear 18S rDNA paralogs in a turkey coccidium, Eimeria meleagrimitis, complicate molecular systematics and identification.

    PubMed

    El-Sherry, Shiem; Ogedengbe, Mosun E; Hafeez, Mian A; Barta, John R

    2013-07-01

    Multiple 18S rDNA sequences were obtained from two single-oocyst-derived lines of each of Eimeria meleagrimitis and Eimeria adenoeides. After analysing the 15 new 18S rDNA sequences from two lines of E. meleagrimitis and 17 new sequences from two lines of E. adenoeides, there were clear indications that divergent, paralogous 18S rDNA copies existed within the nuclear genome of E. meleagrimitis. In contrast, mitochondrial cytochrome c oxidase subunit I (COI) partial sequences from all lines of a particular Eimeria sp. were identical and, in phylogenetic analyses, COI sequences clustered unambiguously in monophyletic and highly-supported clades specific to individual Eimeria sp. Phylogenetic analysis of the new 18S rDNA sequences from E. meleagrimitis showed that they formed two distinct clades: Type A with four new sequences; and Type B with nine new sequences; both Types A and B sequences were obtained from each of the single-oocyst-derived lines of E. meleagrimitis. Together these rDNA types formed a well-supported E. meleagrimitis clade. Types A and B 18S rDNA sequences from E. meleagrimitis had a mean sequence identity of only 97.4% whereas mean sequence identity within types was 99.1-99.3%. The observed intraspecific sequence divergence among E. meleagrimitis 18S rDNA sequence types was even higher (approximately 2.6%) than the interspecific sequence divergence present between some well-recognized species such as Eimeria tenella and Eimeria necatrix (1.1%). Our observations suggest that, unlike COI sequences, 18S rDNA sequences are not reliable molecular markers to be used alone for species identification with coccidia, although 18S rDNA sequences have clear utility for phylogenetic reconstruction of apicomplexan parasites at the genus and higher taxonomic ranks. Copyright © 2013. Published by Elsevier Ltd.
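
    The identity and divergence figures quoted in this abstract are complementary percentages (97.4% identity corresponds to approximately 2.6% divergence). A minimal sketch of that arithmetic for two pre-aligned sequences follows; the function name and the choice to skip gap columns are assumptions made for illustration, and the abstract does not state how the authors handled gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of matching positions between two pre-aligned sequences.

    Assumption for this sketch: columns where either sequence has a gap
    ('-') are excluded from both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable (non-gap) positions")
    return 100.0 * matches / compared


# Toy example: one mismatch in ten compared positions.
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
divergence = 100.0 - identity
print(identity, divergence)  # 90.0 10.0
```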

  3. 32 CFR 2001.12 - Duration of classification.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... classification authority shall follow the sequence listed in paragraphs (a)(1)(i), (ii), and (iii) of this... to the sequence in paragraph (a)(1) of this section are as follows: (i) If an original classification... and shall be designated with the following marking, “50X1-HUM;” or (ii) If an original classification...

  4. 32 CFR 2001.12 - Duration of classification.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... classification authority shall follow the sequence listed in paragraphs (a)(1)(i), (ii), and (iii) of this... to the sequence in paragraph (a)(1) of this section are as follows: (i) If an original classification... and shall be designated with the following marking, “50X1-HUM;” or (ii) If an original classification...

  5. Affordable hands-on DNA sequencing and genotyping: an exercise for teaching DNA analysis to undergraduates.

    PubMed

    Shah, Kushani; Thomas, Shelby; Stein, Arnold

    2013-01-01

    In this report, we describe a 5-week laboratory exercise for undergraduate biology and biochemistry students in which students learn to sequence DNA and to genotype their DNA for selected single nucleotide polymorphisms (SNPs). Students use miniaturized DNA sequencing gels that require approximately 8 min to run. The students perform G, A, T, C Sanger sequencing reactions. They prepare and run the gels, perform Southern blots (which require only 10 min), and detect sequencing ladders using a colorimetric detection system. Students enlarge their sequencing ladders from digital images of their small nylon membranes, and read the sequence manually. They compare their reads with the actual DNA sequence using BLAST2. After mastering the DNA sequencing system, students prepare their own DNA from a cheek swab, polymerase chain reaction-amplify a region of their DNA that encompasses a SNP of interest, and perform sequencing to determine their genotype at the SNP position. A family pedigree can also be constructed. The SNP chosen by the instructor was rs17822931, which is in the ABCC11 gene and is the determinant of human earwax type. Genotypes at the rs17822931 site vary in different ethnic populations. © 2013 by The International Union of Biochemistry and Molecular Biology.

  6. Preliminary Classification of Novel Hemorrhagic Fever-Causing Viruses Using Sequence-Based PAirwise Sequence Comparison (PASC) Analysis.

    PubMed

    Bào, Yīmíng; Kuhn, Jens H

    2018-01-01

    During the last decade, genome sequence-based classification of viruses has become increasingly prominent. Viruses can even be classified based on coding-complete genome sequence data alone. Nevertheless, classification remains arduous as experts are required to establish phylogenetic trees to depict the evolutionary relationships of such sequences for preliminary taxonomic placement. Pairwise sequence comparison (PASC) of genomes is one of several novel methods for establishing relationships among viruses. This method, provided by the US National Center for Biotechnology Information as an open-access tool, circumvents phylogenetics, and yet PASC results are often in agreement with those of phylogenetic analyses. Computationally inexpensive, PASC can be easily performed by non-taxonomists. Here we describe how to use the PASC tool for the preliminary classification of novel viral hemorrhagic fever-causing viruses.

  7. Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing.

    PubMed

    Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas

    2009-06-01

    The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene product encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.

  8. Comparing K-mer based methods for improved classification of 16S sequences.

    PubMed

    Vinje, Hilde; Liland, Kristian Hovde; Almøy, Trygve; Snipen, Lars

    2015-07-01

    The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Parallel to the explosion in the amount of sequence data accessible, there has also been a shift in focus for classification methods. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers by sliding windows are the most interesting classification approach with respect to both speed and accuracy. Here, we present a systematic comparison of five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other both in data usage and modelling strategies. We have based our study on the commonly known and well-used naïve Bayes classifier from the RDP project, and four other methods were implemented and tested on two different data sets, on full-length sequences as well as fragments of typical read-length. The differences in classification error obtained by the methods seemed to be small, but they were stable for both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets explored, and on both full-length and fragmented sequences, all the five methods reached an error-plateau. We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau indicating improved training data is needed to improve classification from here. Classification errors occur most frequently for genera with few sequences present. For improving the taxonomy and testing new classification methods, the need for a better and more universal and robust training data set is crucial.
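
    As an illustration of the sliding-window K-mer counting and multinomial naïve Bayes idea described in this abstract, the following is a minimal sketch only. It is not the RDP classifier or any of the five methods compared in the paper; the function names, the add-one pseudocount, the uniform class priors and the toy sequences are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers using a sliding window of width k."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def train_multinomial_nb(labelled_seqs, k, pseudocount=1.0):
    """Estimate smoothed per-class log-probabilities for every DNA k-mer.

    labelled_seqs is a list of (label, sequence) pairs (hypothetical input format).
    """
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    totals = {}
    for label, seq in labelled_seqs:
        totals.setdefault(label, Counter()).update(kmer_counts(seq, k))
    model = {}
    for label, counts in totals.items():
        denom = sum(counts[w] for w in alphabet) + pseudocount * len(alphabet)
        model[label] = {
            w: math.log((counts[w] + pseudocount) / denom) for w in alphabet
        }
    return model


def classify(model, seq, k):
    """Return the label with the highest multinomial log-likelihood.

    Class priors are assumed uniform, so they are left out of the score.
    K-mers containing symbols other than A, C, G, T are ignored.
    """
    counts = kmer_counts(seq, k)
    scores = {
        label: sum(n * logp[w] for w, n in counts.items() if w in logp)
        for label, logp in model.items()
    }
    return max(scores, key=scores.get)


# Toy training data (invented for illustration, not real 16S sequences).
training = [
    ("genusA", "ACGTACGTACGTACGT"),
    ("genusB", "GGGCCCGGGCCCGGGC"),
]
model = train_multinomial_nb(training, k=3)
print(classify(model, "ACGTACGT", 3))
print(classify(model, "GGGCCC", 3))
```

    A real classifier would be trained on a curated reference set and would typically use longer words; the small K here only keeps the toy example readable.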

  9. Non-linear molecular pattern classification using molecular beacons with multiple targets.

    PubMed

    Lee, In-Hee; Lee, Seung Hwan; Park, Tai Hyun; Zhang, Byoung-Tak

    2013-12-01

    In vitro pattern classification has been highlighted as an important future application of DNA computing. Previous work has demonstrated the feasibility of linear classifiers using DNA-based molecular computing. However, complex tasks require non-linear classification capability. Here we design a molecular beacon that can interact with multiple targets and experimentally show that its fluorescent signals form a complex radial-basis function, enabling it to be used as a building block for non-linear molecular classification in vitro. The proposed method was successfully applied to solving artificial and real-world classification problems: XOR and microRNA expression patterns. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  10. DNA capture and next-generation sequencing can recover whole mitochondrial genomes from highly degraded samples for human identification

    PubMed Central

    2013-01-01

    Background Mitochondrial DNA (mtDNA) typing can be a useful aid for identifying people from compromised samples when nuclear DNA is too damaged, degraded or below detection thresholds for routine short tandem repeat (STR)-based analysis. Standard mtDNA typing, focused on PCR amplicon sequencing of the control region (HVS I and HVS II), is limited by the resolving power of this short sequence, which misses up to 70% of the variation present in the mtDNA genome. Methods We used in-solution hybridisation-based DNA capture (using DNA capture probes prepared from modern human mtDNA) to recover mtDNA from post-mortem human remains in which the majority of DNA is both highly fragmented (<100 base pairs in length) and chemically damaged. The method ‘immortalises’ the finite quantities of DNA in valuable extracts as DNA libraries, which is followed by the targeted enrichment of endogenous mtDNA sequences and characterisation by next-generation sequencing (NGS). Results We sequenced whole mitochondrial genomes for human identification from samples where standard nuclear STR typing produced only partial profiles or demonstrably failed and/or where standard mtDNA hypervariable region sequences lacked resolving power. Multiple rounds of enrichment can substantially improve coverage and sequencing depth of mtDNA genomes from highly degraded samples. The application of this method has led to the reliable mitochondrial sequencing of human skeletal remains from unidentified World War Two (WWII) casualties approximately 70 years old and from archaeological remains (up to 2,500 years old). Conclusions This approach has potential applications in forensic science, historical human identification cases, archived medical samples, kinship analysis and population studies. In particular the methodology can be applied to any case, involving human or non-human species, where whole mitochondrial genome sequences are required to provide the highest level of maternal lineage discrimination. 
    Multiple rounds of in-solution hybridisation-based DNA capture can retrieve whole mitochondrial genome sequences from even the most challenging samples. PMID:24289217

  11. Phylogeny and classification of Naucleeae s.l. (Rubiaceae) inferred from molecular (ITS, rbcL, and trnT-F) and morphological data.

    PubMed

    Razafimandimbison, Sylvain G; Bremer, Birgitta

    2002-07-01

    Parsimony analyses of the tribe Naucleeae sensu lato (s.l.) using the noncoding internal transcribed spacer (ITS) regions of nuclear rDNA, the protein-coding rbcL and noncoding trnT-F regions of chloroplast DNA, and morphological data were performed to construct a new intratribal classification, test the monophyly of previous subtribal circumscriptions, and evaluate the generic status of Naucleeae s.l. Fifty-two ITS, 45 rbcL, and 55 trnT-F new sequences are published here. Our study supports the monophyly of the subtribes Anthocephalidae, Mitragynae, and Uncariae, all sensu Haviland, and Naucleinae sensu Ridsdale. There was no support for Cephalanthidae sensu Haviland and Adininae sensu Ridsdale. Naucleeae can be subdivided into six highly supported and morphologically distinct subtribes, Breoniinae, Cephalanthinae, Corynantheinae, Naucleinae, Mitragyninae, and Uncarinae, plus one, Adininae, which is poorly supported. The relationships among these subtribes were largely unresolved. We maintain the following 22 genera: Adina, Adinauclea, Breonadia, Breonia, Burttdavya, Cephalanthus, Gyrostipula, Haldina, Janotia, Ludekia, Metadina, Mitragyna, Myrmeconauclea, Nauclea, Neolamarckia, Neonauclea, Ochreinauclea, Pausinystalia, Pertusadina, Sarcocephalus, Sinoadina, and Uncaria. Pseudocinchona is reestablished. Corynanthe is restricted to C. paniculata and Hallea is reincluded in Mitragyna. Our results were inconclusive for assessing the relationships among Adina, Adinauclea, Metadina, and Pertusadina due to lack of resolution.

  12. Quantum annealing versus classical machine learning applied to a simplified computational biology problem

    PubMed Central

    Li, Richard Y.; Di Felice, Rosa; Rohs, Remo; Lidar, Daniel A.

    2018-01-01

    Transcription factors regulate gene expression, but how these proteins recognize and specifically bind to their DNA targets is still debated. Machine learning models are effective means to reveal interaction mechanisms. Here we studied the ability of a quantum machine learning approach to predict binding specificity. Using simplified datasets of a small number of DNA sequences derived from actual binding affinity experiments, we trained a commercially available quantum annealer to classify and rank transcription factor binding. The results were compared to state-of-the-art classical approaches for the same simplified datasets, including simulated annealing, simulated quantum annealing, multiple linear regression, LASSO, and extreme gradient boosting. Despite technological limitations, we find a slight advantage in classification performance and nearly equal ranking performance using the quantum annealer for these fairly small training data sets. Thus, we propose that quantum annealing might be an effective method to implement machine learning for certain computational biology problems. PMID:29652405

  13. Development of a PCR-based assay for rapid and reliable identification of pathogenic Fusaria.

    PubMed

    Mishra, Prashant K; Fox, Roland T V; Culham, Alastair

    2003-01-28

    Identification of Fusarium species has always been difficult due to confusing phenotypic classification systems. We have developed a fluorescent-based polymerase chain reaction assay that allows for rapid and reliable identification of five toxigenic and pathogenic Fusarium species. The species include Fusarium avenaceum, F. culmorum, F. equiseti, F. oxysporum and F. sambucinum. The method is based on the PCR amplification of species-specific DNA fragments using fluorescent oligonucleotide primers, which were designed based on sequence divergence within the internal transcribed spacer region of nuclear ribosomal DNA. Besides providing an accurate, reliable, and quick diagnosis of these Fusaria, another advantage of this method is that it reduces the potential for exposure to carcinogenic chemicals as it uses fluorescent dyes in place of ethidium bromide. Apart from its multidisciplinary importance and usefulness, it also obviates the need for gel electrophoresis.

  14. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    PubMed

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information such as the total number of nucleotide bases, ATGC base contents along with their respective percentages, and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.
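
    For readers unfamiliar with the Nussinov dynamic programming algorithm named in this abstract, the sketch below shows the textbook recurrence that maximises the number of nested complementary base pairs. It is not RDNAnalyzer's code (the tool is written in C# and is said to extend the algorithm); the Watson-Crick-only pairing rule, the min_loop parameter and the function names are assumptions made for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def nussinov(seq, min_loop=3):
    """Maximum number of nested A-T / G-C pairs in seq (textbook Nussinov).

    min_loop is the minimum number of unpaired bases enclosed by a pair;
    it is an assumption of this sketch, not a documented RDNAnalyzer setting.
    """
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if (seq[i], seq[j]) in pairs:
                best = max(best, dp[i + 1][j - 1] + 1)  # pair i with j
            for t in range(i + 1, j):  # split into two independent parts
                best = max(best, dp[i][t] + dp[t + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


print(reverse_complement("ATGC"))
print(nussinov("GGGAAATCCC"))
```

    The recurrence counts pairs only; recovering the actual pairing requires a traceback through the table, which is omitted here for brevity.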

  15. Direct Detection and Sequencing of Damaged DNA Bases

    PubMed Central

    2011-01-01

    Products of various forms of DNA damage have been implicated in a variety of important biological processes, such as aging, neurodegenerative diseases, and cancer. Therefore, there exists great interest to develop methods for interrogating damaged DNA in the context of sequencing. Here, we demonstrate that single-molecule, real-time (SMRT®) DNA sequencing can directly detect damaged DNA bases in the DNA template - as a by-product of the sequencing method - through an analysis of the DNA polymerase kinetics that are altered by the presence of a modified base. We demonstrate the sequencing of several DNA templates containing products of DNA damage, including 8-oxoguanine, 8-oxoadenine, O6-methylguanine, 1-methyladenine, O4-methylthymine, 5-hydroxycytosine, 5-hydroxyuracil, 5-hydroxymethyluracil, or thymine dimers, and show that these base modifications can be readily detected with single-modification resolution and DNA strand specificity. We characterize the distinct kinetic signatures generated by these DNA base modifications. PMID:22185597

  16. Direct detection and sequencing of damaged DNA bases.

    PubMed

    Clark, Tyson A; Spittle, Kristi E; Turner, Stephen W; Korlach, Jonas

    2011-12-20

    Products of various forms of DNA damage have been implicated in a variety of important biological processes, such as aging, neurodegenerative diseases, and cancer. Therefore, there exists great interest to develop methods for interrogating damaged DNA in the context of sequencing. Here, we demonstrate that single-molecule, real-time (SMRT®) DNA sequencing can directly detect damaged DNA bases in the DNA template - as a by-product of the sequencing method - through an analysis of the DNA polymerase kinetics that are altered by the presence of a modified base. We demonstrate the sequencing of several DNA templates containing products of DNA damage, including 8-oxoguanine, 8-oxoadenine, O6-methylguanine, 1-methyladenine, O4-methylthymine, 5-hydroxycytosine, 5-hydroxyuracil, 5-hydroxymethyluracil, or thymine dimers, and show that these base modifications can be readily detected with single-modification resolution and DNA strand specificity. We characterize the distinct kinetic signatures generated by these DNA base modifications.

  17. The clinical maze of mitochondrial neurology

    PubMed Central

    DiMauro, Salvatore; Schon, Eric A.; Carelli, Valerio; Hirano, Michio

    2014-01-01

    Mitochondrial diseases involve the respiratory chain, which is under the dual control of nuclear and mitochondrial DNA (mtDNA). The complexity of mitochondrial genetics provides one explanation for the clinical heterogeneity of mitochondrial diseases, but our understanding of disease pathogenesis remains limited. Classification of Mendelian mitochondrial encephalomyopathies has been laborious, but whole-exome sequencing studies have revealed unexpected molecular aetiologies for both typical and atypical mitochondrial disease phenotypes. Mendelian mitochondrial defects can affect five components of mitochondrial biology: subunits of respiratory chain complexes (direct hits); mitochondrial assembly proteins; mtDNA translation; phospholipid composition of the inner mitochondrial membrane; or mitochondrial dynamics. A sixth category—defects of mtDNA maintenance—combines features of Mendelian and mitochondrial genetics. Genetic defects in mitochondrial dynamics are especially important in neurology as they cause optic atrophy, hereditary spastic paraplegia, and Charcot–Marie–Tooth disease. Therapy is inadequate and mostly palliative, but promising new avenues are being identified. Here, we review current knowledge on the genetics and pathogenesis of the six categories of mitochondrial disorders outlined above, focusing on their salient clinical manifestations and highlighting novel clinical entities. An outline of diagnostic clues for the various forms of mitochondrial disease, as well as potential therapeutic strategies, is also discussed. PMID:23835535

  18. Cut-and-Paste Transposons in Fungi with Diverse Lifestyles

    PubMed Central

    Steczkiewicz, Kamil; Ginalski, Krzysztof

    2017-01-01

    Transposable elements (TEs) shape genomes via recombination and transposition, lead to chromosomal rearrangements, create new gene neighborhoods, and alter gene expression. They play key roles in adaptation either to symbiosis in Amanita genus or to pathogenicity in Pyrenophora tritici-repentis. Despite growing evidence of their importance, the abundance and distribution of mobile elements replicating in a “cut-and-paste” fashion have barely been described so far. In order to improve our knowledge of this old and ubiquitous class of transposable elements, 1,730 fungal genomes were scanned using both de novo and homology-based approaches. DNA TEs have been identified across the whole data set and display uneven distribution from both DNA TE classification and fungal taxonomy perspectives. DNA TE content correlates with genome size, which confirms that many transposon families proliferate simultaneously. In contrast, it is independent from intron density, average gene distance and GC content. TE count is associated with species’ lifestyle and tends to be elevated in plant symbionts and decreased in animal parasites. Lastly, we found that fungi with both RIP and RNAi systems have more total DNA TE sequences but fewer elements retaining a functional transposase, which reflects stringent control over transposition. PMID:29228286

  19. A comprehensive list of cloned human DNA sequences

    PubMed Central

    Schmidtke, Jörg; Cooper, David N.

    1987-01-01

    A list of DNA sequences cloned from the human genome is presented. Intended as a guide to clone availability, this list includes published reports of cDNA, genomic and synthetic clones comprising gene and pseudogene sequences, uncharacterised DNA segments and repetitive DNA elements. PMID:3575113

  20. A comprehensive list of cloned human DNA sequences

    PubMed Central

    Schmidtke, Jörg; Cooper, David N.

    1990-01-01

    A list of DNA sequences cloned from the human genome is presented. Intended as a guide to clone availability, this list includes published reports of cDNA, genomic and synthetic clones comprising gene and pseudogene sequences, uncharacterised DNA segments and repetitive DNA elements. PMID:2333227

  1. A comprehensive list of cloned human DNA sequences

    PubMed Central

    Schmidtke, Jörg; Cooper, David N.

    1988-01-01

    A list of DNA sequences cloned from the human genome is presented. Intended as a guide to clone availability, this list includes published reports of cDNA, genomic and synthetic clones comprising gene and pseudogene sequences, uncharacterised DNA segments and repetitive DNA elements. PMID:3368330

  2. A comprehensive list of cloned human DNA sequences

    PubMed Central

    Schmidtke, Jörg; Cooper, David N.

    1989-01-01

    A list of DNA sequences cloned from the human genome is presented. Intended as a guide to clone availability, this list includes published reports of cDNA, genomic and synthetic clones comprising gene and pseudogene sequences, uncharacterised DNA segments and repetitive DNA elements. PMID:2654889

  3. Extending Immunological Profiling in the Gilthead Sea Bream, Sparus aurata, by Enriched cDNA Library Analysis, Microarray Design and Initial Studies upon the Inflammatory Response to PAMPs.

    PubMed

    Boltaña, Sebastian; Castellana, Barbara; Goetz, Giles; Tort, Lluis; Teles, Mariana; Mulero, Victor; Novoa, Beatriz; Figueras, Antonio; Goetz, Frederick W; Gallardo-Escarate, Cristian; Planas, Josep V; Mackenzie, Simon

    2017-02-03

    This study describes the development and validation of an enriched oligonucleotide-microarray platform for Sparus aurata (SAQ) to provide a platform for transcriptomic studies in this species. A transcriptome database was constructed by assembly of gilthead sea bream sequences derived from public repositories of mRNA together with reads from a large collection of expressed sequence tags (EST) from two extensive targeted cDNA libraries characterizing mRNA transcripts regulated by both bacterial and viral challenge. The developed microarray was further validated by analysing monocyte/macrophage activation profiles after challenge with two Gram-negative bacterial pathogen-associated molecular patterns (PAMPs; lipopolysaccharide (LPS) and peptidoglycan (PGN)). Of the approximately 10,000 EST sequenced, we obtained a total of 6837 EST longer than 100 nt, with 3778 and 3059 EST obtained from the bacterial-primed and from the viral-primed cDNA libraries, respectively. Functional classification of contigs from the bacterial- and viral-primed cDNA libraries by Gene Ontology (GO) showed that the top five represented categories were equally represented in the two libraries: metabolism (approximately 24% of the total number of contigs), carrier proteins/membrane transport (approximately 15%), effectors/modulators and cell communication (approximately 11%), nucleoside, nucleotide and nucleic acid metabolism (approximately 7.5%) and intracellular transducers/signal transduction (approximately 5%). Transcriptome analyses using this enriched oligonucleotide platform identified differential shifts in the response to PGN and LPS in macrophage-like cells, highlighting responsive gene-cassettes tightly related to PAMP host recognition. As observed in other fish species, PGN is a powerful activator of the inflammatory response in S. aurata macrophage-like cells. 
    We have developed and validated an oligonucleotide microarray (SAQ) that provides a platform enriched for the study of gene expression in S. aurata with an emphasis upon immunity and the immune response.

  4. Kilo-sequencing: an ordered strategy for rapid DNA sequence data acquisition.

    PubMed Central

    Barnes, W M; Bevan, M

    1983-01-01

    A strategy for rapid DNA sequence acquisition in an ordered, nonrandom manner, while retaining all of the conveniences of the dideoxy method with M13 transducing phage DNA template, is described. Target DNA 3 to 14 kb in size can be stably carried by our M13 vectors. Suitable targets are stretches of DNA which lack an enzyme recognition site which is unique on our cloning vectors and adjacent to the sequencing primer; current sites that are so useful when lacking are Pst, Xba, HindIII, BglII, EcoRI. By an in vitro procedure, we cut RF DNA once randomly and once specifically, to create thousands of deletions which start at the unique restriction site adjacent to the dideoxy sequencing primer and extend various distances across the target DNA. Phage carrying a desired size of deletions, whose DNA as template will give rise to DNA sequence data in a desired location along the target DNA, may be purified by electrophoresis alive on agarose gels. Phage running in the same location on the agarose gel thus conveniently give rise to nucleotide sequence data from the same kilobase of target DNA. PMID:6298723

  5. Evaluation of DNA barcode candidates for the discrimination of Artemisia L.

    PubMed

    Liu, Geyu; Ning, Huixia; Ayidaerhan, Nurbolati; Aisa, Haji Akber

    2017-11-01

    Because of the very similar morphologies and wide diversity of Artemisia L. varieties, they are difficult to identify, and there have been many arguments about the systematic classification of Artemisia L., especially concerning the division of species. DNA barcode technology is used to rapidly identify species based on standard short DNA sequences. We evaluated seven candidate DNA barcodes (ITS, ITS2, psbA-trnH, rbcL, matK, rpoB, and rpoC1) for their ability to identify closely related species of the Artemisia genus in Xinjiang. The corresponding PCR amplification efficiency, detectable genetic divergence, identification efficiency and phylogenetic tree were assessed. We found that the internal transcribed spacer (ITS) region exhibited the highest interspecific divergence, which was significantly higher than the observed intraspecific variation and showed the highest identification efficiency, followed by ITS2, psbA-trnH and, finally, rpoB. matK, rbcL, and rpoC1 performed poorly in this evaluation. ITS, ITS2, and psbA-trnH were able to perfectly identify the tested species Artemisia annua, A. absinthium, A. rupestris, A. tournefortiana, A. austriaca, A. dracunculus, A. vulgaris, and A. macrocephala. Therefore, we propose the ITS, ITS2, and psbA-trnH regions as promising DNA barcodes for the closely related species of Artemisia L. in Xinjiang.
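
    The comparison of interspecific divergence with intraspecific variation mentioned in this abstract can be sketched as follows. The abstract does not state which distance model the authors used, so the uncorrected p-distance below is an assumption, as are the function names, the input format and the toy aligned sequences.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if "-" not in (x, y)]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


def divergence_summary(records):
    """Return (max intraspecific distance, min interspecific distance).

    records is a list of (species, aligned_sequence) pairs (hypothetical format).
    A marker separates species cleanly when the second value exceeds the first.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra, default=0.0), min(inter, default=0.0)


# Toy aligned fragments (invented for illustration, not Artemisia data).
records = [
    ("sp1", "ACGTACGT"),
    ("sp1", "ACGTACGA"),
    ("sp2", "TCGAACTT"),
    ("sp2", "TCGAACTT"),
]
max_intra, min_inter = divergence_summary(records)
print(max_intra, min_inter)
```

    Barcoding studies often use a corrected model such as Kimura 2-parameter instead; swapping the distance function leaves the rest of the sketch unchanged.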

  6. Silicene nanoribbon as a new DNA sequencing device

    NASA Astrophysics Data System (ADS)

    Alesheikh, Sara; Shahtahmassebi, Nasser; Roknabadi, Mahmood Rezaee; Pilevar Shahri, Raheleh

    2018-02-01

    The importance of DNA sequencing in different fields has motivated the search for fast and cheap methods. Nanotechnology helps this development by introducing nanostructures used for DNA sequencing. In this work we study the interaction between a zigzag silicene nanoribbon and DNA nucleobases using DFT and the non-equilibrium Green's function approach, to investigate the possibility of using zigzag silicene nanoribbons as a biosensor for DNA sequencing.

  7. Structural and Functional Insights into WRKY3 and WRKY4 Transcription Factors to Unravel the WRKY–DNA (W-Box) Complex Interaction in Tomato (Solanum lycopersicum L.). A Computational Approach

    PubMed Central

    Aamir, Mohd; Singh, Vinay K.; Meena, Mukesh; Upadhyay, Ram S.; Gupta, Vijai K.; Singh, Surendra

    2017-01-01

    The WRKY transcription factors (TFs) play a crucial role in plant defense response against various abiotic and biotic stresses. The role of WRKY3 and WRKY4 genes in plant defense response against necrotrophic pathogens is well-reported. However, their functional annotation in tomato is largely unknown. In the present work, we have characterized the structural and functional attributes of the two identified tomato WRKY transcription factors, WRKY3 (SlWRKY3), and WRKY4 (SlWRKY4) using computational approaches. Arabidopsis WRKY3 (AtWRKY3: NP_178433) and WRKY4 (AtWRKY4: NP_172849) protein sequences were retrieved from TAIR database and protein BLAST was done for finding their sequential homologs in tomato. Sequence alignment, phylogenetic classification, and motif composition analysis revealed the remarkable sequential variation between these two WRKYs. The tomato WRKY3 and WRKY4 cluster with Solanum pennellii showing the monophyletic origin and evolution from their wild homolog. The functional domain regions responsible for sequence-specific DNA binding in both proteins were modeled [using AtWRKY4 (PDB ID:1WJ2) and AtWRKY1 (PDB ID:2AYD) as template protein structures] through homology modeling using Discovery Studio 3.0. The generated models were further evaluated for their accuracy and reliability based on qualitative and quantitative parameters. The modeled proteins were found to satisfy all the crucial energy parameters and showed acceptable Ramachandran statistics when compared to the experimentally resolved NMR solution structures and/or X-Ray diffracted crystal structures (templates). The superimposition of the functional WRKY domains from SlWRKY3 and SlWRKY4 revealed remarkable structural similarity. The sequence-specific DNA binding for two WRKYs was explored through DNA-protein interaction using Hex Docking server.
    The interaction studies found that SlWRKY4 binds the W-box DNA through WRKYGQK with Tyr408, Arg409, and Lys419, with the initial flanking sequences also involved in binding. In contrast, SlWRKY3 interacted through RKYGQK along with residues from the zinc finger motifs. Protein-protein interaction studies were done using STRING version 10.0 to explore all the possible protein partners involved in associative functional interaction networks. The Gene Ontology enrichment analysis revealed the functional dimension and characterized the identified WRKYs based on their functional annotation. PMID:28611792

  8. Isolation and characterization of target sequences of the chicken CdxA homeobox gene.

    PubMed Central

    Margalit, Y; Yarus, S; Shapira, E; Gruenbaum, Y; Fainsod, A

    1993-01-01

    The DNA binding specificity of the chicken homeodomain protein CDXA was studied. Using a CDXA-glutathione-S-transferase fusion protein, DNA fragments containing the binding site for this protein were isolated. The sources of DNA were oligonucleotides with random sequence and chicken genomic DNA. The DNA fragments isolated were sequenced and tested in DNA binding assays. Sequencing revealed that most DNA fragments are AT rich which is a common feature of homeodomain binding sites. By electrophoretic mobility shift assays it was shown that the different target sequences isolated bind to the CDXA protein with different affinities. The specific sequences bound by the CDXA protein in the genomic fragments isolated, were determined by DNase I footprinting. From the footprinted sequences, the CDXA consensus binding site was determined. The CDXA protein binds the consensus sequence A, A/T, T, A/T, A, T, A/G. The CAUDAL binding site in the ftz promoter is also included in this consensus sequence. When tested, some of the genomic target sequences were capable of enhancing the transcriptional activity of reporter plasmids when introduced into CDXA expressing cells. This study determined the DNA sequence specificity of the CDXA protein and it also shows that this protein can further activate transcription in cells in culture. PMID:7909943
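
    The consensus sequence quoted in this abstract (A, A/T, T, A/T, A, T, A/G) can be turned into a simple pattern search, as sketched below. This is only an exact-match illustration on the forward strand; it says nothing about binding affinity, and the helper names and the toy sequence are assumptions made for the example.

```python
import re

# Consensus positions as written in the abstract; "A/T" means A or T.
CONSENSUS = ["A", "A/T", "T", "A/T", "A", "T", "A/G"]


def consensus_to_regex(positions):
    """Build a regex with a lookahead so that overlapping sites are reported."""
    parts = []
    for pos in positions:
        bases = pos.split("/")
        parts.append(bases[0] if len(bases) == 1 else "[" + "".join(bases) + "]")
    return re.compile("(?=(" + "".join(parts) + "))")


def find_sites(seq, pattern):
    """Return (start, matched_site) for every forward-strand match in seq."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


pattern = consensus_to_regex(CONSENSUS)
print(pattern.pattern)
print(find_sites("GGATTTATGCC", pattern))  # toy sequence, invented
```

    To cover both strands, the same search could be repeated on the reverse complement of the input sequence.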

  9. Sequence periodicity in nucleosomal DNA and intrinsic curvature.

    PubMed

    Nair, T Murlidharan

    2010-05-17

    Most eukaryotic DNA contained in the nucleus is packaged by wrapping DNA around histone octamers. Histones are ubiquitous and bind most regions of chromosomal DNA. In order to achieve smooth wrapping of the DNA around the histone octamer, the DNA duplex should be able to deform and should possess intrinsic curvature. The deformability of DNA is a result of the non-parallelness of base pair stacks. The stacking interaction between base pairs is sequence dependent. The higher the stacking energy the more rigid the DNA helix, thus it is natural to expect that sequences that are involved in wrapping around the histone octamer should be unstacked and possess intrinsic curvature. Intrinsic curvature has been shown to be dictated by the periodic recurrence of certain dinucleotides. Several genome-wide studies directed towards mapping of nucleosome positions have revealed periodicity associated with certain stretches of sequences. In the current study, these sequences have been analyzed with a view to understand their sequence-dependent structures. Higher order DNA structures and the distribution of molecular bend loci associated with 146 base nucleosome core DNA sequence from C. elegans and chicken have been analyzed using the theoretical model for DNA curvature. The curvature dispersion calculated by cyclically permuting the sequences revealed that the molecular bend loci were delocalized throughout the nucleosome core region and had varying degrees of intrinsic curvature. The higher order structures associated with nucleosomes of C. elegans and chicken calculated from the sequences revealed heterogeneity with respect to the deviation of the DNA axis. The results point to the possibility of context dependent curvature of varying degrees to be associated with nucleosomal DNA.

  10. Assessing the Fidelity of Ancient DNA Sequences Amplified From Nuclear Genes

    PubMed Central

    Binladen, Jonas; Wiuf, Carsten; Gilbert, M. Thomas P.; Bunce, Michael; Barnett, Ross; Larson, Greger; Greenwood, Alex D.; Haile, James; Ho, Simon Y. W.; Hansen, Anders J.; Willerslev, Eske

    2006-01-01

    To date, the field of ancient DNA has relied almost exclusively on mitochondrial DNA (mtDNA) sequences. However, a number of recent studies have reported the successful recovery of ancient nuclear DNA (nuDNA) sequences, thereby allowing the characterization of genetic loci directly involved in phenotypic traits of extinct taxa. It is well documented that postmortem damage in ancient mtDNA can lead to the generation of artifactual sequences. However, as yet no one has thoroughly investigated the damage spectrum in ancient nuDNA. By comparing clone sequences from 23 fossil specimens, recovered from environments ranging from permafrost to desert, we demonstrate the presence of miscoding lesion damage in both the mtDNA and nuDNA, resulting in insertion of erroneous bases during amplification. Interestingly, no significant differences in the frequency of miscoding lesion damage are recorded between mtDNA and nuDNA despite great differences in cellular copy numbers. For both mtDNA and nuDNA, we find significant positive correlations between total sequence heterogeneity and the rates of type 1 transitions (adenine → guanine and thymine → cytosine) and type 2 transitions (cytosine → thymine and guanine → adenine), respectively. Type 2 transitions are by far the most dominant and increase relative to those of type 1 with damage load. The results suggest that the deamination of cytosine (and 5-methyl cytosine) to uracil (and thymine) is the main cause of miscoding lesions in both ancient mtDNA and nuDNA sequences. We argue that the problems presented by postmortem damage, as well as problems with contamination from exogenous sources of conserved nuclear genes, allelic variation, and the reliance on single nucleotide polymorphisms, call for great caution in studies relying on ancient nuDNA sequences. PMID:16299392
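
    The type 1 and type 2 transition classes defined in this abstract can be tallied from an aligned reference and clone sequence as sketched below. The choice of reference (for example a consensus of all clones), the direction of change (reference base to observed base), the handling of gaps and N, and all names are assumptions of this illustration, not the authors' pipeline.

```python
from collections import Counter

# Transition classes as defined in the abstract (reference base, observed base).
TYPE1 = {("A", "G"), ("T", "C")}
TYPE2 = {("C", "T"), ("G", "A")}


def classify_substitutions(reference, clone):
    """Count type 1 transitions, type 2 transitions and transversions.

    Both sequences must be aligned to the same length; positions with a
    gap ('-') or an ambiguous base ('N') in either sequence are skipped.
    """
    if len(reference) != len(clone):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for ref, obs in zip(reference.upper(), clone.upper()):
        if ref == obs or ref in "-N" or obs in "-N":
            continue
        if (ref, obs) in TYPE1:
            counts["type1"] += 1
        elif (ref, obs) in TYPE2:
            counts["type2"] += 1
        else:
            counts["transversion"] += 1
    return counts


# Toy aligned pair (invented for illustration).
counts = classify_substitutions("ACGTACGT", "ATGCACAA")
print(dict(counts))
```

    Summing such counts over many clones per specimen gives the per-class rates that the abstract relates to total sequence heterogeneity.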

  11. [Current applications of high-throughput DNA sequencing technology in antibody drug research].

    PubMed

    Yu, Xin; Liu, Qi-Gang; Wang, Ming-Rong

    2012-03-01

    Since the publication in 2005 of a high-throughput DNA sequencing technology based on PCR carried out in oil emulsions, high-throughput DNA sequencing platforms have evolved into a robust technology for sequencing genomes and diverse DNA libraries. Antibody libraries with vast numbers of members currently serve as a foundation for discovering novel antibody drugs, and high-throughput DNA sequencing technology makes it possible to rapidly identify functional antibody variants with desired properties. Herein we present a review of current applications of high-throughput DNA sequencing technology in the analysis of antibody library diversity, sequencing of CDR3 regions, identification of potent antibodies based on sequence frequency, discovery of functional genes, and combination with various display technologies, so as to provide an alternative approach to the discovery and development of antibody drugs.

  12. PACE: Probabilistic Assessment for Contributor Estimation- A machine learning-based assessment of the number of contributors in DNA mixtures.

    PubMed

    Marciano, Michael A; Adelman, Jonathan D

    2017-03-01

    The deconvolution of DNA mixtures remains one of the most critical challenges in the field of forensic DNA analysis. In addition, of all the data features required to perform such deconvolution, the number of contributors in the sample is widely considered the most important, and, if incorrectly chosen, the most likely to negatively influence the mixture interpretation of a DNA profile. Unfortunately, most current approaches to mixture deconvolution require the assumption that the number of contributors is known by the analyst, an assumption that can prove to be especially faulty when faced with increasingly complex mixtures of 3 or more contributors. In this study, we propose a probabilistic approach for estimating the number of contributors in a DNA mixture that leverages the strengths of machine learning. To assess this approach, we compare classification performances of six machine learning algorithms and evaluate the model from the top-performing algorithm against the current state of the art in the field of contributor number classification. Overall results show over 98% accuracy in identifying the number of contributors in a DNA mixture of up to 4 contributors. Comparative results showed 3-person mixtures had a classification accuracy improvement of over 6% compared to the current best-in-field methodology, and that 4-person mixtures had a classification accuracy improvement of over 20%. The Probabilistic Assessment for Contributor Estimation (PACE) also accomplishes classification of mixtures of up to 4 contributors in less than 1s using a standard laptop or desktop computer. Considering the high classification accuracy rates, as well as the significant time commitment required by the current state of the art model versus seconds required by a machine learning-derived model, the approach described herein provides a promising means of estimating the number of contributors and, subsequently, will lead to improved DNA mixture interpretation. 
    Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  13. DNA fingerprinting, DNA barcoding, and next generation sequencing technology in plants.

    PubMed

    Sucher, Nikolaus J; Hennell, James R; Carles, Maria C

    2012-01-01

    DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.

  14. Mammalian DNA enriched for replication origins is enriched for snap-back sequences.

    PubMed

    Zannis-Hadjopoulos, M; Kaufmann, G; Martin, R G

    1984-11-15

    Using the instability of replication loops as a method for the isolation of double-stranded nascent DNA, extruded DNA enriched for replication origins was obtained and denatured. Snap-back DNA, single-stranded DNA with inverted repeats (palindromic sequences), reassociates rapidly into stem-loop structures with zero-order kinetics when conditions are changed from denaturing to renaturing, and can be assayed by chromatography on hydroxyapatite. Origin-enriched nascent DNA strands from mouse, rat and monkey cells growing either synchronously or asynchronously were purified and assayed for the presence of snap-back sequences. The results show that origin-enriched DNA is also enriched for snap-back sequences, implying that some origins for mammalian DNA replication contain or lie near palindromic sequences.
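
    As an in silico counterpart to the snap-back idea, the following Python sketch scans a single strand for inverted repeats that could fold into a stem-loop. It requires exact complementarity, and the stem and loop length parameters are illustrative assumptions; the study itself assayed snap-back DNA by hydroxyapatite chromatography, not by sequence scanning.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, stem=6, min_loop=3, max_loop=20):
    """Return (left_start, right_start, left_arm) for each exact inverted repeat.

    The right arm must equal the reverse complement of the left arm and lie
    between min_loop and max_loop bases downstream of it.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the candidate right arm
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, j, left))
    return hits
```

    For example, `find_inverted_repeats("GACCAAAAAGGTC", stem=4)` reports one hit, with the arm `GACC` at position 0 pairing with `GGTC` at position 9.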

  15. Whole genome amplification approach reveals novel polyhydroxyalkanoate synthases (PhaCs) from Japan Trench and Nankai Trough seawater.

    PubMed

    Foong, Choon Pin; Lau, Nyok-Sean; Deguchi, Shigeru; Toyofuku, Takashi; Taylor, Todd D; Sudesh, Kumar; Matsui, Minami

    2014-12-24

    Special features of the Japanese ocean include its ranges of latitude and depth. This study is the first to examine the diversity of Class I and II PHA synthases (PhaC) in DNA samples from pelagic seawater taken from the Japan Trench and Nankai Trough from a range of depths from 24 m to 5373 m. PhaC is the key enzyme in microorganisms that determines the types of monomer units that are polymerized into polyhydroxyalkanoate (PHA) and thus affects the physicochemical properties of this thermoplastic polymer. Complete putative PhaC sequences were determined via genome walking, and the activities of newly discovered PhaCs were evaluated in a heterologous host. A total of 76 putative phaC PCR fragments were amplified from the whole genome amplified seawater DNA. Of these 55 clones contained conserved PhaC domains and were classified into 20 genetic groups depending on their sequence similarity. Eleven genetic groups have undisclosed PhaC activity based on their distinct phylogenetic lineages from known PHA producers. Three complete DNA coding sequences were determined by IAN-PCR, and one PhaC was able to produce poly(3-hydroxybutyrate) in recombinant Cupriavidus necator PHB-4 (PHB-negative mutant). A new functional PhaC that has close identity to Marinobacter sp. was discovered in this study. Phylogenetic classification for all the phaC genes isolated from uncultured bacteria has revealed that seawater and other environmental resources harbor a great diversity of PhaCs with activities that have not yet been investigated. Functional evaluation of these in silico-based PhaCs via genome walking has provided new insights into the polymerizing ability of these enzymes.

  16. Massively parallel sequencing analysis of mucinous ovarian carcinomas: genomic profiling and differential diagnoses.

    PubMed

    Mueller, Jennifer J; Schlappe, Brooke A; Kumar, Rahul; Olvera, Narciso; Dao, Fanny; Abu-Rustum, Nadeem; Aghajanian, Carol; DeLair, Deborah; Hussein, Yaser R; Soslow, Robert A; Levine, Douglas A; Weigelt, Britta

    2018-05-21

    Mucinous ovarian cancer (MOC) is a rare type of epithelial ovarian cancer resistant to standard chemotherapy regimens. We sought to characterize the repertoire of somatic mutations in MOCs and to define the contribution of massively parallel sequencing to the classification of tumors diagnosed as primary MOCs. Following gynecologic pathology and chart review, DNA samples obtained from primary MOCs and matched normal tissues/blood were subjected to whole-exome (n = 9) or massively parallel sequencing targeting 341 cancer genes (n = 15). Immunohistochemical analysis of estrogen receptor, progesterone receptor, PTEN, ARID1A/BAF250a, and the DNA mismatch (MMR) proteins MSH6 and PMS2 was performed for all cases. Mutational frequencies of MOCs were compared to those of high-grade serous ovarian cancers (HGSOCs) and mucinous tumors from other sites. MOCs were heterogeneous at the genetic level, frequently harboring TP53 (75%) mutations, KRAS (71%) mutations and/or CDKN2A/B homozygous deletions/mutations (33%). Although established criteria for diagnosis were employed, four cases harbored mutational and immunohistochemical profiles similar to those of endometrioid carcinomas, and one case for colorectal or endometrioid carcinoma. Significant differences in the frequencies of KRAS, TP53, CDKN2A, FBXW7, PIK3CA and/or APC mutations between the confirmed primary MOCs (n = 19) and HGSOCs, mucinous gastric and/or mucinous colorectal carcinomas were found, whereas no differences in the 341 genes studied between MOCs and mucinous pancreatic carcinomas were identified. Our findings suggest that the assessment of mutations affecting TP53, KRAS, PIK3CA, ARID1A and POLE, and DNA MMR protein expression may be used to further aid the diagnosis and treatment decision-making of primary MOC. Copyright © 2018 Elsevier Inc. All rights reserved.

  17. DNA sequence determinants controlling affinity, stability and shape of DNA complexes bound by the nucleoid protein Fis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hancock, Stephen P.; Stella, Stefano; Cascio, Duilio

    The abundant Fis nucleoid protein selectively binds poorly related DNA sequences with high affinities to regulate diverse DNA reactions. Fis binds DNA primarily through DNA backbone contacts and selects target sites by reading conformational properties of DNA sequences, most prominently intrinsic minor groove widths. High-affinity binding requires Fis-stabilized DNA conformational changes that vary depending on DNA sequence. In order to better understand the molecular basis for high affinity site recognition, we analyzed the effects of DNA sequence within and flanking the core Fis binding site on binding affinity and DNA structure. X-ray crystal structures of Fis-DNA complexes containing variable sequences in the noncontacted center of the binding site or variations within the major groove interfaces show that the DNA can adapt to the Fis dimer surface asymmetrically. We show that the presence and position of pyrimidine-purine base steps within the major groove interfaces affect both local DNA bending and minor groove compression to modulate affinities and lifetimes of Fis-DNA complexes. Sequences flanking the core binding site also modulate complex affinities, lifetimes, and the degree of local and global Fis-induced DNA bending. In particular, a G immediately upstream of the 15 bp core sequence inhibits binding and bending, and A-tracts within the flanking base pairs increase both complex lifetimes and global DNA curvatures. Taken together, our observations support a revised DNA motif specifying high-affinity Fis binding and highlight the range of conformations that Fis-bound DNA can adopt. Lastly, the affinities and DNA conformations of individual Fis-DNA complexes are likely to be tailored to their context-specific biological functions.

  18. DNA sequence determinants controlling affinity, stability and shape of DNA complexes bound by the nucleoid protein Fis

    DOE PAGES

    Hancock, Stephen P.; Stella, Stefano; Cascio, Duilio; ...

    2016-03-09

    The abundant Fis nucleoid protein selectively binds poorly related DNA sequences with high affinities to regulate diverse DNA reactions. Fis binds DNA primarily through DNA backbone contacts and selects target sites by reading conformational properties of DNA sequences, most prominently intrinsic minor groove widths. High-affinity binding requires Fis-stabilized DNA conformational changes that vary depending on DNA sequence. In order to better understand the molecular basis for high affinity site recognition, we analyzed the effects of DNA sequence within and flanking the core Fis binding site on binding affinity and DNA structure. X-ray crystal structures of Fis-DNA complexes containing variable sequences in the noncontacted center of the binding site or variations within the major groove interfaces show that the DNA can adapt to the Fis dimer surface asymmetrically. We show that the presence and position of pyrimidine-purine base steps within the major groove interfaces affect both local DNA bending and minor groove compression to modulate affinities and lifetimes of Fis-DNA complexes. Sequences flanking the core binding site also modulate complex affinities, lifetimes, and the degree of local and global Fis-induced DNA bending. In particular, a G immediately upstream of the 15 bp core sequence inhibits binding and bending, and A-tracts within the flanking base pairs increase both complex lifetimes and global DNA curvatures. Taken together, our observations support a revised DNA motif specifying high-affinity Fis binding and highlight the range of conformations that Fis-bound DNA can adopt. Lastly, the affinities and DNA conformations of individual Fis-DNA complexes are likely to be tailored to their context-specific biological functions.

  19. DNA sequence characterisation and phylogeography of Lymnaea cousini and related species, vectors of fascioliasis in northern Andean countries, with description of L. meridensis n. sp. (Gastropoda: Lymnaeidae)

    PubMed Central

    2011-01-01

    Background Livestock fascioliasis is a problem throughout Ecuador, Colombia and Venezuela, mainly in Andean areas where the disease also appears to affect humans. Transmission patterns and epidemiological scenarios of liver fluke infection have been shown to differ according to the lymnaeid vector snail species involved. These Andean countries present the vectors Lymnaea cousini, L. bogotensis and L. ubaquensis, unknown in the rest of Latin America. An exhaustive combined haplotype study of these species is performed by means of DNA sequencing of the nuclear ribosomal 18S RNA gene, ITS-2 and ITS-1, and mitochondrial DNA cox1 gene. Results The conserved 5.8S rDNA sequence corroborated that no pseudogenes are involved in the numerous non-microsatellite/minisatellite-related indels appearing between the ITS-2 and ITS-1 sequences when comparing different L. cousini - L. bogotensis populations. Sequence analyses and phylogenetic reconstruction methods including other lymnaeid vector species show that (i) L. bogotensis is a synonym of L. cousini, (ii) L. ubaquensis is a synonym of Pseudosuccinea columella, and (iii) populations of L. cousini hitherto known from Venezuelan highlands indeed belong to a new species for which the name L. meridensis n. sp. is proposed. This new species is described and a complete phenotypic differentiation provided. Conclusions ITS-2, ITS-1 and cox1 prove to be good markers for specimen classification and haplotype characterisation of these morphologically similar lymnaeids in endemic areas. Analysis of the 18S gene and phylogenetic reconstructions indicate that L. cousini and L. meridensis n. sp. cluster in an evolutionary line different from the one of P. columella, despite their external resemblance. This suggests an evolutionary phenotypic convergence related to similar environments and which has given rise to frequent specimen misclassification. Body size and phylogenetic relationships of L. meridensis n. sp. with well-known vectors such as Lymnaea cousini and P. columella, as well as with Galba/Fossaria species, suggest that the new species may participate in disease transmission to both animals and humans in altitude areas during the yearly window in which temperatures are higher than the F. hepatica minimum development threshold. The involvement of L. cousini and P. columella in the transmission and geographical/altitudinal distribution of fascioliasis in these Andean countries is analysed. PMID:21749718

  20. Specific minor groove solvation is a crucial determinant of DNA binding site recognition

    PubMed Central

    Harris, Lydia-Ann; Williams, Loren Dean; Koudelka, Gerald B.

    2014-01-01

    The DNA sequence preferences of nearly all sequence specific DNA binding proteins are influenced by the identities of bases that are not directly contacted by protein. Discrimination between non-contacted base sequences is commonly based on the differential abilities of DNA sequences to allow narrowing of the DNA minor groove. However, the factors that govern the propensity of minor groove narrowing are not completely understood. Here we show that the differential abilities of various DNA sequences to support formation of a highly ordered and stable minor groove solvation network are a key determinant of non-contacted base recognition by a sequence-specific binding protein. In addition, disrupting the solvent network in the non-contacted region of the binding site alters the protein's ability to recognize contacted base sequences at positions 5–6 bases away. This observation suggests that DNA solvent interactions link contacted and non-contacted base recognition by the protein. PMID:25429976

  1. A Method for Preparing DNA Sequencing Templates Using a DNA-Binding Microplate

    PubMed Central

    Yang, Yu; Hebron, Haroun R.; Hang, Jun

    2009-01-01

    A DNA-binding matrix was immobilized on the surface of a 96-well microplate and used for plasmid DNA preparation for DNA sequencing. The same DNA-binding plate was used for bacterial growth, cell lysis, DNA purification, and storage. In a single step using one buffer, bacterial cells were lysed by enzymes, and released DNA was captured on the plate simultaneously. After two wash steps, DNA was eluted and stored in the same plate. Inclusion of phosphates in the culture medium was found to enhance the yield of plasmid significantly. Purified DNA samples were used successfully in DNA sequencing with high consistency and reproducibility. Eleven vectors and nine libraries were tested using this method. In 10 μl sequencing reactions using 3 μl sample and 0.25 μl BigDye Terminator v3.1, the results from a 3730xl sequencer gave a success rate of 90–95% and read-lengths of 700 bases or more. The method is fully automatable and convenient for manual operation as well. It enables reproducible, high-throughput, rapid production of DNA with purity and yields sufficient for high-quality DNA sequencing at a substantially reduced cost. PMID:19568455

  2. Dendritic Cell-Based Immunotherapy of Breast Cancer: Modulation by CpG DNA

    DTIC Science & Technology

    2005-09-01

    tumor-associated antigens and bacterial DNA oligodeoxynucleotides containing unmethylated CpG sequences (CpG DNA) further augment the immune priming...associated antigens by cytotoxic T lymphocytes, and bacterial DNA oligodeoxy- nucleotides containing unmethylated CpG sequences (CpG DNA) can further...further amplify their immunostimulatory capacity and bacterial DNA oligodeoxynucleotides (ODN) containing unmethylated CpG sequences (CpG DNA) provide such

  3. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin.

    PubMed

    Bokulich, Nicholas A; Kaehler, Benjamin D; Rideout, Jai Ram; Dillon, Matthew; Bolyen, Evan; Knight, Rob; Huttley, Gavin A; Gregory Caporaso, J

    2018-05-17

    Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit ( https://github.com/caporaso-lab/tax-credit-data ). Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
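
    To make the naive Bayes idea concrete, here is a minimal pure-Python sketch of a multinomial naive Bayes classifier over k-mer counts. It is an illustration only: it does not reproduce q2-feature-classifier or its scikit-learn pipeline, and the class name, k-mer length and smoothing value are assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts (hypothetical helper class)."""

    def __init__(self, k=3, alpha=1.0):
        self.k = k
        self.alpha = alpha                  # Laplace smoothing constant
        self.counts = defaultdict(Counter)  # taxon -> k-mer counts
        self.n_seqs = Counter()             # taxon -> number of training sequences

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            self.counts[label].update(kmers(seq.upper(), self.k))
            self.n_seqs[label] += 1
        return self

    def predict(self, seq):
        vocab = 4 ** self.k  # assumes an ACGT-only alphabet
        total = sum(self.n_seqs.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.counts.items():
            denom = sum(counts.values()) + self.alpha * vocab
            score = math.log(self.n_seqs[label] / total)  # log prior
            for km in kmers(seq.upper(), self.k):
                score += math.log((counts[km] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

    The abstract's point about parameter tuning applies even to a toy like this: the choice of `k` and `alpha` changes which taxon wins for short or ambiguous reads.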

  4. A rapid and cost-effective method for sequencing pooled cDNA clones by using a combination of transposon insertion and Gateway technology.

    PubMed

    Morozumi, Takeya; Toki, Daisuke; Eguchi-Ogawa, Tomoko; Uenishi, Hirohide

    2011-09-01

    Large-scale cDNA-sequencing projects require an efficient strategy for mass sequencing. Here we describe a method for sequencing pooled cDNA clones using a combination of transposon insertion and Gateway technology. Our method reduces the number of shotgun clones that are unsuitable for reconstruction of cDNA sequences, and has the advantage of reducing the total costs of the sequencing project.

  5. Biological sequence compression algorithms.

    PubMed

    Matsumoto, T; Sadakane, K; Imai, H

    2000-01-01

    Today, more and more DNA sequences are becoming available. The information about DNA sequences is stored in molecular biology databases. The size and importance of these databases will keep growing in the future, so this information must be stored and communicated efficiently. Furthermore, sequence compression can be used to define similarities between biological sequences. The standard compression algorithms such as gzip or compress cannot compress DNA sequences, but only expand them in size. On the other hand, CTW (Context Tree Weighting Method) can compress DNA sequences to less than two bits per symbol. These algorithms do not use special structures of biological sequences. Two characteristic structures of DNA sequences are known. One is called palindromes or reverse complements and the other is approximate repeats. Several specific algorithms for DNA sequences that use these structures can compress them to less than two bits per symbol. In this paper, we improve the CTW so that it can exploit the characteristic structures of DNA sequences. Before encoding the next symbol, the algorithm searches for an approximate repeat and palindrome using hashing and dynamic programming. If there is a palindrome or an approximate repeat of sufficient length, then our algorithm represents it with length and distance. By using this preprocessing, a new program achieves a slightly higher compression ratio than that of existing DNA-oriented compression algorithms. We also describe a new compression algorithm for protein sequences.
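
    The "two bits per symbol" figure is the cost of the trivial fixed-length code for a four-letter alphabet, which is the baseline a DNA compressor has to beat. A minimal Python sketch of that baseline follows; the function names are hypothetical, the input is assumed to contain only A, C, G and T, and this is not the CTW-based method of the paper.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq):
    """Pack an ACGT string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack_2bit(data, length):
    """Recover the first `length` bases from 2-bit packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])
```

    The sequence length has to be stored alongside the packed bytes, because the final byte may be padded.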

  6. Detection of DNA Methylation by Whole-Genome Bisulfite Sequencing.

    PubMed

    Li, Qing; Hermanson, Peter J; Springer, Nathan M

    2018-01-01

    DNA methylation plays an important role in the regulation of the expression of transposons and genes. Various methods have been developed to assay DNA methylation levels. Bisulfite sequencing is considered to be the "gold standard" for single-base resolution measurement of DNA methylation levels. Coupled with next-generation sequencing, whole-genome bisulfite sequencing (WGBS) allows DNA methylation to be evaluated at a genome-wide scale. Here, we describe a protocol for WGBS in plant species with large genomes. This protocol has been successfully applied to assay genome-wide DNA methylation levels in maize and barley. It has also been successfully coupled with sequence capture technology to assay DNA methylation levels in a targeted set of genomic regions.

  7. Single-Molecule Electrical Random Resequencing of DNA and RNA

    NASA Astrophysics Data System (ADS)

    Ohshiro, Takahito; Matsubara, Kazuki; Tsutsui, Makusu; Furuhashi, Masayuki; Taniguchi, Masateru; Kawai, Tomoji

    2012-07-01

    Two paradigm shifts in DNA sequencing technologies--from bulk to single molecules and from optical to electrical detection--are expected to realize label-free, low-cost DNA sequencing that does not require PCR amplification. It will lead to development of high-throughput third-generation sequencing technologies for personalized medicine. Although nanopore devices have been proposed as third-generation DNA-sequencing devices, a significant milestone in these technologies has been attained by demonstrating a novel technique for resequencing DNA using electrical signals. Here we report single-molecule electrical resequencing of DNA and RNA using a hybrid method of identifying single-base molecules via tunneling currents and random sequencing. Our method reads sequences of nine types of DNA oligomers. The complete sequence of 5'-UGAGGUA-3' from the let-7 microRNA family was also identified by creating a composite of overlapping fragment sequences, which was randomly determined using tunneling current conducted by single-base molecules as they passed between a pair of nanoelectrodes.

  8. The oligonucleotide frequency derived error gradient and its application to the binning of metagenome fragments

    PubMed Central

    2009-01-01

    Background The characterisation, or binning, of metagenome fragments is an important first step to further downstream analysis of microbial consortia. Here, we propose a one-dimensional signature, OFDEG, derived from the oligonucleotide frequency profile of a DNA sequence, and show that it is possible to obtain a meaningful phylogenetic signal for relatively short DNA sequences. The one-dimensional signal is essentially a compact representation of higher dimensional feature spaces of greater complexity and is intended to improve on the tetranucleotide frequency feature space preferred by current compositional binning methods. Results We compare the fidelity of OFDEG against tetranucleotide frequency in both an unsupervised and semi-supervised setting on simulated metagenome benchmark data. Four tests were conducted using assembler output of Arachne and phrap, and for each, performance was evaluated on contigs which are greater than or equal to 8 kbp in length and contigs which are composed of at least 10 reads. Using G-C content in conjunction with OFDEG gave an average accuracy of 96.75% (semi-supervised) and 95.19% (unsupervised), versus 94.25% (semi-supervised) and 82.35% (unsupervised) for tetranucleotide frequency. Conclusion We have presented an observation of an alternative characteristic of DNA sequences. The proposed feature representation has proven to be more beneficial than the existing tetranucleotide frequency space to the metagenome binning problem. We do note, however, that our observation of OFDEG deserves further analysis and investigation. Unsupervised clustering revealed OFDEG related features performed better than standard tetranucleotide frequency in representing a relevant organism specific signal. Further improvement in binning accuracy is given by semi-supervised classification using OFDEG. The emphasis on a feature-driven, bottom-up approach to the problem of binning reveals promising avenues for future development of techniques to characterise short environmental sequences without bias toward cultivable organisms. PMID:19958473
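
    The tetranucleotide frequency feature space that OFDEG is compared against can be sketched in a few lines of Python. This shows only the baseline features (a 256-dimensional frequency vector plus G-C content); the abstract does not define OFDEG itself, so it is not implemented here, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return relative k-mer frequencies in lexicographic ACGT order.

    With k=4 this is the 256-dimensional tetranucleotide frequency vector.
    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in all_kmers)
    if total == 0:
        return [0.0] * len(all_kmers)
    return [counts[km] / total for km in all_kmers]


def gc_content(seq):
    """Fraction of G and C among the A, C, G, T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0
```

    Vectors like these, computed per contig, are the kind of input a compositional binning method would cluster or classify.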

  9. Molecular phylogenetics and character evolution of morphologically diverse groups, Dendrobium section Dendrobium and allies

    PubMed Central

    Takamiya, Tomoko; Wongsawad, Pheravut; Sathapattayanon, Apirada; Tajima, Natsuko; Suzuki, Shunichiro; Kitamura, Saki; Shioda, Nao; Handa, Takashi; Kitanaka, Susumu; Iijima, Hiroshi; Yukawa, Tomohisa

    2014-01-01

    It is always difficult to construct coherent classification systems for plant lineages having diverse morphological characters. The genus Dendrobium, one of the largest genera in the Orchidaceae, includes ∼1100 species, and enormous morphological diversification has hindered the establishment of consistent classification systems covering all major groups of this genus. Given the particular importance of species in Dendrobium section Dendrobium and allied groups as floriculture and crude drug genetic resources, there is an urgent need to establish a stable classification system. To clarify phylogenetic relationships in Dendrobium section Dendrobium and allied groups, we analysed the macromolecular characters of the group. Phylogenetic analyses of 210 taxa of Dendrobium were conducted on DNA sequences of internal transcribed spacer (ITS) regions of 18S–26S nuclear ribosomal DNA and the maturase-coding gene (matK) located in an intron of the plastid gene trnK using maximum parsimony and Bayesian methods. The parsimony and Bayesian analyses revealed 13 distinct clades in the group comprising section Dendrobium and its allied groups. Results also showed paraphyly or polyphyly of sections Amblyanthus, Aporum, Breviflores, Calcarifera, Crumenata, Dendrobium, Densiflora, Distichophyllae, Dolichocentrum, Holochrysa, Oxyglossum and Pedilonum. On the other hand, the monophyly of section Stachyobium was well supported. It was found that many of the morphological characters that have been believed to reflect phylogenetic relationships are, in fact, the result of convergence. As such, many of the sections that have been recognized up to this point were found to not be monophyletic, so recircumscription of sections is required. PMID:25107672

  10. Implementation of Objective PASC-Derived Taxon Demarcation Criteria for Official Classification of Filoviruses.

    PubMed

    Bào, Yīmíng; Amarasinghe, Gaya K; Basler, Christopher F; Bavari, Sina; Bukreyev, Alexander; Chandran, Kartik; Dolnik, Olga; Dye, John M; Ebihara, Hideki; Formenty, Pierre; Hewson, Roger; Kobinger, Gary P; Leroy, Eric M; Mühlberger, Elke; Netesov, Sergey V; Patterson, Jean L; Paweska, Janusz T; Smither, Sophie J; Takada, Ayato; Towner, Jonathan S; Volchkov, Viktor E; Wahl-Jensen, Victoria; Kuhn, Jens H

    2017-05-11

    The mononegaviral family Filoviridae has eight members assigned to three genera and seven species. Until now, genus and species demarcation were based on arbitrarily chosen filovirus genome sequence divergence values (≈50% for genera, ≈30% for species) and arbitrarily chosen phenotypic virus or virion characteristics. Here we report filovirus genome sequence-based taxon demarcation criteria using the publicly accessible PAirwise Sequencing Comparison (PASC) tool of the US National Center for Biotechnology Information (Bethesda, MD, USA). Comparison of all available filovirus genomes in GenBank using PASC revealed optimal genus demarcation at the 55-58% sequence diversity threshold range for genera and at the 23-36% sequence diversity threshold range for species. Because these thresholds do not change the current official filovirus classification, these values are now implemented as filovirus taxon demarcation criteria that may solely be used for filovirus classification in case additional data are absent. A near-complete, coding-complete, or complete filovirus genome sequence will now be required to allow official classification of any novel "filovirus." Classification of filoviruses into existing taxa or determining the need for novel taxa is now straightforward and could even become automated using a presented algorithm/flowchart rooted in RefSeq (type) sequences.

  11. DNA/RNA hybrid substrates modulate the catalytic activity of purified AID.

    PubMed

    Abdouni, Hala S; King, Justin J; Ghorbani, Atefeh; Fifield, Heather; Berghuis, Lesley; Larijani, Mani

    2018-01-01

    Activation-induced cytidine deaminase (AID) converts cytidine to uridine at Immunoglobulin (Ig) loci, initiating somatic hypermutation and class switching of antibodies. In vitro, AID acts on single stranded DNA (ssDNA), but neither double-stranded DNA (dsDNA) oligonucleotides nor RNA, and it is believed that transcription is the in vivo generator of ssDNA targeted by AID. It is also known that the Ig loci, particularly the switch (S) regions targeted by AID are rich in transcription-generated DNA/RNA hybrids. Here, we examined the binding and catalytic behavior of purified AID on DNA/RNA hybrid substrates bearing either random sequences or GC-rich sequences simulating Ig S regions. If substrates were made up of a random sequence, AID preferred substrates composed entirely of DNA over DNA/RNA hybrids. In contrast, if substrates were composed of S region sequences, AID preferred to mutate DNA/RNA hybrids over substrates composed entirely of DNA. Accordingly, AID exhibited a significantly higher affinity for binding DNA/RNA hybrid substrates composed specifically of S region sequences, than any other substrates composed of DNA. Thus, in the absence of any other cellular processes or factors, AID itself favors binding and mutating DNA/RNA hybrids composed of S region sequences. AID:DNA/RNA complex formation and supporting mutational analyses suggest that recognition of DNA/RNA hybrids is an inherent structural property of AID. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Characterization of the repetitive DNA elements in the genome of fish lymphocystis disease viruses.

    PubMed

    Schnitzler, P; Darai, G

    1989-09-01

    The complete DNA nucleotide sequence of the repetitive DNA elements in the genome of fish lymphocystis disease virus (FLDV) isolated from two different species (flounder and dab) was determined. The size of these repetitive DNA elements was found to be 1413 bp which corresponds to the DNA sequences of the 5' terminus of the EcoRI DNA fragment B (0.034 to 0.052 m.u.) and to the EcoRI DNA fragment M (0.718 to 0.736 m.u.) of the FLDV genome causing lymphocystis disease in flounder and plaice. The degree of DNA nucleotide homology between both regions was found to be 99%. The repetitive DNA element in the genome of FLDV isolated from other fish species (dab) was identified and is located within the EcoRI DNA fragment B and J of the viral genome. The DNA nucleotide sequence of one duplicate of this repetition (EcoRI DNA fragment J) was determined (1410 bp) and compared to the DNA nucleotide sequences of the repetitive DNA elements of the genome of FLDV isolated from flounder. It was found that the repetitive DNA elements of the genome of FLDV derived from two different fish species are highly conserved and possess a degree of DNA sequence homology of 94%. The DNA sequences of each strand of the individual repetitive element possess one open reading frame.

  13. Long-range correlations and charge transport properties of DNA sequences

    NASA Astrophysics Data System (ADS)

    Liu, Xiao-liang; Ren, Yi; Xie, Qiong-tao; Deng, Chao-sheng; Xu, Hui

    2010-04-01

    By using Hurst's analysis and transfer approach, the rescaled range functions and Hurst exponents of human chromosome 22 and enterobacteria phage lambda DNA sequences are investigated and the transmission coefficients, Landauer resistances and Lyapunov coefficients of finite segments based on above genomic DNA sequences are calculated. In a comparison with quasiperiodic and random artificial DNA sequences, we find that λ-DNA exhibits anticorrelation behavior characterized by a Hurst exponent 0.5

  14. [Whole Genome Sequencing of Human mtDNA Based on Ion Torrent PGM™ Platform].

    PubMed

    Cao, Y; Zou, K N; Huang, J P; Ma, K; Ping, Y

    2017-08-01

    To analyze and detect the whole genome sequence of human mitochondrial DNA (mtDNA) by Ion Torrent PGM™ platform and to study the differences of mtDNA sequence in different tissues. Samples were collected from 6 unrelated individuals by forensic postmortem examination, including chest blood, hair, costal cartilage, nail, skeletal muscle and oral epithelium. Amplification of the whole genome sequence of mtDNA was performed with 4 primer pairs. Libraries were constructed with Ion Shear™ Plus Reagents kit and Ion Plus Fragment Library kit. Whole genome sequencing of mtDNA was performed using Ion Torrent PGM™ platform. Sanger sequencing was used to determine the heteroplasmy positions and the mutation positions on HVⅠ region. The whole genome sequences of mtDNA from all samples were amplified successfully. Six unrelated individuals belonged to 6 different haplotypes. Different tissues in one individual had heteroplasmy differences. The heteroplasmy positions and the mutation positions on HVⅠ region were verified by Sanger sequencing. After a consistency check by the Kappa method, it was found that the mtDNA sequence results had a high consistency across different tissues. The testing method used in the present study for sequencing the whole genome of human mtDNA can detect heteroplasmy differences between tissues, and the results show good consistency. The results provide guidance for the further applications of mtDNA in forensic science. Copyright© by the Editorial Department of Journal of Forensic Medicine

  15. Challenging a bioinformatic tool's ability to detect microbial contaminants using in silico whole genome sequencing data.

    PubMed

    Olson, Nathan D; Zook, Justin M; Morrow, Jayne B; Lin, Nancy J

    2017-01-01

    High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus, Escherichia, and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.

  16. Proline: The Distribution, Frequency, Positioning, and Common Functional Roles of Proline and Polyproline Sequences in the Human Proteome

    PubMed Central

    Morgan, Alexander A.; Rubenstein, Edward

    2013-01-01

    Proline is an anomalous amino acid. Its nitrogen atom is covalently locked within a ring, thus it is the only proteinogenic amino acid with a constrained phi angle. Sequences of three consecutive prolines can fold into polyproline helices, structures that join alpha helices and beta pleats as architectural motifs in protein configuration. Triproline helices are participants in protein-protein signaling interactions. Longer spans of repeat prolines also occur, containing as many as 27 consecutive proline residues. Little is known about the frequency, positioning, and functional significance of these proline sequences. Therefore we have undertaken a systematic bioinformatics study of proline residues in proteins. We analyzed the distribution and frequency of 687,434 proline residues among 18,666 human proteins, identifying single residues, dimers, trimers, and longer repeats. Proline accounts for 6.3% of the 10,882,808 protein amino acids. Of all proline residues, 4.4% are in trimers or longer spans. We detected patterns that influence function based on proline location, spacing, and concentration. We propose a classification based on proline-rich, polyproline-rich, and proline-poor status. Whereas singlet proline residues are often found in proteins that display recurring architectural patterns, trimers or longer proline sequences tend to be associated with the absence of repetitive structural motifs. Spans of 6 or more are associated with DNA/RNA processing, actin, and developmental processes. We also suggest a role for proline in Kruppel-type zinc finger protein control of DNA expression, and in the nucleation and translocation of actin by the formin complex. PMID:23372670
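
    The counting of single prolines, dimers, trimers, and longer repeats described in this abstract can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' pipeline: the function names and the toy sequence are hypothetical, and a protein is assumed to be a plain one-letter amino acid string.

```python
import re
from collections import Counter


def proline_run_lengths(protein: str) -> Counter:
    """Count maximal runs of consecutive prolines ('P'), keyed by run length."""
    return Counter(len(m.group()) for m in re.finditer(r"P+", protein.upper()))


def fraction_in_long_runs(protein: str, min_len: int = 3) -> float:
    """Fraction of all proline residues that sit in runs of at least min_len."""
    runs = proline_run_lengths(protein)
    total = sum(length * count for length, count in runs.items())
    if total == 0:
        return 0.0
    in_long = sum(length * count for length, count in runs.items() if length >= min_len)
    return in_long / total


if __name__ == "__main__":
    toy = "MPPPAKPLPPGGPPPPPPS"  # made-up sequence with runs of 3, 1, 2 and 6 prolines
    print(proline_run_lengths(toy))
    print(fraction_in_long_runs(toy))
```

    Applied per protein and summed over a proteome, the same counts would give figures of the kind reported above (for example, the share of prolines in trimers or longer spans).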

  17. Sequence periodicity in nucleosomal DNA and intrinsic curvature

    PubMed Central

    2010-01-01

    Background Most eukaryotic DNA contained in the nucleus is packaged by wrapping DNA around histone octamers. Histones are ubiquitous and bind most regions of chromosomal DNA. In order to achieve smooth wrapping of the DNA around the histone octamer, the DNA duplex should be able to deform and should possess intrinsic curvature. The deformability of DNA is a result of the non-parallelness of base pair stacks. The stacking interaction between base pairs is sequence dependent. The higher the stacking energy the more rigid the DNA helix, thus it is natural to expect that sequences that are involved in wrapping around the histone octamer should be unstacked and possess intrinsic curvature. Intrinsic curvature has been shown to be dictated by the periodic recurrence of certain dinucleotides. Several genome-wide studies directed towards mapping of nucleosome positions have revealed periodicity associated with certain stretches of sequences. In the current study, these sequences have been analyzed with a view to understanding their sequence-dependent structures. Results Higher order DNA structures and the distribution of molecular bend loci associated with 146 base nucleosome core DNA sequence from C. elegans and chicken have been analyzed using the theoretical model for DNA curvature. The curvature dispersion calculated by cyclically permuting the sequences revealed that the molecular bend loci were delocalized throughout the nucleosome core region and had varying degrees of intrinsic curvature. Conclusions The higher order structures associated with nucleosomes of C. elegans and chicken calculated from the sequences revealed heterogeneity with respect to the deviation of the DNA axis. The results point to the possibility that context-dependent curvature of varying degrees is associated with nucleosomal DNA. PMID:20487515

  18. A survey of the sequence-specific interaction of damaging agents with DNA: emphasis on antitumor agents.

    PubMed

    Murray, V

    1999-01-01

    This article reviews the literature concerning the sequence specificity of DNA-damaging agents. DNA-damaging agents are widely used in cancer chemotherapy. It is important to understand fully the determinants of DNA sequence specificity so that more effective DNA-damaging agents can be developed as antitumor drugs. There are five main methods of DNA sequence specificity analysis: cleavage of end-labeled fragments, linear amplification with Taq DNA polymerase, ligation-mediated polymerase chain reaction (PCR), single-strand ligation PCR, and footprinting. The DNA sequence specificity in purified DNA and in intact mammalian cells is reviewed for several classes of DNA-damaging agent. These include agents that form covalent adducts with DNA, free radical generators, topoisomerase inhibitors, intercalators and minor groove binders, enzymes, and electromagnetic radiation. The main sites of adduct formation are at the N-7 of guanine in the major groove of DNA and the N-3 of adenine in the minor groove, whereas free radical generators abstract hydrogen from the deoxyribose sugar and topoisomerase inhibitors cause enzyme-DNA cross-links to form. Several issues involved in the determination of the DNA sequence specificity are discussed. The future directions of the field, with respect to cancer chemotherapy, are also examined.

  19. Deciphering the genomic targets of alkylating polyamide conjugates using high-throughput sequencing

    PubMed Central

    Chandran, Anandhakumar; Syed, Junetha; Taylor, Rhys D.; Kashiwazaki, Gengo; Sato, Shinsuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi

    2016-01-01

    Chemically engineered small molecules targeting specific genomic sequences play an important role in drug development research. Pyrrole-imidazole polyamides (PIPs) are a group of molecules that can bind to the DNA minor-groove and can be engineered to target specific sequences. Their biological effects rely primarily on their selective DNA binding. However, the binding mechanism of PIPs at the chromatinized genome level is poorly understood. Herein, we report a method using high-throughput sequencing to identify the DNA-alkylating sites of PIP-indole-seco-CBI conjugates. High-throughput sequencing analysis of conjugate 2 showed highly similar DNA-alkylating sites on synthetic oligos (histone-free DNA) and on human genomes (chromatinized DNA context). To our knowledge, this is the first report identifying alkylation sites across genomic DNA by alkylating PIP conjugates using high-throughput sequencing. PMID:27098039

  20. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.

    PubMed

    Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D

    2017-01-01

    This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not supported. PacBio assemblies had few limitations overall and gaps were explained as a cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, which included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.

  1. Sequence editing by Apolipoprotein B RNA-editing catalytic component-B and epidemiological surveillance of transmitted HIV-1 drug resistance

    PubMed Central

    Gifford, Robert J.; Rhee, Soo-Yon; Eriksson, Nicolas; Liu, Tommy F.; Kiuchi, Mark; Das, Amar K.; Shafer, Robert W.

    2008-01-01

    Design Promiscuous guanine (G) to adenine (A) substitutions catalysed by apolipoprotein B RNA-editing catalytic component (APOBEC) enzymes are observed in a proportion of HIV-1 sequences in vivo and can introduce artifacts into some genetic analyses. The potential impact of undetected lethal editing on genotypic estimation of transmitted drug resistance was assessed. Methods Classifiers of lethal, APOBEC-mediated editing were developed by analysis of lentiviral pol gene sequence variation and evaluated using control sets of HIV-1 sequences. The potential impact of sequence editing on genotypic estimation of drug resistance was assessed in sets of sequences obtained from 77 studies of 25 or more therapy-naive individuals, using mixture modelling approaches to determine the maximum likelihood classification of sequences as lethally edited as opposed to viable. Results Analysis of 6437 protease and reverse transcriptase sequences from therapy-naive individuals using a novel classifier of lethal, APOBEC3G-mediated sequence editing, the ‘polypeptide-like 3G (APOBEC3G)-mediated defectives (A3GD) index’, detected lethal editing in association with spurious ‘transmitted drug resistance’ in nearly 3% of proviral sequences obtained from whole blood and 0.2% of samples obtained from plasma. Conclusion Screening for lethally edited sequences in datasets containing a proportion of proviral DNA, such as those likely to be obtained for epidemiological surveillance of transmitted drug resistance in the developing world, can eliminate rare but potentially significant errors in genotypic estimation of transmitted drug resistance. PMID:18356601

  2. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors.

    PubMed

    Adalsteinsson, Viktor A; Ha, Gavin; Freeman, Samuel S; Choudhury, Atish D; Stover, Daniel G; Parsons, Heather A; Gydush, Gregory; Reed, Sarah C; Rotem, Denisse; Rhoades, Justin; Loginov, Denis; Livitz, Dimitri; Rosebrock, Daniel; Leshchiner, Ignaty; Kim, Jaegil; Stewart, Chip; Rosenberg, Mara; Francis, Joshua M; Zhang, Cheng-Zhong; Cohen, Ofir; Oh, Coyin; Ding, Huiming; Polak, Paz; Lloyd, Max; Mahmud, Sairah; Helvie, Karla; Merrill, Margaret S; Santiago, Rebecca A; O'Connor, Edward P; Jeong, Seong H; Leeson, Rachel; Barry, Rachel M; Kramkowski, Joseph F; Zhang, Zhenwei; Polacek, Laura; Lohr, Jens G; Schleicher, Molly; Lipscomb, Emily; Saltzman, Andrea; Oliver, Nelly M; Marini, Lori; Waks, Adrienne G; Harshman, Lauren C; Tolaney, Sara M; Van Allen, Eliezer M; Winer, Eric P; Lin, Nancy U; Nakabayashi, Mari; Taplin, Mary-Ellen; Johannessen, Cory M; Garraway, Levi A; Golub, Todd R; Boehm, Jesse S; Wagle, Nikhil; Getz, Gad; Love, J Christopher; Meyerson, Matthew

    2017-11-06

    Whole-exome sequencing of cell-free DNA (cfDNA) could enable comprehensive profiling of tumors from blood but the genome-wide concordance between cfDNA and tumor biopsies is uncertain. Here we report ichorCNA, software that quantifies tumor content in cfDNA from 0.1× coverage whole-genome sequencing data without prior knowledge of tumor mutations. We apply ichorCNA to 1439 blood samples from 520 patients with metastatic prostate or breast cancers. In the earliest tested sample for each patient, 34% of patients have ≥10% tumor-derived cfDNA, sufficient for standard coverage whole-exome sequencing. Using whole-exome sequencing, we validate the concordance of clonal somatic mutations (88%), copy number alterations (80%), mutational signatures, and neoantigens between cfDNA and matched tumor biopsies from 41 patients with ≥10% cfDNA tumor content. In summary, we provide methods to identify patients eligible for comprehensive cfDNA profiling, revealing its applicability to many patients, and demonstrate high concordance of cfDNA and metastatic tumor whole-exome sequencing.

  3. An evolution based biosensor receptor DNA sequence generation algorithm.

    PubMed

    Kim, Eungyeong; Lee, Malrey; Gatton, Thomas M; Lee, Jaewan; Zang, Yupeng

    2010-01-01

    A biosensor is composed of a bioreceptor, an associated recognition molecule, and a signal transducer that can selectively detect target substances for analysis. DNA based biosensors utilize receptor molecules that allow hybridization with the target analyte. However, most DNA biosensor research uses oligonucleotides as the target analytes and does not address the potential problems of real samples. The identification of recognition molecules suitable for real target analyte samples is an important step towards further development of DNA biosensors. This study examines the characteristics of DNA used as bioreceptors and proposes a hybrid evolution-based DNA sequence generating algorithm, based on DNA computing, to identify suitable DNA bioreceptor recognition molecules for stable hybridization with real target substances. The Traveling Salesman Problem (TSP) approach is applied in the proposed algorithm to evaluate the safety and fitness of the generated DNA sequences. This approach improves efficiency and stability for enhanced and variable-length DNA sequence generation and allows extension to generation of variable-length DNA sequences with diverse receptor recognition requirements.

  4. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis

    PubMed Central

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information such as the total number of nucleotide bases and ATGC base contents along with their respective percentages, and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available. Availability http://www.cemb.edu.pk/sw.html Abbreviations RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language. PMID:23055611
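
    Several of the operations listed in this abstract are compact enough to sketch. The Python below is not RDNAnalyzer's code (the tool itself is written in C#) and does not reproduce its extensions; it shows a textbook Nussinov base-pair maximisation for DNA together with reverse complement and base composition. All function names and the `min_loop` parameter are assumptions made for illustration.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def base_composition(seq: str) -> dict:
    """Percentage of A, T, G and C in the sequence."""
    counts = Counter(seq.upper())
    total = len(seq)
    return {base: (100.0 * counts[base] / total if total else 0.0) for base in "ATGC"}


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested A-T / G-C pairs (basic Nussinov recursion).

    min_loop is the minimum number of unpaired bases enclosed by a pair;
    it is a common convention and an assumption of this sketch.
    """
    seq = seq.upper()
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: base j stays unpaired
            for k in range(i, j - min_loop):  # case: base j pairs with base k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(base_composition("AATG"))
    print(nussinov_max_pairs("GGGAAATCCC"))
```

    The recursion only counts pairs; recovering the actual pairing would require a traceback over the `dp` table, which is omitted here for brevity.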

  5. Structural and Thermodynamic Signatures of DNA Recognition by Mycobacterium tuberculosis DnaA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tsodikov, Oleg V.; Biswas, Tapan

    An essential protein, DnaA, binds to 9-bp DNA sites within the origin of replication oriC. These binding events are prerequisite to forming an enigmatic nucleoprotein scaffold that initiates replication. The number, sequences, positions, and orientations of these short DNA sites, or DnaA boxes, within the oriCs of different bacteria vary considerably. To investigate features of DnaA boxes that are important for binding Mycobacterium tuberculosis DnaA (MtDnaA), we have determined the crystal structures of the DNA binding domain (DBD) of MtDnaA bound to a cognate MtDnaA-box (at 2.0 Å resolution) and to a consensus Escherichia coli DnaA-box (at 2.3 Å). These structures, complemented by calorimetric equilibrium binding studies of MtDnaA DBD in a series of DnaA-box variants, reveal the main determinants of DNA recognition and establish the [T/C][T/A][G/A]TCCACA sequence as a high-affinity MtDnaA-box. Bioinformatic and calorimetric analyses indicate that DnaA-box sequences in mycobacterial oriCs generally differ from the optimal binding sequence. This sequence variation occurs commonly at the first 2 bp, making an in vivo mycobacterial DnaA-box effectively a 7-mer and not a 9-mer. We demonstrate that the decrease in the affinity of these MtDnaA-box variants for MtDnaA DBD relative to that of the highest-affinity box TTGTCCACA is less than 10-fold. The understanding of DnaA-box recognition by MtDnaA and E. coli DnaA enables one to map DnaA-box sequences in the genomes of M. tuberculosis and other eubacteria.
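
    The high-affinity box [T/C][T/A][G/A]TCCACA given in this abstract translates directly into a regular expression, which is one simple way to map candidate DnaA boxes in a sequence. The Python sketch below is illustrative rather than the authors' mapping procedure: the function names and the convention of reporting minus-strand hits in forward-strand coordinates are assumptions.

```python
import re

# Lookahead so that overlapping sites are also reported; group 1 is the 9-mer.
DNAA_BOX = re.compile(r"(?=([TC][TA][GA]TCCACA))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_dnaa_boxes(seq: str) -> list:
    """Return (start, strand, site) for each match on either strand.

    start is a 0-based position on the forward strand; for '-' hits the site
    is read 5'->3' on the reverse strand (hypothetical reporting convention).
    """
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), "+", m.group(1)) for m in DNAA_BOX.finditer(seq)]
    for m in DNAA_BOX.finditer(reverse_complement(seq)):
        hits.append((n - (m.start() + 9), "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    print(find_dnaa_boxes("GGTTGTCCACAGG"))  # site on the forward strand
    print(find_dnaa_boxes("AATGTGGACAAGG"))  # site on the reverse strand
```

    Because the abstract notes that in vivo boxes often vary at the first 2 bp, a looser search could replace the first two character classes with `[ACGT]`, at the cost of more spurious hits.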

  6. DNA barcode goes two-dimensions: DNA QR code web server.

    PubMed

    Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.

  7. TaxI: a software tool for DNA barcoding using distance methods

    PubMed Central

    Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel

    2005-01-01

    DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid, but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
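
    The core idea of this abstract, scoring a query against each reference through its own pairwise alignment and then ranking by divergence, can be sketched briefly. The Python below is not TaxI's implementation: it assumes the pairwise alignments already exist (gaps written as `-`), uses the uncorrected p-distance rather than any particular model of sequence evolution, and all names are hypothetical.

```python
def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Uncorrected divergence: mismatches / compared sites, skipping gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # indel column: ignored in this simple sketch
        compared += 1
        mismatches += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def nearest_reference(pairwise: dict) -> tuple:
    """pairwise maps reference name -> (aligned_query, aligned_reference).

    Each reference carries its own alignment of the query, mirroring the
    'separate pairwise alignments' idea; returns (best_name, all_distances).
    """
    distances = {name: p_distance(q, r) for name, (q, r) in pairwise.items()}
    best = min(distances, key=distances.get)
    return best, distances


if __name__ == "__main__":
    toy = {  # made-up alignments for illustration
        "species_1": ("ACGTA", "ACGTA"),
        "species_2": ("ACGT-A", "ACTTGA"),
    }
    print(nearest_reference(toy))
```

    Keeping one alignment per reference avoids building a multiple alignment, which is the property the abstract highlights for indel-rich markers such as 16S rRNA.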

  8. Comparison of Nested PCR and RFLP for Identification and Classification of Malassezia Yeasts from Healthy Human Skin

    PubMed Central

    Oh, Byung Ho; Song, Young Chan; Choe, Yong Beom; Ahn, Kyu Joong

    2009-01-01

    Background Malassezia yeasts are normal flora of the skin found in 75~98% of healthy subjects. The accurate identification of the Malassezia species is important for determining the pathogenesis of the Malassezia yeasts with regard to various skin diseases such as Malassezia folliculitis, seborrheic dermatitis, and atopic dermatitis. Objective This research was conducted to determine a more accurate and rapid molecular test for the identification and classification of Malassezia yeasts. Methods We compared the accuracy and efficacy of restriction fragment length polymorphism (RFLP) and the nested polymerase chain reaction (PCR) for the identification of Malassezia yeasts. Results Although both methods demonstrated rapid and reliable results with regard to identification, the nested PCR method was faster. However, 7 different Malassezia species (1.2%) were identified by the nested PCR compared to the RFLP method. Conclusion Our results show that the RFLP method was relatively more accurate and reliable for the detection of various Malassezia species compared to the nested PCR. In terms of simplicity and time saving, however, the latter method has its own advantages. In addition, the 26S rDNA, which was targeted in this study, contains highly conserved base sequences and enough sequence variation for inter-species identification of Malassezia yeasts. PMID:20523823

  9. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria.

    PubMed

    Bolduc, Benjamin; Jang, Ho Bin; Doulcier, Guilhem; You, Zhi-Qiang; Roux, Simon; Sullivan, Matthew B

    2017-01-01

    Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. Recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means for developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytic was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome 'sequence space' that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure.

  10. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria

    PubMed Central

    Doulcier, Guilhem; You, Zhi-Qiang; Roux, Simon

    2017-01-01

    Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. Recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means for developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytic was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome ‘sequence space’ that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure. PMID:28480138

  11. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bolduc, Benjamin; Jang, Ho Bin; Doulcier, Guilhem

    Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. Recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means for developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytic was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome ‘sequence space’ that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure.

  12. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria

    DOE PAGES

    Bolduc, Benjamin; Jang, Ho Bin; Doulcier, Guilhem; ...

    2017-05-03

    Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. Recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means for developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytic was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome ‘sequence space’ that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure.

  13. Assessing the Pathogenicity of Insertion and Deletion Variants with the Variant Effect Scoring Tool (VEST‐Indel)

    PubMed Central

    Douville, Christopher; Masica, David L.; Stenson, Peter D.; Cooper, David N.; Gygax, Derek M.; Kim, Rick; Ryan, Michael

    2015-01-01

    Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features (DNA and protein sequence conservation, indel length, and occurrence in repeat regions) are useful for inference of protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificities, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in‐frame and frameshift indels (VEST‐indel) as pathogenic or benign. We apply 24 features, including a new “PubMed” feature, to estimate a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false‐positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta‐predictors that can be obtained from combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta‐predictor with improved performance over any individual method. PMID:26442818

  14. Assessing the Pathogenicity of Insertion and Deletion Variants with the Variant Effect Scoring Tool (VEST-Indel).

    PubMed

    Douville, Christopher; Masica, David L; Stenson, Peter D; Cooper, David N; Gygax, Derek M; Kim, Rick; Ryan, Michael; Karchin, Rachel

    2016-01-01

    Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features--DNA and protein sequence conservation, indel length, and occurrence in repeat regions--are useful for inference of protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificities, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in-frame and frameshift indels (VEST-indel) as pathogenic or benign. We apply 24 features, including a new "PubMed" feature, to estimate a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false-positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta-predictors that can be obtained from combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta-predictor with improved performance over any individual method. © 2015 The Authors. Human Mutation published by Wiley Periodicals, Inc.

  15. Classification of G-protein coupled receptors based on a rich generation of convolutional neural network, N-gram transformation and multiple sequence alignments.

    PubMed

    Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang

    2018-02-01

    Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, the prediction of the incremental large-scale and diversity of sequences has heavily relied on the involvement of machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach, considering the rich generation of multiple sequence alignment algorithms, N-gram probabilistic language model and the deep learning technique. The essence behind the proposed method is that if each group of sequences can be represented by one feature sequence, composed of homologous sites, there should be less loss when the sequence is rebuilt, when a more relevant sequence is added to the group. On the basis of this consideration, the prediction becomes whether a query sequence belonging to a group of sequences can be transferred to calculate the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCRs sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are imported into a convolutional neural network to make a prediction. The experimental results elucidate that the proposed method provides significant performance improvements. The classification error rate of the proposed method is reduced by at least 4.67% (family level I) and 5.75% (family Level II), in comparison with the current state-of-the-art methods. The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN .

  16. Simrank: Rapid and sensitive general-purpose k-mer search tool

    PubMed Central

    2011-01-01

    Background: Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project (http://nihroadmap.nih.gov/hmp). Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Results: Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language data found Simrank to be 10X to 928X faster, depending on the dataset. Conclusions: Simrank provides molecular ecologists with a high-throughput, open-source choice for comparing large sequence sets to find similarity. PMID:21524302
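
    The k-mer matching strategy mentioned in this record can be illustrated with a minimal Python sketch. It is not Simrank's implementation: the function names, the toy database, and the scoring rule (fraction of the query's distinct k-mers found in each database string) are assumptions chosen only to show the general idea.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_shared_kmers(query: str, database: Dict[str, str], k: int = 7) -> List[Tuple[str, float]]:
    """Rank database strings by the fraction of the query's k-mers they contain."""
    q = kmers(query, k)
    if not q:
        return []
    scores = [(name, len(q & kmers(seq, k)) / len(q)) for name, seq in database.items()]
    # Highest similarity first; ties are broken by name for a stable order.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical toy database, for illustration only.
db = {
    "seqA": "ACGTACGTGGCCTTAA",
    "seqB": "TTTTTTTTTTTTTTTT",
    "seqC": "ACGTACGAGGAATTCC",
}
hits = rank_by_shared_kmers("ACGTACGTGG", db, k=4)
```

    A production tool would typically precompute an index from k-mers to database entries rather than rescanning every string per query; the sketch omits that for brevity.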

  17. Re-classification of Clavibacter michiganensis subspecies on the basis of whole-genome and multi-locus sequence analyses

    PubMed Central

    Li, Xiang; Tambong, James; Yuan, Kat (Xiaoli); Chen, Wen; Xu, Huimin; Lévesque, C. André; De Boer, Solke H.

    2018-01-01

    Although the genus Clavibacter was originally proposed to accommodate all phytopathogenic coryneform bacteria containing B2γ diaminobutyrate in the peptidoglycan, reclassification of all but one species into other genera has resulted in the current monospecific status of the genus. The single species in the genus, Clavibacter michiganensis, has multiple subspecies, which are all highly host-specific plant pathogens. Whole genome analysis based on average nucleotide identity and digital DNA–DNA hybridization as well as multi-locus sequence analysis (MLSA) of seven housekeeping genes support raising each of the C. michiganensis subspecies to species status. On the basis of whole genome and MLSA data, we propose the establishment of two new species and three new combinations: Clavibacter capsici sp. nov., comb. nov. and Clavibacter tessellarius sp. nov., comb. nov., and Clavibacter insidiosus comb. nov., Clavibacter nebraskensis comb. nov. and Clavibacter sepedonicus comb. nov. PMID:29160202

  18. DNA Sequencing

    DOEpatents

    Tabor, Stanley; Richardson, Charles C.

    1995-04-25

    A method for sequencing a strand of DNA, including the steps of: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three deoxyribonucleoside triphosphates in different amounts, under conditions favoring primer extension to form nucleic acid fragments complementary to the DNA to be sequenced; labelling the nucleic acid fragments; separating them and determining the position of the deoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby to determine the DNA sequence.

  19. High-fidelity target sequencing of individual molecules identified using barcode sequences: de novo detection and absolute quantitation of mutations in plasma cell-free DNA from cancer patients.

    PubMed

    Kukita, Yoji; Matoba, Ryo; Uchida, Junji; Hamakawa, Takuya; Doki, Yuichiro; Imamura, Fumio; Kato, Kikuya

    2015-08-01

    Circulating tumour DNA (ctDNA) is an emerging field of cancer research. However, current ctDNA analysis is usually restricted to one or a few mutation sites due to technical limitations. In the case of massively parallel DNA sequencers, the number of false positives caused by a high read error rate is a major problem. In addition, the final sequence reads do not represent the original DNA population due to the global amplification step during the template preparation. We established a high-fidelity target sequencing system of individual molecules identified in plasma cell-free DNA using barcode sequences; this system consists of the following two steps. (i) A novel target sequencing method that adds barcode sequences by adaptor ligation. This method uses linear amplification to eliminate the errors introduced during the early cycles of polymerase chain reaction. (ii) The monitoring and removal of erroneous barcode tags. This process involves the identification of individual molecules that have been sequenced and for which the number of mutations have been absolute quantitated. Using plasma cell-free DNA from patients with gastric or lung cancer, we demonstrated that the system achieved near complete elimination of false positives and enabled de novo detection and absolute quantitation of mutations in plasma cell-free DNA. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
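
    The general idea of identifying individual molecules by barcode and suppressing read errors can be sketched as below. This is a simplified illustration, not the authors' pipeline: the function name, the `min_reads` threshold, and the per-position majority vote are assumptions, and rarely seen barcodes are simply dropped as a stand-in for the removal of erroneous tags.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def consensus_by_barcode(reads: Iterable[Tuple[str, str]], min_reads: int = 3) -> Dict[str, str]:
    """Collapse (barcode, sequence) reads into one consensus sequence per barcode."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)

    consensus: Dict[str, str] = {}
    for barcode, seqs in groups.items():
        # Barcodes seen only a few times are treated as unreliable and dropped.
        if len(seqs) < min_reads:
            continue
        # Keep only reads of the most common length so columns line up
        # without an alignment step.
        length = Counter(len(s) for s in seqs).most_common(1)[0][0]
        seqs = [s for s in seqs if len(s) == length]
        # Majority vote at each position removes isolated read errors.
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Hypothetical reads: barcode "AAT" has one read with an error at position 2.
reads = [
    ("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"),
    ("GGC", "ACGT"),
    ("CCA", "TCGT"), ("CCA", "TCGT"), ("CCA", "TCGT"),
]
molecules = consensus_by_barcode(reads)
```

    Counting the resulting consensus sequences, rather than raw reads, is what makes per-molecule quantitation possible in this kind of scheme.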

  20. Sequence-Dependent Diastereospecific and Diastereodivergent Crosslinking of DNA by Decarbamoylmitomycin C.

    PubMed

    Aguilar, William; Paz, Manuel M; Vargas, Anayatzinc; Clement, Cristina C; Cheng, Shu-Yuan; Champeil, Elise

    2018-04-20

    Mitomycin C (MC), a potent antitumor drug, and decarbamoylmitomycin C (DMC), a derivative lacking the carbamoyl group, form highly cytotoxic DNA interstrand crosslinks. The major interstrand crosslink formed by DMC is the C1'' epimer of the major crosslink formed by MC. The molecular basis for the stereochemical configuration exhibited by DMC was investigated using biomimetic synthesis. The formation of DNA-DNA crosslinks by DMC is diastereospecific and diastereodivergent: Only the 1''S-diastereomer of the initially formed monoadduct can form crosslinks at GpC sequences, and only the 1''R-diastereomer of the monoadduct can form crosslinks at CpG sequences. We also show that CpG and GpC sequences react with divergent diastereoselectivity in the first alkylation step: 1''S stereochemistry is favored at GpC sequences and 1''R stereochemistry is favored at CpG sequences. Therefore, the first alkylation step results, at each sequence, in the selective formation of the diastereomer able to generate an interstrand DNA-DNA crosslink after the "second arm" alkylation. Examination of the known DNA adduct pattern obtained after treatment of cancer cell cultures with DMC indicates that the GpC sequence is the major target for the formation of DNA-DNA crosslinks in vivo by this drug. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Sequencing historical specimens: successful preparation of small specimens with low amounts of degraded DNA.

    PubMed

    Sproul, John S; Maddison, David R

    2017-11-01

    Despite advances that allow DNA sequencing of old museum specimens, sequencing small-bodied, historical specimens can be challenging and unreliable as many contain only small amounts of fragmented DNA. Dependable methods to sequence such specimens are especially critical if the specimens are unique. We attempt to sequence small-bodied (3-6 mm) historical specimens (including nomenclatural types) of beetles that have been housed, dried, in museums for 58-159 years, and for which few or no suitable replacement specimens exist. To better understand ideal approaches of sample preparation and produce preparation guidelines, we compared different library preparation protocols using low amounts of input DNA (1-10 ng). We also explored low-cost optimizations designed to improve library preparation efficiency and sequencing success of historical specimens with minimal DNA, such as enzymatic repair of DNA. We report successful sample preparation and sequencing for all historical specimens despite our low-input DNA approach. We provide a list of guidelines related to DNA repair, bead handling, reducing adapter dimers and library amplification. We present these guidelines to facilitate more economical use of valuable DNA and enable more consistent results in projects that aim to sequence challenging, irreplaceable historical specimens. © 2017 John Wiley & Sons Ltd.

  2. i-rDNA: alignment-free algorithm for rapid in silico detection of ribosomal gene fragments from metagenomic sequence data sets.

    PubMed

    Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Chadaram, Sudha; Mande, Sharmila S

    2011-11-30

    Obtaining accurate estimates of microbial diversity using rDNA profiling is the first step in most metagenomics projects. Consequently, most metagenomic projects spend considerable amounts of time, money and manpower for experimentally cloning, amplifying and sequencing the rDNA content in a metagenomic sample. In the second step, the entire genomic content of the metagenome is extracted, sequenced and analyzed. Since DNA sequences obtained in this second step also contain rDNA fragments, rapid in silico identification of these rDNA fragments would drastically reduce the cost, time and effort of current metagenomic projects by entirely bypassing the experimental steps of primer based rDNA amplification, cloning and sequencing. In this study, we present an algorithm called i-rDNA that can facilitate the rapid detection of 16S rDNA fragments from amongst millions of sequences in metagenomic data sets with high detection sensitivity. Performance evaluation with data sets/database variants simulating typical metagenomic scenarios indicates the significantly high detection sensitivity of i-rDNA. Moreover, i-rDNA can process a million sequences in less than an hour on a simple desktop with modest hardware specifications. In addition to the speed of execution, high sensitivity and low false positive rate, the utility of the algorithmic approach discussed in this paper is immense given that it would help in bypassing the entire experimental step of primer-based rDNA amplification, cloning and sequencing. Application of this algorithmic approach would thus drastically reduce the cost, time and human efforts invested in all metagenomic projects. A web-server for the i-rDNA algorithm is available at http://metagenomics.atc.tcs.com/i-rDNA/

  3. Biosensors for DNA sequence detection

    NASA Technical Reports Server (NTRS)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  4. DNA Sequences from Formalin-Fixed Nematodes: Integrating Molecular and Morphological Approaches to Taxonomy

    PubMed Central

    Thomas, W. Kelley; Vida, J. T.; Frisse, Linda M.; Mundo, Manuel; Baldwin, James G.

    1997-01-01

    To effectively integrate DNA sequence analysis and classical nematode taxonomy, we must be able to obtain DNA sequences from formalin-fixed specimens. Microdissected sections of nematodes were removed from specimens fixed in formalin, using standard protocols and without destroying morphological features. The fixed sections provided sufficient template for multiple polymerase chain reaction-based DNA sequence analyses. PMID:19274156

  5. Palindromic Sequence Artifacts Generated during Next Generation Sequencing Library Preparation from Historic and Ancient DNA

    PubMed Central

    Star, Bastiaan; Nederbragt, Alexander J.; Hansen, Marianne H. S.; Skage, Morten; Gilfillan, Gregor D.; Bradbury, Ian R.; Pampoulie, Christophe; Stenseth, Nils Chr; Jakobsen, Kjetill S.; Jentoft, Sissel

    2014-01-01

    Degradation-specific processes and variation in laboratory protocols can bias the DNA sequence composition from samples of ancient or historic origin. Here, we identify a novel artifact in sequences from historic samples of Atlantic cod (Gadus morhua), which forms interrupted palindromes consisting of reverse complementary sequence at the 5′ and 3′-ends of sequencing reads. The palindromic sequences themselves have specific properties – the bases at the 5′-end align well to the reference genome, whereas extensive misalignments exist among the bases at the terminal 3′-end. The terminal 3′ bases are artificial extensions likely caused by the occurrence of hairpin loops in single stranded DNA (ssDNA), which can be ligated and amplified in particular library creation protocols. We propose that such hairpin loops allow the inclusion of erroneous nucleotides, specifically at the 3′-end of DNA strands, with the 5′-end of the same strand providing the template. We also find these palindromes in previously published ancient DNA (aDNA) datasets, albeit at varying and substantially lower frequencies. This artifact can negatively affect the yield of endogenous DNA in these types of samples and introduces sequence bias. PMID:24608104
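
    An interrupted palindrome of the kind described here, where the 3′ end of a read is the reverse complement of its 5′ end, can be screened for with a short Python sketch. The function names and the `min_len` cutoff are assumptions for illustration; the study's actual detection procedure is not given in this abstract, and this exact-match check would miss palindromes containing mismatches.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (N is left as N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_palindrome_length(read: str, min_len: int = 4) -> int:
    """Length of the longest 3' end that is the reverse complement of the 5' end.

    Returns 0 if no such stretch of at least min_len bases exists.
    """
    read = read.upper()
    # Cap at half the read so the two arms cannot overlap.
    for n in range(len(read) // 2, min_len - 1, -1):
        if read[-n:] == reverse_complement(read[:n]):
            return n
    return 0


# Hypothetical read: 5' arm "ACGGTC", an unrelated middle, then revcomp("ACGGTC").
artifact_read = "ACGGTC" + "TTTTAAGG" + reverse_complement("ACGGTC")
clean_read = "ACGGTCTTTTAAGGCCCCCC"
```

    Reads with a non-zero result could then be trimmed at the 3′ end or excluded before mapping, depending on the analysis.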

  6. A new family of satellite DNA sequences as a major component of centromeric heterochromatin in owls (Strigiformes).

    PubMed

    Yamada, Kazuhiko; Nishida-Umehara, Chizuko; Matsuda, Yoichi

    2004-03-01

    We isolated a new family of satellite DNA sequences from HaeIII- and EcoRI-digested genomic DNA of the Blakiston's fish owl (Ketupa blakistoni). The repetitive sequences were organized in tandem arrays of the 174 bp element, and localized to the centromeric regions of all macrochromosomes, including the Z and W chromosomes, and microchromosomes. This hybridization pattern was consistent with the distribution of C-band-positive centromeric heterochromatin, and the satellite DNA sequences occupied 10% of the total genome as a major component of centromeric heterochromatin. The sequences were homogenized between macro- and microchromosomes in this species, and therefore intraspecific divergence of the nucleotide sequences was low. The 174 bp element cross-hybridized to the genomic DNA of six other Strigidae species, but not to that of the Tytonidae, suggesting that the satellite DNA sequences are conserved in the same family but fairly divergent between the different families in the Strigiformes. Secondly, the centromeric satellite DNAs were cloned from eight Strigidae species, and the nucleotide sequences of 41 monomer fragments were compared within and between species. Molecular phylogenetic relationships of the nucleotide sequences were highly correlated with both the taxonomy based on morphological traits and the phylogenetic tree constructed by DNA-DNA hybridization. These results suggest that the satellite DNA sequence has evolved by concerted evolution in the Strigidae and that it is a good taxonomic and phylogenetic marker to examine genetic diversity between Strigiformes species.

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sobottka, Marcelo, E-mail: sobottka@mtm.ufsc.br; Hart, Andrew G., E-mail: ahart@dim.uchile.cl

    Highlights:
    - We propose a simple stochastic model to construct primitive DNA sequences.
    - The model provides an explanation for Chargaff's second parity rule in primitive DNA sequences.
    - The model is also used to predict a novel type of strand symmetry in primitive DNA sequences.
    - We extend the results to bacterial DNA sequences and compare distributional properties intrinsic to the model to statistical estimates from 1049 bacterial genomes.
    - We find statistical evidence that the novel type of strand symmetry holds for bacterial DNA sequences.

    Abstract: Chargaff's second parity rule for short oligonucleotides states that the frequency of any short nucleotide sequence on a strand is approximately equal to the frequency of its reverse complement on the same strand. Recent studies have shown that, with the exception of organellar DNA, this parity rule generally holds for double-stranded DNA genomes and fails to hold for single-stranded genomes. While Chargaff's first parity rule is fully explained by the Watson-Crick pairing in the DNA double helix, a definitive explanation for the second parity rule has not yet been determined. In this work, we propose a model based on a hidden Markov process for approximating the distributional structure of primitive DNA sequences. Then, we use the model to provide another possible theoretical explanation for Chargaff's second parity rule, and to predict novel distributional aspects of bacterial DNA sequences.
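
    Chargaff's second parity rule as stated in this record (the frequency of a short word on one strand is close to the frequency of its reverse complement on the same strand) can be checked empirically with a small Python sketch. The function names are hypothetical, and this is only a direct frequency comparison, not the hidden Markov model proposed by the authors.

```python
from collections import Counter
from typing import Dict, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parity_table(strand: str, k: int = 2) -> Dict[str, Tuple[float, float]]:
    """Map each observed k-mer to (its frequency, frequency of its reverse complement)."""
    strand = strand.upper()
    counts = Counter(strand[i:i + k] for i in range(len(strand) - k + 1))
    total = sum(counts.values())
    # Counter returns 0 for k-mers that never occur, so absent reverse
    # complements are handled without special cases.
    return {
        kmer: (counts[kmer] / total, counts[reverse_complement(kmer)] / total)
        for kmer in counts
    }


def max_parity_deviation(strand: str, k: int = 2) -> float:
    """Largest |f(w) - f(revcomp(w))| over observed k-mers; near 0 when the rule holds."""
    return max(
        (abs(f - f_rc) for f, f_rc in parity_table(strand, k).values()),
        default=0.0,
    )
```

    For example, `max_parity_deviation("ACGT" * 5, 2)` is 0.0 because every dinucleotide in that toy strand occurs as often as its reverse complement, whereas `max_parity_deviation("AAAAAAAA", 1)` is 1.0 because A is never balanced by T.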

  8. A Simulation of DNA Sequencing Utilizing 3M Post-It[R] Notes

    ERIC Educational Resources Information Center

    Christensen, Doug

    2009-01-01

    An inexpensive and equipment-free approach to teaching the technical aspects of DNA sequencing. The activity described requires an instructor with a familiarity with DNA sequencing technology but provides a straightforward method of teaching the technical aspects of sequencing in the absence of expensive sequencing equipment. The final sequence…

  9. DNA and RNA sequencing by nanoscale reading through programmable electrophoresis and nanoelectrode-gated tunneling and dielectric detection

    DOEpatents

    Lee, James W.; Thundat, Thomas G.

    2005-06-14

    An apparatus and method for performing nucleic acid (DNA and/or RNA) sequencing on a single molecule. The genetic sequence information is obtained by probing through a DNA or RNA molecule base by base at nanometer scale as though looking through a strip of movie film. This DNA sequencing nanotechnology has the theoretical capability of performing DNA sequencing at a maximal rate of about 1,000,000 bases per second. This enhanced performance is made possible by a series of innovations including: novel applications of a fine-tuned nanometer gap for passage of a single DNA or RNA molecule; thin layer microfluidics for sample loading and delivery; and programmable electric fields for precise control of DNA or RNA movement. Detection methods include nanoelectrode-gated tunneling current measurements, dielectric molecular characterization, and atomic force microscopy/electrostatic force microscopy (AFM/EFM) probing for nanoscale reading of the nucleic acid sequences.

  10. The sequence specificity of UV-induced DNA damage in a systematically altered DNA sequence.

    PubMed

    Khoe, Clairine V; Chung, Long H; Murray, Vincent

    2018-06-01

    The sequence specificity of UV-induced DNA damage was investigated in a specifically designed DNA plasmid using two procedures: end-labelling and linear amplification. Absorption of UV photons by DNA leads to dimerisation of pyrimidine bases and produces two major photoproducts, cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). A previous study had determined that two hexanucleotide sequences, 5'-GCTC*AC and 5'-TATT*AA, were high intensity UV-induced DNA damage sites. The UV clone plasmid was constructed by systematically altering each nucleotide of these two hexanucleotide sequences. One of the main goals of this study was to determine the influence of single nucleotide alterations on the intensity of UV-induced DNA damage. The sequence 5'-GCTC*AC was designed to examine the sequence specificity of 6-4PPs and the highest intensity 6-4PP damage sites were found at 5'-GTTC*CC nucleotides. The sequence 5'-TATT*AA was devised to investigate the sequence specificity of CPDs and the highest intensity CPD damage sites were found at 5'-TTTT*CG nucleotides. It was proposed that the tetranucleotide DNA sequence, 5'-YTC*Y (where Y is T or C), was the consensus sequence for the highest intensity UV-induced 6-4PP adduct sites; while it was 5'-YTT*C for the highest intensity UV-induced CPD damage sites. These consensus tetranucleotides are composed entirely of consecutive pyrimidines and must have a DNA conformation that is highly productive for the absorption of UV photons. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
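
    The two consensus tetranucleotides proposed in this record, 5'-YTC*Y and 5'-YTT*C (Y is T or C), can be located in a sequence with regular expressions, as in the hedged Python sketch below. The pattern names and helper function are hypothetical. The asterisk in the abstract's notation is not modelled, and a match only marks a candidate site, not a measured damage intensity.

```python
import re
from typing import List

# Y is the IUPAC code for a pyrimidine (T or C). A lookahead is used so that
# overlapping sites are all reported.
SIX_FOUR_PP = re.compile(r"(?=([TC]TC[TC]))")  # 5'-YTCY, candidate 6-4PP sites
CPD = re.compile(r"(?=([TC]TTC))")             # 5'-YTTC, candidate CPD sites


def find_sites(pattern: re.Pattern, seq: str) -> List[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    return [m.start() for m in pattern.finditer(seq.upper())]
```

    Applied to the two highest-intensity hexanucleotides quoted in the abstract, `find_sites(SIX_FOUR_PP, "GTTCCC")` and `find_sites(CPD, "TTTTCG")` each report a single site starting at position 1.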

  11. Assessing fluctuating evolutionary pressure in yeast and mammal evolutionary rate covariation using bioinformatics of meiotic protein genetic sequences

    NASA Astrophysics Data System (ADS)

    Dehipawala, Sunil; Nguyen, A.; Tremberger, G.; Cheung, E.; Holden, T.; Lieberman, D.; Cheung, T.

    2013-09-01

    The evolutionary rate co-variation in meiotic proteins has been reported for yeast and mammals using phylogenetic branch lengths which assess retention, duplication and mutation. The bioinformatics of the corresponding DNA sequences could be classified as a diagram of fractal dimension and Shannon entropy. Results from biomedical gene research provide examples on the diagram methodology. The identification of adaptive selection using entropy marker and functional-structural diversity using fractal dimension would support a regression analysis where the coefficient of determination would serve as evolutionary pathway marker for DNA sequences and be an important component in the astrobiology community. Comparisons between biomedical genes such as EEF2 (elongation factor 2 human, mouse, etc), WDR85 in epigenetics, HAR1 in human specificity, clinical trial targeted cancer gene CD47, SIRT6 in spermatogenesis, and HLA-C in mosquito bite immunology demonstrate the diagram classification methodology. Comparisons to the SEPT4-XIAP pair in stem cell apoptosis, testes-expressed taste genes TAS1R3-GNAT3 pair, and amyloid beta APLP1-APLP2 pair with the yeast-mammal DNA sequences for meiotic proteins RAD50-MRE11 pair and NCAPD2-ICK pair have accounted for the observed fluctuating evolutionary pressure systematically. Regression with high R-sq values or a triangular-like cluster pattern for concordant pairs in co-variation among the studied species could serve as evidence for the possible location of common ancestors in the entropy-fractal dimension diagram, consistent with an example of the human-chimp common ancestor study using the FOXP2 regulated genes reported in human fetal brain study. The Deinococcus radiodurans R1 Rad-A could be viewed as an outlier in the RAD50 diagram and also in the free energy versus fractal dimension regression Cook's distance, consistent with a non-Earth source for this radiation resistant bacterium. Convergent and divergent fluctuating evolutionary pressure could be studied with extension to genetic sequences in organisms in possible astrobiology conditions, with the assumption that the continuation of a book of life would require meiotic proteins everywhere in the universe.
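
    One axis of the diagram described in this record is Shannon entropy. The abstract does not say how entropy was computed for each gene, so the Python sketch below assumes the standard definition applied to the k-mer composition of a sequence; the function name and the choice of k are illustrative only, and the fractal dimension axis is not covered.

```python
import math
from collections import Counter


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy, in bits, of the k-mer distribution of seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed k-mers.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

    With k = 1 the value ranges from 0 bits for a homopolymer to 2 bits for a sequence that uses the four nucleotides equally often.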

  12. Torque measurements reveal sequence-specific cooperative transitions in supercoiled DNA

    PubMed Central

    Oberstrass, Florian C.; Fernandes, Louis E.; Bryant, Zev

    2012-01-01

    B-DNA becomes unstable under superhelical stress and is able to adopt a wide range of alternative conformations including strand-separated DNA and Z-DNA. Localized sequence-dependent structural transitions are important for the regulation of biological processes such as DNA replication and transcription. To directly probe the effect of sequence on structural transitions driven by torque, we have measured the torsional response of a panel of DNA sequences using single molecule assays that employ nanosphere rotational probes to achieve high torque resolution. The responses of Z-forming d(pGpC)n sequences match our predictions based on a theoretical treatment of cooperative transitions in helical polymers. “Bubble” templates containing 50–100 bp mismatch regions show cooperative structural transitions similar to B-DNA, although less torque is required to disrupt strand–strand interactions. Our mechanical measurements, including direct characterization of the torsional rigidity of strand-separated DNA, establish a framework for quantitative predictions of the complex torsional response of arbitrary sequences in their biological context. PMID:22474350

  13. Ancient dna from pleistocene fossils: Preservation, recovery, and utility of ancient genetic information for quaternary research

    NASA Astrophysics Data System (ADS)

    Yang, Hong

    Until recently, recovery and analysis of genetic information encoded in ancient DNA sequences from Pleistocene fossils were impossible. Recent advances in molecular biology offered technical tools to obtain ancient DNA sequences from well-preserved Quaternary fossils and opened the possibilities to directly study genetic changes in fossil species to address various biological and paleontological questions. Ancient DNA studies involving Pleistocene fossil material and ancient DNA degradation and preservation in Quaternary deposits are reviewed. The molecular technology applied to isolate, amplify, and sequence ancient DNA is also presented. Authentication of ancient DNA sequences and technical problems associated with modern and ancient DNA contamination are discussed. As illustrated in recent studies on ancient DNA from proboscideans, it is apparent that fossil DNA sequence data can shed light on many aspects of Quaternary research such as systematics and phylogeny, conservation biology, evolutionary theory, molecular taphonomy, and forensic sciences. Improvement of molecular techniques and a better understanding of DNA degradation during fossilization are likely to build on current strengths and to overcome existing problems, making fossil DNA data a unique source of information for Quaternary scientists.

  14. Enantiospecific recognition of DNA sequences by a proflavine Tröger base.

    PubMed

    Bailly, C; Laine, W; Demeunynck, M; Lhomme, J

    2000-07-05

    The DNA interaction of a chiral Tröger base derived from proflavine was investigated by DNA melting temperature measurements and complementary biochemical assays. DNase I footprinting experiments demonstrate that the binding of the proflavine-based Tröger base is both enantio- and sequence-specific. The (+)-isomer poorly interacts with DNA in a non-sequence-selective fashion. In sharp contrast, the corresponding (-)-isomer recognizes preferentially certain DNA sequences containing both A·T and G·C base pairs, such as the motifs 5'-GTT·AAC and 5'-ATGA·TCAT. This is the first experimental demonstration that acridine-type Tröger bases can be used for enantiospecific recognition of DNA sequences. Copyright 2000 Academic Press.

  15. Sensitive detection of mercury and copper ions by fluorescent DNA/Ag nanoclusters in guanine-rich DNA hybridization

    NASA Astrophysics Data System (ADS)

    Peng, Jun; Ling, Jian; Zhang, Xiu-Qing; Bai, Hui-Ping; Zheng, Liyan; Cao, Qiu-E.; Ding, Zhong-Tao

    2015-02-01

    In this work, we designed a new fluorescent oligonucleotide-stabilized silver nanocluster (DNA/AgNCs) probe for sensitive detection of mercury and copper ions. This probe contains two tailored DNA sequences. One is a signal probe that contains a cytosine-rich sequence template for AgNCs synthesis and a link sequence at both ends. The other is a guanine-rich sequence for signal enhancement with a link sequence complementary to the link sequence of the signal probe. After hybridization, the fluorescence of the hybridized double-strand DNA/AgNCs is enhanced 200-fold, based on the fluorescence enhancement effect of DNA/AgNCs in proximity to a guanine-rich DNA sequence. The double-strand DNA/AgNCs probe is brighter and more stable than single-strand DNA/AgNCs and, more importantly, can be used as a novel fluorescent probe for detecting mercury and copper ions. Mercury and copper ions in the ranges of 6.0-160.0 and 6-240 nM can be linearly detected, with detection limits of 2.1 and 3.4 nM, respectively. Our results indicated that the analytical parameters of the method for mercury and copper ion detection are much better than those obtained using single-strand DNA/AgNCs.

  16. Ab initio DNA synthesis by Bst polymerase in the presence of nicking endonucleases Nt.AlwI, Nb.BbvCI, and Nb.BsmI.

    PubMed

    Antipova, Valeriya N; Zheleznaya, Lyudmila A; Zyrina, Nadezhda V

    2014-08-01

    In the absence of added DNA, thermophilic DNA polymerases synthesize double-stranded DNA consisting of numerous repetitive units from free dNTPs (ab initio DNA synthesis). The addition of thermophilic restriction endonuclease (REase), or nicking endonuclease (NEase), effectively stimulates ab initio DNA synthesis and determines the nucleotide sequence of reaction products. We have found that NEases Nt.AlwI, Nb.BbvCI, and Nb.BsmI with non-palindromic recognition sites stimulate the synthesis of sequences organized mainly as palindromes. Moreover, the nucleotide sequence of the palindromes appeared to be dependent on NEase recognition/cleavage modes. Thus, the heterodimeric Nb.BbvCI stimulated the synthesis of palindromes composed of two recognition sites of this NEase, which were separated by AT-rich sequences or (A)n (T)m spacers. Palindromic DNA sequences obtained in the ab initio DNA synthesis with the monomeric NEases Nb.BsmI and Nt.AlwI contained, along with the sites of these NEases, randomly synthesized sequences consisting of blocks of short repeats. These findings could help in investigating the potential of highly productive ab initio DNA synthesis for the creation of DNA molecules with desirable sequences. © 2014 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  17. Mitochondrial genome of the moon jelly Aurelia aurita (Cnidaria, Scyphozoa): A linear DNA molecule encoding a putative DNA-dependent DNA polymerase.

    PubMed

    Shao, Zhiyong; Graf, Shannon; Chaga, Oleg Y; Lavrov, Dennis V

    2006-10-15

    The 16,937-nucleotide sequence of the linear mitochondrial DNA (mt-DNA) molecule of the moon jelly Aurelia aurita (Cnidaria, Scyphozoa) - the first mtDNA sequence from the class Scyphozoa and the first sequence of a linear mtDNA from Metazoa - has been determined. This sequence contains genes for 13 energy pathway proteins, small and large subunit rRNAs, and methionine and tryptophan tRNAs. In addition, two open reading frames of 324 and 969 base pairs in length have been found. The deduced amino-acid sequence of one of them, ORF969, displays extensive sequence similarity with the polymerase [but not the exonuclease] domain of family B DNA polymerases, and this ORF has been tentatively identified as dnab. This is the first report of dnab in animal mtDNA. The genes in A. aurita mtDNA are arranged in two clusters with opposite transcriptional polarities; transcription proceeding toward the ends of the molecule. The determined sequences at the ends of the molecule are nearly identical but inverted and lack any obvious potential secondary structures or telomere-like repeat elements. The acquisition of mitochondrial genomic data for the second class of Cnidaria allows us to reconstruct characteristic features of mitochondrial evolution in this animal phylum.

  18. Recent patents of nanopore DNA sequencing technology: progress and challenges.

    PubMed

    Zhou, Jianfeng; Xu, Bingqian

    2010-11-01

    DNA sequencing techniques have developed rapidly in recent decades, primarily driven by the Human Genome Project. Among the proposed new techniques, the nanopore was considered a suitable candidate for single-molecule DNA sequencing with ultrahigh speed and very low cost. Several fabrication and modification techniques have been developed to produce robust and well-defined nanopore devices. Much effort has also gone into applying nanopores to analyze the properties of DNA molecules. Compared with traditional sequencing techniques, the nanopore has demonstrated distinct advantages in the main practical issues, such as sample preparation, sequencing speed, cost-effectiveness and read length. Although challenges remain, recent research on improving the capabilities of nanopores has shed light on how to achieve the ultimate goal: sequencing an individual DNA strand at the single-nucleotide level. This patent review briefly highlights recent developments and technological achievements for DNA analysis and sequencing at the single-molecule level, focusing on nanopore-based methods.

  19. Small tandemly repeated DNA sequences of higher plants likely originate from a tRNA gene ancestor.

    PubMed Central

    Benslimane, A A; Dron, M; Hartmann, C; Rode, A

    1986-01-01

    Several monomers (177 bp) of a tandemly arranged repetitive nuclear DNA sequence of Brassica oleracea have been cloned and sequenced. They share up to 95% homology between one another and up to 80% with other satellite DNA sequences of Cruciferae, suggesting a common ancestor. Both strands of these monomers show more than 50% homology with many tRNA genes; the best homologies have been obtained with Lys and His yeast mitochondrial tRNA genes (respectively 64% and 60%). These results suggest that small tandemly repeated DNA sequences of plants may have evolved from a tRNA gene ancestor. These tandem repeats have probably arisen via a process involving reverse transcription of polymerase III RNA intermediates, as is the case for interspersed DNA sequences of mammals. A model is proposed to explain the formation of such small tandemly repeated DNA sequences. PMID:3774553

  20. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

    PubMed

    Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

    2017-04-15

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.

  1. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    PubMed Central

    Sinclair, Robert M.; Ravantti, Janne J.

    2017-01-01

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids.
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979

  2. Elman RNN based classification of proteins sequences on account of their mutual information.

    PubMed

    Mishra, Pooja; Nath Pandey, Paras

    2012-10-21

    In the present work we have estimated residue correlation within protein sequences by using the mutual information (MI) of adjacent residues, based on structural and solvent accessibility properties of amino acids. The long-range correlation between nonadjacent residues is improved by constructing a mutual information vector (MIV) for a single protein sequence, so that each protein sequence is associated with its corresponding MIVs. These MIVs are given to an Elman RNN to obtain the classification of protein sequences. The modeling power of MIV was shown to be significantly better, giving a new approach towards alignment-free classification of protein sequences. We also conclude that MIVs based on sequence structural and solvent accessibility properties are better predictors. Copyright © 2012 Elsevier Ltd. All rights reserved.
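
    The mutual information of adjacent residues mentioned above can be illustrated with a short sketch. It assumes the standard definition MI = sum over pairs (a, b) of p(a,b) * log2(p(a,b) / (p(a) * p(b))) estimated from the adjacent pairs of one sequence, and it works on raw symbols; the grouping of amino acids by structural or solvent accessibility class used by the authors is not reproduced here. The function name `adjacent_mutual_information` is hypothetical.

```python
import math
from collections import Counter


def adjacent_mutual_information(seq: str) -> float:
    """Mutual information (bits) between each residue and its right neighbor.

    Assumption: probabilities are plain frequency estimates taken from the
    adjacent pairs of this single sequence.
    """
    pairs = list(zip(seq, seq[1:]))
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                 # counts of (left, right) pairs
    left = Counter(a for a, _ in pairs)    # marginal counts, left position
    right = Counter(b for _, b in pairs)   # marginal counts, right position
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi
```

    A strictly alternating sequence gives a value close to 1 bit because each residue determines its neighbor, whereas a homopolymer gives 0. A vector of such values over several property classes would be one possible reading of the MIV described in the abstract.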

  3. Next-Generation Sequencing Platforms

    NASA Astrophysics Data System (ADS)

    Mardis, Elaine R.

    2013-06-01

    Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

  4. Regulatory link between DNA methylation and active demethylation in Arabidopsis

    PubMed Central

    Lei, Mingguang; Zhang, Huiming; Julian, Russell; Tang, Kai; Xie, Shaojun; Zhu, Jian-Kang

    2015-01-01

    De novo DNA methylation through the RNA-directed DNA methylation (RdDM) pathway and active DNA demethylation play important roles in controlling genome-wide DNA methylation patterns in plants. Little is known about how cells manage the balance between DNA methylation and active demethylation activities. Here, we report the identification of a unique RdDM target sequence, where DNA methylation is required for maintaining proper active DNA demethylation of the Arabidopsis genome. In a genetic screen for cellular antisilencing factors, we isolated several REPRESSOR OF SILENCING 1 (ros1) mutant alleles, as well as many RdDM mutants, which showed drastically reduced ROS1 gene expression and, consequently, transcriptional silencing of two reporter genes. A helitron transposon element (TE) in the ROS1 gene promoter negatively controls ROS1 expression, whereas DNA methylation of an RdDM target sequence between ROS1 5′ UTR and the promoter TE region antagonizes this helitron TE in regulating ROS1 expression. This RdDM target sequence is also targeted by ROS1, and defective DNA demethylation in loss-of-function ros1 mutant alleles causes DNA hypermethylation of this sequence and concomitantly causes increased ROS1 expression. Our results suggest that this sequence in the ROS1 promoter region serves as a DNA methylation monitoring sequence (MEMS) that senses DNA methylation and active DNA demethylation activities. Therefore, the ROS1 promoter functions like a thermostat (i.e., methylstat) to sense DNA methylation levels and regulates DNA methylation by controlling ROS1 expression. PMID:25733903

  5. Phylogeny of Anophelinae using mitochondrial protein coding genes

    PubMed Central

    de Oliveira, Tatiane Marques Porangaba; Bergo, Eduardo S.; Conn, Jan E.; Sant’Ana, Denise Cristina; Nagaki, Sandra Sayuri; Nihei, Silvio; Lamas, Carlos Einicker; González, Christian; Moreira, Caio Cesar; Sallum, Maria Anice Mureb

    2017-01-01

    Malaria is a vector-borne disease that is a great burden on the poorest and most marginalized communities of the tropical and subtropical world. Approximately 41 species of Anopheline mosquitoes can effectively spread species of Plasmodium parasites that cause human malaria. Proposing a natural classification for the subfamily Anophelinae has been a continuous effort, addressed using both morphology and DNA sequence data. The monophyly of the genus Anopheles, and phylogenetic placement of the genus Bironella, subgenera Kerteszia, Lophopodomyia and Stethomyia within the subfamily Anophelinae, remain in question. To understand the classification of Anophelinae, we inferred the phylogeny of all three genera (Anopheles, Bironella, Chagasia) and major subgenera by analysing the amino acid sequences of the 13 protein coding genes of 150 newly sequenced mitochondrial genomes of Anophelinae and 18 newly sequenced Culex species as outgroup taxa, supplemented with 23 mitogenomes from GenBank. Our analyses generally place genus Bironella within the genus Anopheles, which implies that the latter as it is currently defined is not monophyletic. With some inconsistencies, Bironella was placed within the major clade that includes Anopheles, Cellia, Kerteszia, Lophopodomyia, Nyssorhynchus and Stethomyia, which were found to be monophyletic groups within Anophelinae. Our findings provided robust evidence for elevating the monophyletic groupings Kerteszia, Lophopodomyia, Nyssorhynchus and Stethomyia to genus level; genus Anopheles to include subgenera Anopheles, Baimaia, Cellia and Christya; Anopheles parvus to be placed into a new genus; Nyssorhynchus to be elevated to genus level; the genus Nyssorhynchus to include subgenera Myzorhynchella and Nyssorhynchus; Anopheles atacamensis and Anopheles pictipennis to be transferred from subgenus Nyssorhynchus to subgenus Myzorhynchella; and subgenus Nyssorhynchus to encompass the remaining species of Argyritarsis and Albimanus Sections. 
PMID:29291068

  6. Attomole-level Genomics with Single-molecule Direct DNA, cDNA and RNA Sequencing Technologies.

    PubMed

    Ozsolak, Fatih

    2016-01-01

    With the introduction of next-generation sequencing (NGS) technologies in 2005, the domination of microarrays in genomics quickly came to an end due to NGS's superior technical performance and cost advantages. By enabling genetic analysis capabilities that were not possible previously, NGS technologies have started to play an integral role in all areas of biomedical research. This chapter outlines the low-quantity DNA and cDNA sequencing capabilities and applications developed with the Helicos single molecule DNA sequencing technology.

  7. A cDNA from a mouse pancreatic beta cell encoding a putative transcription factor of the insulin gene.

    PubMed Central

    Walker, M D; Park, C W; Rosen, A; Aronheim, A

    1990-01-01

    Cell specific expression of the insulin gene is achieved through transcriptional mechanisms operating on multiple DNA sequence elements located in the 5' flanking region of the gene. Of particular importance in the rat insulin I gene are two closely similar 9 bp sequences (IEB1 and IEB2): mutation of either of these leads to 5-10 fold reduction in transcriptional activity. We have screened an expression cDNA library derived from mouse pancreatic endocrine beta cells with a radioactive DNA probe containing multiple copies of the IEB1 sequence. A cDNA clone (A1) isolated by this procedure encodes a protein which shows efficient binding to the IEB1 probe, but much weaker binding to either an unrelated DNA probe or to a probe bearing a single base pair insertion within the recognition sequence. DNA sequence analysis indicates a protein belonging to the helix-loop-helix family of DNA-binding proteins. The ability of the protein encoded by clone A1 to recognize a number of wild type and mutant DNA sequences correlates closely with the ability of each sequence element to support transcription in vivo in the context of the insulin 5' flanking DNA. We conclude that the isolated cDNA may encode a transcription factor that participates in control of insulin gene expression. PMID:2181401

  8. Highly multiplexed targeted DNA sequencing from single nuclei.

    PubMed

    Leung, Marco L; Wang, Yong; Kim, Charissa; Gao, Ruli; Jiang, Jerry; Sei, Emi; Navin, Nicholas E

    2016-02-01

    Single-cell DNA sequencing methods are challenged by poor physical coverage, high technical error rates and low throughput. To address these issues, we developed a single-cell DNA sequencing protocol that combines flow-sorting of single nuclei, time-limited multiple-displacement amplification (MDA), low-input library preparation, DNA barcoding, targeted capture and next-generation sequencing (NGS). This approach represents a major improvement over our previous single nucleus sequencing (SNS) Nature Protocols paper in terms of generating higher-coverage data (>90%), thereby enabling the detection of genome-wide variants in single mammalian cells at base-pair resolution. Furthermore, by pooling 48-96 single-cell libraries together for targeted capture, this approach can be used to sequence many single-cell libraries in parallel in a single reaction. This protocol greatly reduces the cost of single-cell DNA sequencing, and it can be completed in 5-6 d by advanced users. This single-cell DNA sequencing protocol has broad applications for studying rare cells and complex populations in diverse fields of biological research and medicine.

  9. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.

    This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.

  10. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

    DOE PAGES

    Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.; ...

    2017-07-18

    This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.

  11. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

    PubMed Central

    Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Richard A.; Brown, Steven D.

    2017-01-01

    This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences. PMID:28769883
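
    Of the gap properties listed above, GC content is the simplest to compute. The sketch below is only an illustration and is not taken from the study: the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq.

    Assumption: ambiguous symbols such as N are excluded from the
    denominator; a sequence with no A/C/G/T bases returns 0.0.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        return 0.0
    return gc / acgt
```

    Applied to each gap sequence and to its flanking contigs, a function like this would allow the kind of comparison the abstract describes, although the authors' actual tooling is not stated.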

  12. Mapping the binding site of aflatoxin B1 in DNA: systematic analysis of the reactivity of aflatoxin B1 with guanines in different DNA sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Benasutti, M.; Ejadi, S.; Whitlow, M.D.

    The mutagenic and carcinogenic chemical aflatoxin B1 (AFB1) reacts almost exclusively at the N(7)-position of guanine following activation to its reactive form, the 8,9-epoxide (AFB1 oxide). In general N(7)-guanine adducts yield DNA strand breaks when heated in base, a property that serves as the basis for the Maxam-Gilbert DNA sequencing reaction specific for guanine. Using DNA sequencing methods, other workers have shown that AFB1 oxide gives strand breaks at positions of guanines; however, the guanine bands varied in intensity. This phenomenon has been used to infer that AFB1 oxide prefers to react with guanines in some sequence contexts more than in others and has been referred to as sequence specificity of binding. Herein, data on the reaction of AFB1 oxide with several synthetic DNA polymers with different sequences are presented, and (following hydrolysis) adduct levels are determined by high-pressure liquid chromatography. These results reveal that for AFB1 oxide (1) the N(7)-guanine adduct is the major adduct found in all of the DNA polymers, (2) adduct levels vary in different sequences, and, thus, sequence specificity is also observed by this more direct method, and (3) the intensity of bands in DNA sequencing gels is likely to reflect adduct levels formed at the N(7)-position of guanine. Knowing this, a reinvestigation of the reactivity of guanines in different DNA sequences using DNA sequencing methods was undertaken. Methods are developed to determine the X (5'-side) base and the Y (3'-side) base are most influential in determining guanine reactivity. These rules in conjunction with molecular modeling studies were used to assess the binding sites that might be utilized by AFB1 oxide in its reaction with DNA.

  13. Chromosome specific repetitive DNA sequences

    DOEpatents

    Moyzis, Robert K.; Meyne, Julianne

    1991-01-01

    A method is provided for determining specific nucleotide sequences useful in forming a probe which can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members. This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).

  14. Streptococcus bovimastitidis sp. nov., isolated from a dairy cow with mastitis.

    PubMed

    de Vries, Stefan P W; Hadjirin, Nazreen F; Lay, Elizabeth M; Zadoks, Ruth N; Peacock, Sharon J; Parkhill, Julian; Grant, Andrew J; McDougall, Scott; Holmes, Mark A

    2018-01-01

    Here we describe a new species of the genus Streptococcus that was isolated from a dairy cow with mastitis in New Zealand. Strain NZ1587T was Gram-positive, coccus-shaped and arranged as chains, catalase and coagulase negative, γ-haemolytic and negative for Lancefield carbohydrates (A-D, F and G). The 16S rRNA sequence did not match sequences in the NCBI 16S rRNA or GreenGenes databases. Taxonomic classification of strain NZ1587T was investigated using 16S rRNA and core genome phylogeny, genome-wide average nucleotide identity (ANI) and predicted DNA-DNA hybridisation (DDH) analyses. Phylogeny based on 16S rRNA was unable to resolve the taxonomic position of strain NZ1587T; however, NZ1587T shared 99.4 % identity at the 16S rRNA level with a distinct branch of S. pseudoporcinus. Importantly, core genome phylogeny demonstrated that NZ1587T grouped amongst the 'pyogenic' streptococcal species and formed a distinct branch supported by a 100 % bootstrap value. In addition, average nucleotide identity and inferred DNA-DNA hybridisation analyses showed that NZ1587T represents a novel species. Biochemical profiling using the rapid ID 32 strep identification test enabled differentiation of strain NZ1587T from closely related streptococcal species. In conclusion, strain NZ1587T can be classified as a novel species, and we propose a novel taxon named Streptococcus bovimastitidis sp. nov.; the type strain is NZ1587T. NZ1587T has been deposited in the Culture Collection University of Gothenburg (CCUG 69277T) and the Belgian Co-ordinated Collections of Micro-organisms/LMG (LMG 29747).

  15. Identification of novel and known oocyte-specific genes using complementary DNA subtraction and microarray analysis in three different species.

    PubMed

    Vallée, Maud; Gravel, Catherine; Palin, Marie-France; Reghenas, Hélène; Stothard, Paul; Wishart, David S; Sirard, Marc-André

    2005-07-01

    The main objective of the present study was to identify novel oocyte-specific genes in three different species: bovine, mouse, and Xenopus laevis. To achieve this goal, two powerful technologies were combined: a polymerase chain reaction (PCR)-based cDNA subtraction, and cDNA microarrays. Three subtractive libraries consisting of 3456 clones were established and enriched for oocyte-specific transcripts. Sequencing analysis of the positive insert-containing clones resulted in the following classification: 53% of the clones corresponded to known cDNAs, 26% were classified as uncharacterized cDNAs, and a final 9% were classified as novel sequences. All these clones were used for cDNA microarray preparation. Results from these microarray analyses revealed that in addition to already known oocyte-specific genes, such as GDF9, BMP15, and ZP, known genes with unknown function in the oocyte were identified, such as a MLF1-interacting protein (MLF1IP), B-cell translocation gene 4 (BTG4), and phosphotyrosine-binding protein (xPTB). Furthermore, 15 novel oocyte-specific genes were validated by reverse transcription-PCR to confirm their preferential expression in the oocyte compared to somatic tissues. The results obtained in the present study confirmed that microarray analysis is a robust technique to identify true positives from the suppressive subtractive hybridization experiment. Furthermore, obtaining oocyte-specific genes from three species simultaneously allowed us to look at important genes that are conserved across species. Further characterization of these novel oocyte-specific genes will lead to a better understanding of the molecular mechanisms related to the unique functions found in the oocyte.

  16. Cryptic Diversity within the Major Trypanosomiasis Vector Glossina fuscipes Revealed by Molecular Markers

    PubMed Central

    Choi, Kwang-Shik; Darby, Alistair C.; Causse, Sandrine; Kapitano, Berisha; Hall, Martin J. R.; Steen, Keith; Lutumba, Pascal; Madinga, Joules; Torr, Steve J.; Okedi, Loyce M.; Lehane, Michael J.; Donnelly, Martin J.

    2011-01-01

    Background: The tsetse fly Glossina fuscipes s.l. is responsible for the transmission of approximately 90% of cases of human African trypanosomiasis (HAT) or sleeping sickness. Three G. fuscipes subspecies have been described, primarily based upon subtle differences in the morphology of their genitalia. Here we describe a study conducted across the range of this important vector to determine whether molecular evidence generated from nuclear DNA (microsatellites and gene sequence information), mitochondrial DNA and symbiont DNA support the existence of these taxa as discrete taxonomic units. Principal Findings: The nuclear ribosomal Internal transcribed spacer 1 (ITS1) provided support for the three subspecies. However, nuclear and mitochondrial sequence data did not support the monophyly of the morphological subspecies G. f. fuscipes or G. f. quanzensis. Instead, the most strongly supported monophyletic group was comprised of flies sampled from Ethiopia. Maternally inherited loci (mtDNA and symbiont) also suggested monophyly of a group from Lake Victoria basin and Tanzania, but this group was not supported by nuclear loci, suggesting different histories of these markers. Microsatellite data confirmed strong structuring across the range of G. fuscipes s.l., and was useful for deriving the interrelationship of closely related populations. Conclusion/Significance: We propose that the morphological classification alone should not be used to classify populations of G. fuscipes for control purposes. The Ethiopian population, which is scheduled to be the target of a sterile insect release (SIT) programme, was notably discrete. From a programmatic perspective this may be either positive, given that it may reflect limited migration into the area, or negative, if the high levels of differentiation are also reflected in reproductive isolation between this population and the flies to be used in the release programme. PMID:21858237

  17. Affordable Hands-On DNA Sequencing and Genotyping: An Exercise for Teaching DNA Analysis to Undergraduates

    ERIC Educational Resources Information Center

    Shah, Kushani; Thomas, Shelby; Stein, Arnold

    2013-01-01

    In this report, we describe a 5-week laboratory exercise for undergraduate biology and biochemistry students in which students learn to sequence DNA and to genotype their DNA for selected single nucleotide polymorphisms (SNPs). Students use miniaturized DNA sequencing gels that require approximately 8 min to run. The students perform G, A, T, C…

  18. DNA Barcode Goes Two-Dimensions: DNA QR Code Web Server

    PubMed Central

    Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, “DNA barcode” actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications. PMID:22574113

  19. Predicting Flavonoid UGT Regioselectivity

    PubMed Central

    Jackson, Rhydon; Knisley, Debra; McIntosh, Cecilia; Pfeiffer, Phillip

    2011-01-01

    Machine learning was applied to a challenging and biologically significant protein classification problem: the prediction of flavonoid UGT acceptor regioselectivity from primary sequence. Novel indices characterizing graphical models of residues were proposed and found to be widely distributed among existing amino acid indices and to cluster residues appropriately. UGT subsequences biochemically linked to regioselectivity were modeled as sets of index sequences. Several learning techniques incorporating these UGT models were compared with classifications based on standard sequence alignment scores. These techniques included an application of time series distance functions to protein classification. Time series distances defined on the index sequences were used in nearest neighbor and support vector machine classifiers. Additionally, Bayesian neural network classifiers were applied to the index sequences. The experiments identified improvements over the nearest neighbor and support vector machine classifications relying on standard alignment similarity scores, as well as strong correlations between specific subsequences and regioselectivities. PMID:21747849
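
    The idea of a time series distance on index sequences feeding a nearest neighbor classifier can be sketched as follows. This is a minimal illustration only: the abstract does not say which distance function was used, so dynamic time warping is assumed here, and the index values, subsequences and class labels are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance with an absolute-difference step cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def to_index_sequence(residues: str, index: Dict[str, float]) -> List[float]:
    """Turn a residue string into a numeric series using one amino acid index."""
    return [index[r] for r in residues]


def nearest_neighbor_label(query: Sequence[float],
                           training: List[Tuple[Sequence[float], str]]) -> str:
    """Return the label of the training series closest to the query under DTW."""
    return min(training, key=lambda item: dtw_distance(query, item[0]))[1]


# Illustrative index values and labels (assumptions, not taken from the paper).
TOY_INDEX = {"A": 1.8, "G": -0.4, "L": 3.8, "K": -3.9, "D": -3.5}
TRAINING = [
    (to_index_sequence("LLAL", TOY_INDEX), "class-1"),
    (to_index_sequence("KDKD", TOY_INDEX), "class-2"),
]
```

    Calling nearest_neighbor_label(to_index_sequence("LAL", TOY_INDEX), TRAINING) assigns the query to the label of the closest training series; because the warping tolerates insertions, sequences of different lengths can be compared without a prior alignment.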

  20. High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA.

    PubMed

    Chandrananda, Dineika; Thorne, Natalie P; Bahlo, Melanie

    2015-06-17

    High-throughput sequencing of cell-free DNA fragments found in human plasma has been used to non-invasively detect fetal aneuploidy, monitor organ transplants and investigate tumor DNA. However, many biological properties of this extracellular genetic material remain unknown. Research that further characterizes circulating DNA could substantially increase its diagnostic value by allowing the application of more sophisticated bioinformatics tools that lead to an improved signal-to-noise ratio in the sequencing data. In this study, we investigate various features of cell-free DNA in plasma using deep-sequencing data from two pregnant women (>70X, >50X) and compare them with matched cellular DNA. We utilize a descriptive approach to examine how the biological cleavage of cell-free DNA affects different sequence signatures such as fragment lengths, sequence motifs at fragment ends and the distribution of cleavage sites along the genome. We show that the size distributions of these cell-free DNA molecules are dependent on their autosomal and mitochondrial origin as well as the genomic location within chromosomes. DNA mapping to particular microsatellites and alpha repeat elements displays unique size signatures. We show how cell-free fragments occur in clusters along the genome, localizing to nucleosomal arrays and are preferentially cleaved at linker regions by correlating the mapping locations of these fragments with ENCODE annotation of chromatin organization. Our work further demonstrates that cell-free autosomal DNA cleavage is sequence dependent. The region spanning up to 10 positions on either side of the DNA cleavage site shows a consistent pattern of preference for specific nucleotides. This sequence motif is present in cleavage sites localized to nucleosomal cores and linker regions but is absent in nucleosome-free mitochondrial DNA. These background signals in cell-free DNA sequencing data stem from the non-random biological cleavage of these fragments.
    This sequence structure can be harnessed to improve bioinformatics algorithms, in particular for CNV and structural variant detection. Descriptive measures for cell-free DNA features developed here could also be used in biomarker analysis to monitor the changes that occur during different pathological conditions.
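
    A position-specific nucleotide preference around cleavage sites, as described above, can be tabulated with a few lines of code. The sketch below is an assumption-laden illustration, not the authors' pipeline: the function name, the convention that a site is the index of the first base downstream of the cut, and the toy reference string are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def cleavage_site_profile(reference: str, sites: List[int],
                          flank: int = 10) -> Dict[int, Dict[str, float]]:
    """Nucleotide frequencies at each offset within `flank` bases of a cut.

    Offsets -flank..-1 lie upstream of the cut; offsets 0..flank-1 lie
    downstream. Sites too close to either end of the reference are skipped.
    """
    counts = {offset: Counter() for offset in range(-flank, flank)}
    for site in sites:
        if site - flank < 0 or site + flank > len(reference):
            continue  # window would run off the reference
        for offset in range(-flank, flank):
            counts[offset][reference[site + offset]] += 1
    profile: Dict[int, Dict[str, float]] = {}
    for offset, counter in counts.items():
        total = sum(counter.values())
        profile[offset] = (
            {base: counter[base] / total for base in "ACGT"} if total else {}
        )
    return profile
```

    With a real genome the reference would be one chromosome sequence and the sites would come from the mapped fragment ends; a flat profile (about 0.25 per base at every offset, adjusted for base composition) would indicate no sequence preference.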

  1. Analysis of DNA Sequences by An Optical Time-Integrating Correlator: Proof-Of-Concept Experiments.

    DTIC Science & Technology

    1992-05-01

    TABLES xv LIST OF ABBREVIATIONS xvii 1.0 INTRODUCTION 1 2.0 DNA ANALYSIS STRATEGY 4 2.1 Representation of DNA Bases 4 2.2 DNA Analysis Strategy 6 3.0...Zehnder architecture. 3 Figure 3: Short representations of the DNA bases where each base is represented by a 7-bits long pseudorandom sequence. 5... DNA bases where each base is represented by 7-bits long pseudorandom sequences. 4 Table 2: Long representations of the DNA bases with 255-bits maximum
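
    The excerpt mentions representing each DNA base by a 7-bit pseudorandom sequence for correlation. A software analogue of that idea is sketched below, under stated assumptions: the four codes are hypothetical (cyclic shifts of one length-7 maximal-length sequence, not the codes from the report), and the correlation is evaluated only at base-aligned shifts rather than continuously as an optical correlator would.

```python
from typing import Dict, List

# Hypothetical 7-chip codes; the codes used in the report are not given
# in the excerpt above.
BASE_CODES: Dict[str, str] = {
    "A": "1110100",
    "C": "0111010",
    "G": "0011101",
    "T": "1001110",
}
CHIPS_PER_BASE = 7


def encode(seq: str) -> List[int]:
    """Map each base to its 7-chip code, written as +1/-1 chips."""
    return [1 if chip == "1" else -1 for base in seq for chip in BASE_CODES[base]]


def correlate(query: str, target: str) -> List[int]:
    """Correlation score of the query at every base-aligned shift along the target."""
    q, t = encode(query), encode(target)
    scores = []
    for shift in range(0, len(t) - len(q) + 1, CHIPS_PER_BASE):
        window = t[shift:shift + len(q)]
        scores.append(sum(a * b for a, b in zip(q, window)))
    return scores
```

    With these codes a matching base contributes 7 to the score and a mismatching base contributes -1, so an exact occurrence of the query shows up as a peak of 7 times the query length.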

  2. SNP discovery through de novo deep sequencing using the next generation of DNA sequencers

    USDA-ARS's Scientific Manuscript database

    The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....

  3. A simple procedure for parallel sequence analysis of both strands of 5'-labeled DNA.

    PubMed

    Razvi, F; Gargiulo, G; Worcel, A

    1983-08-01

    Ligation of a 5'-labeled DNA restriction fragment results in a circular DNA molecule carrying the two 32Ps at the reformed restriction site. Double digestions of the circular DNA with the original enzyme and a second restriction enzyme cleavage near the labeled site allows direct chemical sequencing of one 5'-labeled DNA strand. Similar double digestions, using an isoschizomer that cleaves differently at the 32P-labeled site, allows direct sequencing of the now 3'-labeled complementary DNA strand. It is possible to directly sequence both strands of cloned DNA inserts by using the above protocol and a multiple cloning site vector that provides the necessary restriction sites. The simultaneous and parallel visualization of both DNA strands eliminates sequence ambiguities. In addition, the labeled circular molecules are particularly useful for single-hit DNA cleavage studies and DNA footprint analysis. As an example, we show here an analysis of the micrococcal nuclease-induced breaks on the two strands of the somatic 5S RNA gene of Xenopus borealis, which suggests that the enzyme may recognize and cleave small AT-containing palindromes along the DNA helix.

  4. A Glimpse into the Satellite DNA Library in Characidae Fish (Teleostei, Characiformes)

    PubMed Central

    Utsunomia, Ricardo; Ruiz-Ruano, Francisco J.; Silva, Duílio M. Z. A.; Serrano, Érica A.; Rosa, Ivana F.; Scudeler, Patrícia E. S.; Hashimoto, Diogo T.; Oliveira, Claudio; Camacho, Juan Pedro M.; Foresti, Fausto

    2017-01-01

    Satellite DNA (satDNA) is an abundant fraction of repetitive DNA in eukaryotic genomes and plays an important role in genome organization and evolution. In general, satDNA sequences follow a concerted evolutionary pattern through the intragenomic homogenization of different repeat units. In addition, the satDNA library hypothesis predicts that related species share a series of satDNA variants descended from a common ancestor species, with differential amplification of different satDNA variants. The finding of the same satDNA family in species belonging to different genera within Characidae fish provided the opportunity to test both the concerted evolution and library hypotheses. For this purpose, we analyzed sequence variation and abundance of this satDNA family in ten species, by a combination of next generation sequencing (NGS), PCR and Sanger sequencing, and fluorescence in situ hybridization (FISH). We found extensive between-species variation for the number and size of pericentromeric FISH signals. At the genomic level, the analysis of thousands of DNA sequences obtained by Illumina sequencing and PCR amplification allowed defining 150 haplotypes which were linked in a common minimum spanning tree, where different patterns of concerted evolution were apparent. This also provided a glimpse into the satDNA library of this group of species. Consistent with the library hypothesis, different variants of this satDNA showed large differences in abundance between species, from highly abundant to simply relictual variants. PMID:28855916

  5. Short, interspersed, and repetitive DNA sequences in Spiroplasma species.

    PubMed

    Nur, I; LeBlanc, D J; Tully, J G

    1987-03-01

    Small fragments of DNA from an 8-kbp plasmid, pRA1, from a plant pathogenic strain of Spiroplasma citri were shown previously to be present in the chromosomal DNA of at least two species of Spiroplasma. We describe here the shot-gun cloning of chromosomal DNA from S. citri Maroc and the identification of two distinct sequences exhibiting homology to pRA1. Further subcloning experiments provided specific molecular probes for the identification of these two sequences in chromosomal DNA from three distinct plant pathogenic species of Spiroplasma. The results of Southern blot hybridization indicated that each of the pRA1-associated sequences is present as multiple copies in short, dispersed, and repetitive sequences in the chromosomes of these three strains. None of the sequences was detectable in chromosomal DNA from an additional nine Spiroplasma strains examined.

  6. Laser Desorption Mass Spectrometry for DNA Sequencing and Analysis

    NASA Astrophysics Data System (ADS)

    Chen, C. H. Winston; Taranenko, N. I.; Golovlev, V. V.; Isola, N. R.; Allman, S. L.

    1998-03-01

    Rapid DNA sequencing and/or analysis is critically important for biomedical research. In the past, gel electrophoresis has been the primary tool to achieve DNA analysis and sequencing. However, gel electrophoresis is a time-consuming and labor-intensive process. Recently, we have developed and used laser desorption mass spectrometry (LDMS) to achieve sequencing of ss-DNA longer than 100 nucleotides. With LDMS, we succeeded in sequencing DNA in seconds instead of the hours or days required by gel electrophoresis. In addition to sequencing, we also applied LDMS to the detection of DNA probes for hybridization. LDMS was also used to detect short tandem repeats for forensic applications. Clinical applications for disease diagnosis such as cystic fibrosis caused by base deletion and point mutation have also been demonstrated. Experimental details will be presented in the meeting.

  7. Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.

    PubMed

    Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming

    2016-07-01

    Research in the recent decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and investigate possible causes for inconsistency in the protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second level MSC clustering was found to be highly similar to existing enzyme classifications. The clustering of these enzymes based on sequence, structure, and function information is consistent with each other. For proteases, the Jaccard's similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relation between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
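
    The Jaccard similarity coefficients quoted above compare two classifications of the same set of proteins. The abstract does not state how the coefficient was computed; one common definition, assumed here, counts pairs of items: the number of pairs grouped together in both classifications divided by the number grouped together in at least one. The function name and the toy labels below are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Sequence


def pair_jaccard(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Pair-counting Jaccard coefficient between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both classifications must label the same items")
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1    # pair co-clustered in both classifications
        if same_a or same_b:
            either += 1  # pair co-clustered in at least one
    # If no pair is ever co-clustered, the two clusterings agree trivially.
    return both / either if either else 1.0
```

    For example, pair_jaccard([0, 0, 1, 1], [0, 0, 0, 1]) compares a 2+2 split with a 3+1 split of four items. The loop is quadratic in the number of items, which is acceptable for a set of about a thousand enzymes.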

  8. Constructing DNA Barcode Sets Based on Particle Swarm Optimization.

    PubMed

    Wang, Bin; Zheng, Xuedong; Zhou, Shihua; Zhou, Changjun; Wei, Xiaopeng; Zhang, Qiang; Wei, Ziqi

    2018-01-01

    Following the completion of the human genome project, a large amount of high-throughput bio-data was generated. To analyze these data, massively parallel sequencing, namely next-generation sequencing, was rapidly developed. DNA barcodes are used to identify the ownership between sequences and samples when they are attached at the beginning or end of sequencing reads. Constructing DNA barcode sets provides the candidate DNA barcodes for this application. To increase the accuracy of DNA barcode sets, a particle swarm optimization (PSO) algorithm has been modified and used to construct the DNA barcode sets in this paper. Compared with the extant results, some lower bounds of DNA barcode sets are improved. The results show that the proposed algorithm is effective in constructing DNA barcode sets.
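
    What a DNA barcode set has to satisfy can be illustrated with a small sketch. The abstract does not list the constraints used, so a minimum pairwise Hamming distance is assumed here, and a simple greedy search stands in for the modified PSO algorithm, which is not reproduced. All function names are hypothetical.

```python
from itertools import combinations, product
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: List[str]) -> int:
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def greedy_barcode_set(length: int, min_distance: int) -> List[str]:
    """Scan all words over ACGT in lexicographic order and keep each one that
    is at least `min_distance` away from every barcode kept so far."""
    chosen: List[str] = []
    for candidate in ("".join(p) for p in product("ACGT", repeat=length)):
        if all(hamming(candidate, kept) >= min_distance for kept in chosen):
            chosen.append(candidate)
    return chosen
```

    The size of the set returned by such a search is a lower bound on the largest possible barcode set for the given length and distance, which is the kind of bound the abstract reports improving. The exhaustive scan grows as 4 to the power of the barcode length, so it is only practical for short barcodes.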

  9. Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

    PubMed

    Thompson, Jason D; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre

    2012-01-01

    Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.

  10. Pulling out the 1%: Whole-Genome Capture for the Targeted Enrichment of Ancient DNA Sequencing Libraries

    PubMed Central

    Carpenter, Meredith L.; Buenrostro, Jason D.; Valdiosera, Cristina; Schroeder, Hannes; Allentoft, Morten E.; Sikora, Martin; Rasmussen, Morten; Gravel, Simon; Guillén, Sonia; Nekhrizov, Georgi; Leshtakov, Krasimir; Dimitrova, Diana; Theodossiev, Nikola; Pettener, Davide; Luiselli, Donata; Sandoval, Karla; Moreno-Estrada, Andrés; Li, Yingrui; Wang, Jun; Gilbert, M. Thomas P.; Willerslev, Eske; Greenleaf, William J.; Bustamante, Carlos D.

    2013-01-01

    Most ancient specimens contain very low levels of endogenous DNA, precluding the shotgun sequencing of many interesting samples because of cost. Ancient DNA (aDNA) libraries often contain <1% endogenous DNA, with the majority of sequencing capacity taken up by environmental DNA. Here we present a capture-based method for enriching the endogenous component of aDNA sequencing libraries. By using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to capture DNA fragments from across the human genome. We demonstrate this method on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased substantially, with up to 59% of reads mapped to human and enrichment ranging from 6- to 159-fold. Furthermore, we maintained coverage of the majority of regions sequenced in the precapture library. Intersection with the 1000 Genomes Project reference panel yielded an average of 50,723 SNPs (range 3,062–147,243) for the postcapture libraries sequenced with 1 million reads, compared with 13,280 SNPs (range 217–73,266) for the precapture libraries, increasing resolution in population genetic analyses. Our whole-genome capture approach makes it less costly to sequence aDNA from specimens containing very low levels of endogenous DNA, enabling the analysis of larger numbers of samples. PMID:24568772
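
    The fold-enrichment figures above compare the fraction of reads mapping to the human genome before and after capture. A minimal sketch of that arithmetic follows, assuming enrichment is the ratio of the two mapped fractions for a single library; the function names and the read counts in the usage note are hypothetical and not taken from the paper.

```python
def mapped_fraction(mapped_reads: int, total_reads: int) -> float:
    """Fraction of sequenced reads that map to the target genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def fold_enrichment(pre_mapped: int, pre_total: int,
                    post_mapped: int, post_total: int) -> float:
    """Ratio of the post-capture mapped fraction to the pre-capture mapped fraction."""
    pre = mapped_fraction(pre_mapped, pre_total)
    post = mapped_fraction(post_mapped, post_total)
    if pre == 0:
        raise ValueError("pre-capture fraction is zero; enrichment is undefined")
    return post / pre
```

    For a hypothetical library with 5,000 of 1,000,000 reads mapping before capture and 300,000 of 1,000,000 after, fold_enrichment(5_000, 1_000_000, 300_000, 1_000_000) is about 60. Enrichment is computed per library, so the average pre-capture fraction and the maximum post-capture fraction quoted in the abstract should not be divided by one another.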

  11. Biological nanopore MspA for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Manrao, Elizabeth A.

    Unlocking the information hidden in the human genome provides insight into the inner workings of complex biological systems and can be used to greatly improve health-care. In order to allow for widespread sequencing, new technologies are required that provide fast and inexpensive readings of DNA. Nanopore sequencing is a third generation DNA sequencing technology that is currently being developed to fulfill this need. In nanopore sequencing, a voltage is applied across a small pore in an electrolyte solution and the resulting ionic current is recorded. When DNA passes through the channel, the ionic current is partially blocked. If the DNA bases uniquely modulate the ionic current flowing through the channel, the time trace of the current can be related to the sequence of DNA passing through the pore. There are two main challenges to realizing nanopore sequencing: identifying a pore with sensitivity to single nucleotides and controlling the translocation of DNA through the pore so that the small single nucleotide current signatures are distinguishable from background noise. In this dissertation, I explore the use of Mycobacterium smegmatis porin A (MspA) for nanopore sequencing. In order to determine MspA's sensitivity to single nucleotides, DNA strands of various compositions are held in the pore as the resulting ionic current is measured. DNA is immobilized in MspA by attaching it to a large molecule which acts as an anchor. This technique confirms the single nucleotide resolution of the pore and additionally shows that MspA is sensitive to epigenetic modifications and single nucleotide polymorphisms. The forces from the electric field within MspA, the effective charge of nucleotides, and elasticity of DNA are estimated using a Freely Jointed Chain model of single stranded DNA. These results offer insight into the interactions of DNA within the pore. 
    With the nucleotide sensitivity of MspA confirmed, a method is introduced to controllably pass DNA through the pore. Using a DNA polymerase, DNA strands are stepped through MspA one nucleotide at a time. The steps are observable as distinct levels on the ionic-current time-trace and are related to the DNA sequence. These experiments overcome the two fundamental challenges to realizing MspA nanopore sequencing and pave the way to the development of a commercial technology.

  12. Effects of sequence on DNA wrapping around histones

    NASA Astrophysics Data System (ADS)

    Ortiz, Vanessa

    2011-03-01

    A central question in biophysics is whether the sequence of a DNA strand affects its mechanical properties. In epigenetics, these are thought to influence nucleosome positioning and gene expression. Theoretical and experimental attempts to answer this question have been hindered by an inability to directly resolve DNA structure and dynamics at the base-pair level. In our previous studies we used a detailed model of DNA to measure the effects of sequence on the stability of naked DNA under bending. Sequence was shown to influence DNA's ability to form kinks, which arise when certain motifs slide past others to form non-native contacts. Here, we have now included histone-DNA interactions to see if the results obtained for naked DNA are transferable to the problem of nucleosome positioning. Different DNA sequences interacting with the histone protein complex are studied, and their equilibrium and mechanical properties are compared among themselves and with the naked case. NLM training grant to the Computation and Informatics in Biology and Medicine Training Program (NLM T15LM007359).

  13. A high-throughput and quantitative method to assess the mutagenic potential of translesion DNA synthesis

    PubMed Central

    Taggart, David J.; Camerlengo, Terry L.; Harrison, Jason K.; Sherrer, Shanen M.; Kshetry, Ajay K.; Taylor, John-Stephen; Huang, Kun; Suo, Zucai

    2013-01-01

    Cellular genomes are constantly damaged by endogenous and exogenous agents that covalently and structurally modify DNA to produce DNA lesions. Although most lesions are mended by various DNA repair pathways in vivo, a significant number of damage sites persist during genomic replication. Our understanding of the mutagenic outcomes derived from these unrepaired DNA lesions has been hindered by the low throughput of existing sequencing methods. Therefore, we have developed a cost-effective high-throughput short oligonucleotide sequencing assay that uses next-generation DNA sequencing technology for the assessment of the mutagenic profiles of translesion DNA synthesis catalyzed by any error-prone DNA polymerase. The vast amount of sequencing data produced were aligned and quantified by using our novel software. As an example, the high-throughput short oligonucleotide sequencing assay was used to analyze the types and frequencies of mutations upstream, downstream and at a site-specifically placed cis–syn thymidine–thymidine dimer generated individually by three lesion-bypass human Y-family DNA polymerases. PMID:23470999

  14. Large-scale analysis of full-length cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference system for the Solanaceae genomics.

    PubMed

    Aoki, Koh; Yano, Kentaro; Suzuki, Ayako; Kawamura, Shingo; Sakurai, Nozomu; Suda, Kunihiro; Kurabayashi, Atsushi; Suzuki, Tatsuya; Tsugane, Taneaki; Watanabe, Manabu; Ooga, Kazuhide; Torii, Maiko; Narita, Takanori; Shin-I, Tadasu; Kohara, Yuji; Yamamoto, Naoki; Takahashi, Hideki; Watanabe, Yuichiro; Egusa, Mayumi; Kodama, Motoichiro; Ichinose, Yuki; Kikuchi, Mari; Fukushima, Sumire; Okabe, Akiko; Arie, Tsutomu; Sato, Yuko; Yazawa, Katsumi; Satoh, Shinobu; Omura, Toshikazu; Ezura, Hiroshi; Shibata, Daisuke

    2010-03-30

    The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom-background, such as ESTs and mutagenized lines, have been established by an international alliance. To accelerate the progress in tomato genomics, we developed a collection of fully-sequenced 13,227 Micro-Tom full-length cDNAs. By checking redundant sequences, coding sequences, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants but rice. Classification of functions of proteins predicted from the coding sequences demonstrated that nrFLcDNAs covered a broad range of functions. A comparison of nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. According to a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatch in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%. The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies. 
The nrFLcDNA sequences will help annotation of the tomato whole-genome sequence and aid in tomato functional genomics and molecular breeding. Full-length cDNA sequences and their annotations are provided in the database KaFTom http://www.pgb.kazusa.or.jp/kaftom/ via the website of the National Bioresource Project Tomato http://tomato.nbrp.jp.

  15. Changing techniques in crop plant classification: molecularization at the National Institute of Agricultural Botany during the 1980s.

    PubMed

    Holmes, Matthew

    2017-04-01

    Modern methods of analysing biological materials, including protein and DNA sequencing, are increasingly the objects of historical study. Yet twentieth-century taxonomic techniques have been overlooked in one of their most important contexts: agricultural botany. This paper addresses this omission by harnessing unexamined archival material from the National Institute of Agricultural Botany (NIAB), a British plant science organization. During the 1980s the NIAB carried out three overlapping research programmes in crop identification and analysis: electrophoresis, near infrared spectroscopy (NIRS) and machine vision systems. For each of these three programmes, contemporary economic, statutory and scientific factors behind their uptake by the NIAB are discussed. This approach reveals significant links between taxonomic practice at the NIAB and historical questions around agricultural research, intellectual property and scientific values. Such links are of further importance given that the techniques developed by researchers at the NIAB during the 1980s remain part of crop classification guidelines issued by international bodies today.

  16. Natural Variation of Epstein-Barr Virus Genes, Proteins, and Primary MicroRNA.

    PubMed

    Correia, Samantha; Palser, Anne; Elgueta Karstegl, Claudio; Middeldorp, Jaap M; Ramayanti, Octavia; Cohen, Jeffrey I; Hildesheim, Allan; Fellner, Maria Dolores; Wiels, Joelle; White, Robert E; Kellam, Paul; Farrell, Paul J

    2017-08-01

    Viral gene sequences from an enlarged set of about 200 Epstein-Barr virus (EBV) strains, including many primary isolates, have been used to investigate variation in key viral genetic regions, particularly LMP1, Zp, gp350, EBNA1, and the BART microRNA (miRNA) cluster 2. Determination of type 1 and type 2 EBV in saliva samples from people from a wide range of geographic and ethnic backgrounds demonstrates a small percentage of healthy white Caucasian British people carrying predominantly type 2 EBV. Linkage of Zp and gp350 variants to type 2 EBV is likely to be due to their genes being adjacent to the EBNA3 locus, which is one of the major determinants of the type 1/type 2 distinction. A novel classification of EBNA1 DNA binding domains, named QCIGP, results from phylogeny analysis of their protein sequences but is not linked to the type 1/type 2 classification. The BART cluster 2 miRNA region is classified into three major variants through single-nucleotide polymorphisms (SNPs) in the primary miRNA outside the mature miRNA sequences. These SNPs can result in altered levels of expression of some miRNAs from the BART variant frequently present in Chinese and Indonesian nasopharyngeal carcinoma (NPC) samples. The EBV genetic variants identified here provide a basis for future, more directed analysis of association of specific EBV variations with EBV biology and EBV-associated diseases. IMPORTANCE Incidence of diseases associated with EBV varies greatly in different parts of the world. Thus, relationships between EBV genome sequence variation and health, disease, geography, and ethnicity of the host may be important for understanding the role of EBV in diseases and for development of an effective EBV vaccine. This paper provides the most comprehensive analysis so far of variation in specific EBV genes relevant to these diseases and proposed EBV vaccines. 
By focusing on variation in LMP1, Zp, gp350, EBNA1, and the BART miRNA cluster 2, new relationships with the known type 1/type 2 strains are demonstrated, and a novel classification of EBNA1 and the BART miRNAs is proposed. Copyright © 2017 Correia et al.

  17. Molecular Characterization and Comparative Sequence Analysis of Defense-Related Gene, Oryza rufipogon Receptor-Like Protein Kinase 1

    PubMed Central

    Law, Yee-Song; Gudimella, Ranganath; Song, Beng-Kah; Ratnam, Wickneswari; Harikrishna, Jennifer Ann

    2012-01-01

    Many of the plant leucine rich repeat receptor-like kinases (LRR-RLKs) have been found to regulate signaling during plant defense processes. In this study, we selected and sequenced an LRR-RLK gene, designated as Oryza rufipogon receptor-like protein kinase 1 (OrufRPK1), located within yield QTL yld1.1 from the wild rice Oryza rufipogon (accession IRGC105491). A 2055 bp coding region and two exons were identified. Southern blotting determined OrufRPK1 to be a single copy gene. Sequence comparison with cultivated rice orthologs (OsI219RPK1, OsI9311RPK1 and OsJNipponRPK1, respectively derived from O. sativa ssp. indica cv. MR219, O. sativa ssp. indica cv. 9311 and O. sativa ssp. japonica cv. Nipponbare) revealed the presence of 12 single nucleotide polymorphisms (SNPs) with five non-synonymous substitutions, and 23 insertion/deletion sites. The biological role of OrufRPK1 as a defense-related LRR-RLK is proposed on the basis of cDNA sequence characterization, domain subfamily classification, structural prediction of extracellular domains, cluster analysis and comparative gene expression. PMID:22942769

  18. ReadXplorer—visualization and analysis of mapped sequences

    PubMed Central

    Hilker, Rolf; Stadermann, Kai Bernd; Doppmeier, Daniel; Kalinowski, Jörn; Stoye, Jens; Straube, Jasmin; Winnebald, Jörn; Goesmann, Alexander

    2014-01-01

    Motivation: Fast algorithms and well-arranged visualizations are required for the comprehensive analysis of the ever-growing size of genomic and transcriptomic next-generation sequencing data. Results: ReadXplorer is a software offering straightforward visualization and extensive analysis functions for genomic and transcriptomic DNA sequences mapped on a reference. A unique specialty of ReadXplorer is the quality classification of the read mappings. It is incorporated in all analysis functions and displayed in ReadXplorer's various synchronized data viewers for (i) the reference sequence, its base coverage as (ii) normalizable plot and (iii) histogram, (iv) read alignments and (v) read pairs. ReadXplorer's analysis capability covers RNA secondary structure prediction, single nucleotide polymorphism and deletion–insertion polymorphism detection, genomic feature and general coverage analysis. Especially for RNA-Seq data, it offers differential gene expression analysis, transcription start site and operon detection as well as RPKM value and read count calculations. Furthermore, ReadXplorer can combine or superimpose coverage of different datasets. Availability and implementation: ReadXplorer is available as open-source software at http://www.readxplorer.org along with a detailed manual. Contact: rhilker@mikrobio.med.uni-giessen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24790157
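
    The RPKM values mentioned in this abstract can be illustrated with the commonly used definition (reads per kilobase of feature per million mapped reads). The sketch below is not ReadXplorer code; the function name and the example numbers are hypothetical and chosen only for illustration.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads.

    Uses the common definition: reads * 1e9 / (length_bp * total_mapped_reads).
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical numbers: 500 reads on a 2,000 bp gene, 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

    Normalizing by both feature length and library size is what makes values comparable across genes and across datasets of different sequencing depth.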

  19. An extended sequence specificity for UV-induced DNA damage.

    PubMed

    Chung, Long H; Murray, Vincent

    2018-01-01

    The sequence specificity of UV-induced DNA damage was determined with a higher precision and accuracy than previously reported. UV light induces two major damage adducts: cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). Employing capillary electrophoresis with laser-induced fluorescence and taking advantage of the distinct properties of the CPDs and 6-4PPs, we studied the sequence specificity of UV-induced DNA damage in a purified DNA sequence using two approaches: end-labelling and a polymerase stop/linear amplification assay. A mitochondrial DNA sequence that contained a random nucleotide composition was employed as the target DNA sequence. With previous methodology, the UV sequence specificity was determined at a dinucleotide or trinucleotide level; however, in this paper, we have extended the UV sequence specificity to a hexanucleotide level. With the end-labelling technique (for 6-4PPs), the consensus sequence was found to be 5'-GCTC*AC (where C* is the breakage site); while with the linear amplification procedure, it was 5'-TCTT*AC. With end-labelling, the dinucleotide frequency of occurrence was highest for 5'-TC*, 5'-TT* and 5'-CC*; whereas it was 5'-TT* for linear amplification. The influence of neighbouring nucleotides on the degree of UV-induced DNA damage was also examined. The core sequences consisted of pyrimidine nucleotides 5'-CTC* and 5'-CTT* while an A at position "1" and C at position "2" enhanced UV-induced DNA damage. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.

  20. Multiplex analysis of DNA

    DOEpatents

    Church, George M.; Kieffer-Higgins, Stephen

    1992-01-01

    This invention features vectors and a method for sequencing DNA. The method includes the steps of: a) ligating the DNA into a vector comprising a tag sequence, the tag sequence includes at least 15 bases, wherein the tag sequence will not hybridize to the DNA under stringent hybridization conditions and is unique in the vector, to form a hybrid vector, b) treating the hybrid vector in a plurality of vessels to produce fragments comprising the tag sequence, wherein the fragments differ in length and terminate at a fixed known base or bases, wherein the fixed known base or bases differs in each vessel, c) separating the fragments from each vessel according to their size, d) hybridizing the fragments with an oligonucleotide able to hybridize specifically with the tag sequence, and e) detecting the pattern of hybridization of the tag sequence, wherein the pattern reflects the nucleotide sequence of the DNA.

  1. BiQ Analyzer HT: locus-specific analysis of DNA methylation by high-throughput bisulfite sequencing

    PubMed Central

    Lutsik, Pavlo; Feuerbach, Lars; Arand, Julia; Lengauer, Thomas; Walter, Jörn; Bock, Christoph

    2011-01-01

    Bisulfite sequencing is a widely used method for measuring DNA methylation in eukaryotic genomes. The assay provides single-base pair resolution and, given sufficient sequencing depth, its quantitative accuracy is excellent. High-throughput sequencing of bisulfite-converted DNA can be applied either genome wide or targeted to a defined set of genomic loci (e.g. using locus-specific PCR primers or DNA capture probes). Here, we describe BiQ Analyzer HT (http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de/), a user-friendly software tool that supports locus-specific analysis and visualization of high-throughput bisulfite sequencing data. The software facilitates the shift from time-consuming clonal bisulfite sequencing to the more quantitative and cost-efficient use of high-throughput sequencing for studying locus-specific DNA methylation patterns. In addition, it is useful for locus-specific visualization of genome-wide bisulfite sequencing data. PMID:21565797

  2. A DNA sequence analysis package for the IBM personal computer.

    PubMed Central

    Lagrimini, L M; Brentano, S T; Donelson, J E

    1984-01-01

    We present here a collection of DNA sequence analysis programs, called "PC Sequence" (PCS), which are designed to run on the IBM Personal Computer (PC). These programs are written in IBM PC compiled BASIC and take full advantage of the IBM PC's speed, error handling, and graphics capabilities. For a modest initial expense in hardware any laboratory can use these programs to quickly perform computer analysis on DNA sequences. They are written with the novice user in mind and require very little training or previous experience with computers. Also provided are a text editing program for creating and modifying DNA sequence files and a communications program which enables the PC to communicate with and collect information from mainframe computers and DNA sequence databases. PMID:6546433

  3. Systematics of gastrointestinal nematodes of domestic ruminants: advances between 1992 and 1995 and proposals for future research.

    PubMed

    Lichtenfels, J R; Hoberg, E P; Zarlenga, D S

    1997-11-01

    The systematics of trichostrongyloid nematodes of ruminants provides a foundation for diagnostics and responds to the need to identify eggs in feces, free-living larvae from pastures or fecal cultures and larval or adult nematodes collected from hosts. These needs are associated with diagnostic problems or research projects. Difficulties in identifying all developmental stages of trichostrongyloid nematodes of domestic ruminants still severely limit the effective diagnosis and control of these parasites. Phylogenetic hypotheses as the basis for predictive classifications have been developed only for the subfamilies of the Trichostrongylidae. This report briefly describes recent progress in the development of improved tools for identification, phylogenetic analyses and predictive classifications. It also describes future research needed on the identification and classification of trichostrongyloid nematode parasites of domestic ruminants. Nematodes included are species of the super-family Trichostrongyloidea known to be important pathogens of domestic ruminants. The information summarized is presented by nematode developmental stage and by taxonomic groups. Eggs: While eggs of some trichostrongyloid nematode parasites of ruminants can be readily identified to their genus (Nematodirus), and some to species (e.g. Nematodirus battus), most of the important pathogens (including the Ostertagiinae and Haemonchinae) cannot be identified morphologically or morphometrically even to family level. However, DNA technology has been developed for determining not only the presence of specific pathogens in eggs from fecal samples, but also for estimating the percentage of the total eggs that each pathogen comprises. This new method will make possible a rapid determination of which individual animals in a herd should be treated. 
Larvae: The most commonly-used method for identifying infective larvae is time-consuming (several weeks), unreliable for estimating intensities of individual species as components of mixed populations and requires highly trained specialists. Available identification keys for larvae are not well illustrated and need to be augmented. Adults: Recent advances in the identification of adult trichostrongyloids and their systematics are organized by taxonomic group. Genera included are Ostertagia, Haemonchus, Cooperia, Trichostrongylus and Nematodirus. Recently, the first phylogenetic analysis of the Trichostrongylidae family established monophyly for the family. A similar analysis of the Molineidae is needed. Ostertagia: Several studies of polymorphism summarized the phenomenon and listed 19 polymorphic species in five genera. Two studies of DNA differences within and among polymorphic species of Ostertagiinae supported earlier hypotheses that the species pairs represent polymorphic species. A phylogenetic analysis of the Ostertagiinae and generic concepts are needed. Haemonchus: A key to three species of Haemonchus provides, for the first time, morphological characteristics for the microscopical identification to species of individual adult nematodes of either sex. The Food and Drug Administration is now requiring that results of drug trials include identification of Haemonchus to species. Cooperia: Studies using random amplified polymorphic DNA methods showed a high degree of variation within and among C. oncophora/C. surnabada, but supported a polymorphic relationship for the species pair. A phylogenetic analysis of the Cooperiinae is needed. Trichostrongylus: Restriction Fragment Length Polymorphisms (RFLPs) of genomic DNA of two strains of T. colubriformis indicated a high degree of intra- and inter-strain DNA polymorphism. However, other studies demonstrated expected species level differences between T. colubriformis and T. 
vitrinus using Random Amplified Polymorphic DNA (RAPD) methods. Sequences of the second Internal Transcribed Spacer Region (ITS-2) ribosomal repeat showed sequence differences of 1.3-7.6% among five

  4. Genomic sequencing of Pleistocene cave bears

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of ~1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  5. Characterization of North American Armillaria species: Genetic relationships determined by ribosomal DNA sequences and AFLP markers

    Treesearch

    M. -S. Kim; N. B. Klopfenstein; J. W. Hanna; G. I. McDonald

    2006-01-01

    Phylogenetic and genetic relationships among 10 North American Armillaria species were analysed using sequence data from ribosomal DNA (rDNA), including intergenic spacer (IGS-1), internal transcribed spacers with associated 5.8S (ITS + 5.8S), and nuclear large subunit rDNA (nLSU), and amplified fragment length polymorphism (AFLP) markers. Based on rDNA sequence data,...

  6. Fractal landscape analysis of DNA walks

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Sciortino, F.; Simons, M.; Stanley, H. E.

    1992-01-01

    By mapping nucleotide sequences onto a "DNA walk", we uncovered remarkably long-range power law correlations [Nature 356 (1992) 168] that imply a new scale invariant property of DNA. We found such long-range correlations in intron-containing genes and in non-transcribed regulatory DNA sequences, but not in cDNA sequences or intron-less genes. In this paper, we present more explicit evidence to support our findings.
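
    A "DNA walk" of the kind referred to here can be sketched as a cumulative sum of per-base steps. The abstract does not state the step rule, so the sketch assumes the usual purine/pyrimidine convention (step +1 for C or T, step -1 for A or G); the function name is hypothetical.

```python
from itertools import accumulate

PYRIMIDINES = set("CT")
PURINES = set("AG")


def dna_walk(sequence: str) -> list[int]:
    """Return the walker's displacement after each base.

    Assumed rule: +1 for a pyrimidine (C, T), -1 for a purine (A, G).
    """
    steps = []
    for base in sequence.upper():
        if base in PYRIMIDINES:
            steps.append(1)
        elif base in PURINES:
            steps.append(-1)
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return list(accumulate(steps))


print(dna_walk("CTAGGC"))  # [1, 2, 1, 0, -1, 0]
```

    Long-range correlations would then be assessed from how the fluctuations of this displacement scale with window length, which is beyond this sketch.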

  7. Intratumoral and Intertumoral Genomic Heterogeneity of Multifocal Localized Prostate Cancer Impacts Molecular Classifications and Genomic Prognosticators

    PubMed Central

    Wei, Lei; Wang, Jianmin; Lampert, Erika; Schlanger, Simon; DePriest, Adam D.; Hu, Qiang; Gomez, Eduardo Cortes; Murakam, Mitsuko; Glenn, Sean T.; Conroy, Jeffrey; Morrison, Carl; Azabdaftari, Gissou; Mohler, James L.; Liu, Song; Heemers, Hannelore V.

    2018-01-01

    Background Next-generation sequencing is revealing genomic heterogeneity in localized prostate cancer (CaP). Incomplete sampling of CaP multiclonality has limited the implications for molecular subtyping, stratification, and systemic treatment. Objective To determine the impact of genomic and transcriptomic diversity within and among intraprostatic CaP foci on CaP molecular taxonomy, predictors of progression, and actionable therapeutic targets. Design, setting, and participants Four consecutive patients with clinically localized National Comprehensive Cancer Network intermediate- or high-risk CaP who did not receive neoadjuvant therapy underwent radical prostatectomy at Roswell Park Cancer Institute in June–July 2014. Presurgical information on CaP content and a customized tissue procurement procedure were used to isolate nonmicroscopic and noncontiguous CaP foci in radical prostatectomy specimens. Three cores were obtained from the index lesion and one core from smaller lesions. RNA and DNA were extracted simultaneously from 26 cores with ≥90% CaP content and analyzed using whole-exome sequencing, single-nucleotide polymorphism arrays, and RNA sequencing. Outcome measurements and statistical analysis Somatic mutations, copy number alterations, gene expression, gene fusions, and phylogeny were defined. The impact of genomic alterations on CaP molecular classification, gene sets measured in Oncotype DX, Prolaris, and Decipher assays, and androgen receptor activity among CaP cores was determined. Results and limitations There was considerable variability in genomic alterations among CaP cores, and between RNA- and DNA-based platforms. Heterogeneity was found in molecular grouping of individual CaP foci and the activity of gene sets underlying the assays for risk stratification and androgen receptor activity, and was validated in independent genomic data sets. Determination of the implications for clinical decision-making requires follow-up studies. 
Conclusions Genomic make-up varies widely among CaP foci, so care should be taken when making treatment decisions based on a single biopsy or index lesions. Patient summary We examined the molecular composition of individual cancers in a patient’s prostate. We found a lot of genetic diversity among these cancers, and concluded that information from a single cancer biopsy is not sufficient to guide treatment decisions. PMID:27451135

  8. [Genome-scale sequence data processing and epigenetic analysis of DNA methylation].

    PubMed

    Wang, Ting-Zhang; Shan, Gao; Xu, Jian-Hong; Xue, Qing-Zhong

    2013-06-01

    BS-Seq is a recently developed approach for detecting cytosine DNA methylation (mC) and profiling DNA methylation on a genome scale; it is based on bisulfite conversion of genomic DNA combined with next-generation sequencing. The method can not only provide insight into differences in genome-scale DNA methylation among organisms, but also reveal the conservation of DNA methylation in all contexts and the nucleotide preference for different genomic regions, including genes, exons, and repetitive DNA sequences. It will help in understanding the epigenetic impact of cytosine DNA methylation on the regulation of gene expression and on maintaining the silence of repetitive sequences, such as transposable elements. In this paper, we introduce the preprocessing steps for DNA methylation data, in which cytosine (C) and guanine (G) in the reference sequence are converted to thymine (T) and adenine (A), respectively, and cytosine in reads is converted to thymine. We also comprehensively review the main content of genome-scale DNA methylation analysis: (1) cytosine methylation in different sequence contexts; (2) the distribution of genomic methylcytosine; (3) DNA methylation context and nucleotide preference; (4) DNA-protein interaction sites of DNA methylation; (5) the degree of cytosine methylation in the different structural elements of genes. DNA methylation analysis provides a powerful tool for epigenome studies in humans and other species and for studying gene-environment interactions, and lays a theoretical basis for further development of disease diagnostics and therapeutics in humans.
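
    The preprocessing step described in this abstract (C to T and G to A conversion of the reference, C to T conversion of reads) can be sketched as plain string translation. This is a minimal illustration, not the authors' pipeline; the function names are hypothetical, and a real workflow would also handle reverse-complement strands and alignment.

```python
# Translation tables for in silico bisulfite conversion (case preserved).
C_TO_T = str.maketrans("Cc", "Tt")
G_TO_A = str.maketrans("Gg", "Aa")


def convert_reference(reference: str) -> tuple[str, str]:
    """Return the C-to-T and the G-to-A converted copies of the reference."""
    return reference.translate(C_TO_T), reference.translate(G_TO_A)


def convert_read(read: str) -> str:
    """Convert every cytosine in a read to thymine."""
    return read.translate(C_TO_T)


c2t_ref, g2a_ref = convert_reference("ACGTCG")
print(c2t_ref)               # ATGTTG
print(g2a_ref)               # ACATCA
print(convert_read("TCGA"))  # TTGA
```

    Converting both reference and reads removes the C/T ambiguity during alignment; methylation is then called afterwards by comparing the original read bases with the original reference at each cytosine.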

  9. Extracting DNA words based on the sequence features: non-uniform distribution and integrity.

    PubMed

    Li, Zhi; Cao, Hongyan; Cui, Yuehua; Zhang, Yanbo

    2016-01-25

    A DNA sequence can be viewed as an unknown language with words as its functional units. Given that most sequence alignment algorithms such as the motif discovery algorithms depend on the quality of background information about sequences, it is necessary to develop an ab initio algorithm for extracting the "words" based only on the DNA sequences. We considered that non-uniform distribution and integrity were two important features of a word, based on which we developed an ab initio algorithm to extract "DNA words" that have potential functional meaning. A Kolmogorov-Smirnov test was used to test whether the distribution in DNA sequences was consistent with a uniform distribution, and the integrity was judged by the sequence and position alignment. Two random base sequences were adopted as negative controls, and an English book was used as a positive control to verify our algorithm. We applied our algorithm to the genomes of Saccharomyces cerevisiae and 10 strains of Escherichia coli to show the utility of the methods. The results provide strong evidence that the algorithm is a promising tool for building a DNA dictionary ab initio. Our method provides a fast way for large scale screening of important DNA elements and offers potential insights into the understanding of a genome.
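
    The abstract does not say exactly how the Kolmogorov-Smirnov test is applied. One plausible reading, shown here purely as an assumption, is to compare the positions at which a candidate word occurs against a uniform distribution over the sequence. The sketch computes only the one-sample KS statistic (no p-value); both function names are hypothetical.

```python
def word_positions(sequence: str, word: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of word."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return positions


def ks_statistic_uniform(positions: list[int], sequence_length: int) -> float:
    """One-sample KS statistic of positions against Uniform(0, sequence_length)."""
    if not positions:
        raise ValueError("at least one position is required")
    xs = sorted(p / sequence_length for p in positions)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # Largest gap between the empirical CDF and the uniform CDF at x.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


print(word_positions("ACGACGTTACG", "ACG"))  # [0, 3, 8]
print(ks_statistic_uniform([0, 5], 10))      # 0.5
```

    A large statistic suggests the word is clustered rather than spread evenly, which is the kind of non-uniformity the abstract names as a feature of a word.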

  10. CpG PatternFinder: a Windows-based utility program for easy and rapid identification of the CpG methylation status of DNA.

    PubMed

    Xu, Yi-Hua; Manoharan, Herbert T; Pitot, Henry C

    2007-09-01

    The bisulfite genomic sequencing technique is one of the most widely used techniques to study sequence-specific DNA methylation because of its unambiguous ability to reveal DNA methylation status to the order of a single nucleotide. One characteristic feature of the bisulfite genomic sequencing technique is that a number of sample sequence files will be produced from a single DNA sample. The PCR products of bisulfite-treated DNA samples cannot be sequenced directly because they are heterogeneous in nature; therefore they should be cloned into suitable plasmids and then sequenced. This procedure generates an enormous number of sample DNA sequence files as well as adding extra bases belonging to the plasmids to the sequence, which will cause problems in the final sequence comparison. Finding the methylation status for each CpG in each sample sequence is not an easy job. As a result, CpG PatternFinder was developed for this purpose. The main functions of CpG PatternFinder are: (i) to analyze the reference sequence to obtain CpG and non-CpG-C residue position information; (ii) to tailor sample sequence files (delete insertions and mark deletions from the sample sequence files) based on a configuration of ClustalW multiple alignment; (iii) to align sample sequence files with a reference file to obtain bisulfite conversion efficiency and CpG methylation status; and (iv) to produce graphics, highlighted aligned sequence text and a summary report which can be easily exported to the Microsoft Office suite. CpG PatternFinder is designed to operate cooperatively with BioEdit, freeware available on the internet. It can handle up to 100 files of sample DNA sequences simultaneously, and the total CpG pattern analysis process can be finished in minutes. CpG PatternFinder is an ideal software tool for DNA methylation studies to determine the differential methylation pattern in a large number of individuals in a population. 
Previously we developed the CpG Analyzer program; CpG PatternFinder is our further effort to create software tools for DNA methylation studies.
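
    Functions (i) and (iii) listed in this abstract can be illustrated with a small sketch. It is not CpG PatternFinder code: it assumes the sample read is already aligned to the reference without gaps and comes from the same strand, and all names are hypothetical.

```python
def classify_cytosines(reference: str) -> tuple[list[int], list[int]]:
    """Return (CpG cytosine positions, non-CpG cytosine positions), 0-based."""
    ref = reference.upper()
    cpg, non_cpg = [], []
    for i, base in enumerate(ref):
        if base == "C":
            if i + 1 < len(ref) and ref[i + 1] == "G":
                cpg.append(i)
            else:
                non_cpg.append(i)
    return cpg, non_cpg


def analyze_read(reference: str, read: str):
    """Call CpG methylation status and estimate bisulfite conversion efficiency.

    Assumes an ungapped, same-strand alignment of equal length.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to equal length")
    cpg, non_cpg = classify_cytosines(reference)
    read = read.upper()
    calls = {"C": "methylated", "T": "unmethylated"}
    status = {i: calls.get(read[i], "unknown") for i in cpg}
    # Non-CpG cytosines are expected to be unmethylated, so they should read as T.
    converted = sum(1 for i in non_cpg if read[i] == "T")
    efficiency = converted / len(non_cpg) if non_cpg else None
    return status, efficiency


status, efficiency = analyze_read("ACGTCACGCT", "ACGTTATGTT")
print(status)      # {1: 'methylated', 6: 'unmethylated'}
print(efficiency)  # 1.0
```

    Using non-CpG cytosines as the conversion control relies on the assumption that they are unmethylated in the sample, which holds for many but not all tissues and organisms.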

  11. DNA-based watermarks using the DNA-Crypt algorithm.

    PubMed

    Heider, Dominik; Barnekow, Angelika

    2007-05-29

    The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations, which occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences, similar to all steganographic algorithms.
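
    The abstract names the Hamming code as one of the mutation correction codes. How DNA-Crypt maps watermark bits onto bases is not described here, so the sketch below shows only the textbook Hamming(7,4) code on plain bits as an illustration of single-error correction; the function names are hypothetical.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword (parity at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword: list[int]) -> list[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # 1-based; 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


codeword = hamming74_encode([1, 0, 1, 1])
print(codeword)                    # [0, 1, 1, 0, 0, 1, 1]
codeword[4] ^= 1                   # simulate a single "mutation"
print(hamming74_decode(codeword))  # [1, 0, 1, 1]
```

    The three extra parity bits per four data bits are the cost of the correction, which is presumably why the abstract describes a controller that recommends whether a correction code is worth using.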

  12. DNA-based watermarks using the DNA-Crypt algorithm

    PubMed Central

    Heider, Dominik; Barnekow, Angelika

    2007-01-01

    Background The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. Results The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations, which occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. Conclusion The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences, similar to all steganographic algorithms. PMID:17535434

  13. Conserved Sequences at the Origin of Adenovirus DNA Replication

    PubMed Central

    Stillman, Bruce W.; Topp, William C.; Engler, Jeffrey A.

    1982-01-01

    The origin of adenovirus DNA replication lies within an inverted sequence repetition at either end of the linear, double-stranded viral DNA. Initiation of DNA replication is primed by a deoxynucleoside that is covalently linked to a protein, which remains bound to the newly synthesized DNA. We demonstrate that virion-derived DNA-protein complexes from five human adenovirus serological subgroups (A to E) can act as a template for both the initiation and the elongation of DNA replication in vitro, using nuclear extracts from adenovirus type 2 (Ad2)-infected HeLa cells. The heterologous template DNA-protein complexes were not as active as the homologous Ad2 DNA, most probably due to inefficient initiation by Ad2 replication factors. In an attempt to identify common features which may permit this replication, we have also sequenced the inverted terminal repeated DNA from human adenovirus serotypes Ad4 (group E), Ad9 and Ad10 (group D), and Ad31 (group A), and we have compared these to previously determined sequences from Ad2 and Ad5 (group C), Ad7 (group B), and Ad12 and Ad18 (group A) DNA. In all cases, the sequence around the origin of DNA replication can be divided into two structural domains: a proximal A · T-rich region which is partially conserved among these serotypes, and a distal G · C-rich region which is less well conserved. The G · C-rich region contains sequences similar to sequences present in papovavirus replication origins. The two domains may reflect a dual mechanism for initiation of DNA replication: adenovirus-specific protein priming of replication, and subsequent utilization of this primer by host replication factors for completion of DNA synthesis. Images PMID:7143575

  14. Protein Sequence Classification with Improved Extreme Learning Machine Algorithms

    PubMed Central

    2014-01-01

    Precisely classifying a protein sequence from a large biological protein sequence database plays an important role in developing competitive pharmacological products. Conventional methods, which compare the unseen sequence with all identified protein sequences and return the category index of the protein with the highest similarity score, are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single hidden layer feedforward networks (SLFNs). The recent efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimal pruned ELM (OP-ELM) is first employed for protein sequence classification in this paper. To further enhance the performance, an ensemble based SLFNs structure is constructed in which multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensembles. For each ensemble, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely, the basic ELM and the OP-ELM, are adopted for the ensemble based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms. PMID:24795876
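    The majority voting step mentioned in this abstract can be sketched as follows. This is a generic illustration in Python, not the authors' code: the function name `majority_vote` and the tie-breaking rule (the first label to reach the top count wins) are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most ensemble members.

    `predictions` holds one predicted category index per ensemble member.
    Ties are broken in favour of the label that appears first in the list
    (an assumption; the abstract does not say how ties are handled).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top_count = max(counts.values())
    for label in predictions:
        if counts[label] == top_count:
            return label


if __name__ == "__main__":
    # Five hypothetical ensemble members voting on one protein sequence.
    print(majority_vote([2, 1, 2, 3, 2]))
```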

  15. [Screening and identification of endophytic fungi with growth promoting effect on Dendrobium officinale].

    PubMed

    Hou, Xiao-qiang; Guo, Shun-xing

    2014-09-01

    Endophytic fungi with plant growth-promoting effects were screened by co-culturing each endophytic fungus with seedlings of Dendrobium officinale. Anatomical features of the inoculated roots were studied by paraffin sectioning. Morphological characteristics and rDNA ITS1-5.8S-ITS2 sequences were used for the taxonomy of the endophytic fungi. The results showed that 8 strains inoculated onto D. officinale seedlings greatly enhanced plant height, stem diameter, number of new roots and biomass. According to the anatomical features of the inoculated roots, each fungus could infect the velamina of seedlings. Hyphae or pelotons were present in the exodermis passage cells and cortex cells. The effective fungi could not infect the endodermis and vascular bundle sheath, whereas other fungi that were harmful to seedlings could. Combined with classic morphological classification, 2 effective strains were identified as belonging to Pestalotiopsis and Eurotium. Six species of fungi without conidiophores belonged to Pyrenochaeta, Coprinellus, Pholiota, Alternaria, Helotiales, which were identified by sequencing the PCR-amplified rDNA ITS1-5.8S-ITS2 regions. The co-culture technology of effective endophytic fungi and plants can be applied to cultivate seedlings of D. officinale. It is feasible to shorten the growth cycle of D. officinale and increase the resource of Chinese herbs.

  16. Phylogeny of Darwin's finches as revealed by mtDNA sequences.

    PubMed

    Sato, A; O'hUigin, C; Figueroa, F; Grant, P R; Grant, B R; Tichy, H; Klein, J

    1999-04-27

    Darwin's finches comprise a group of passerine birds first collected by Charles Darwin during his visit to the Galápagos Archipelago. The group, a textbook example of adaptive radiation (the diversification of a founding population into an array of species differentially adapted to diverse environmental niches), encompasses 14 currently recognized species, of which 13 live on the Galápagos Islands and one on the Cocos Island in the Pacific Ocean. Although Darwin's finches have been studied extensively by morphologists, ecologists, and ethologists, their phylogenetic relationships remain uncertain. Here, sequences of two mtDNA segments, the cytochrome b and the control region, have been used to infer the evolutionary history of the group. The data reveal the Darwin's finches to be a monophyletic group with the warbler finch being the species closest to the founding stock, followed by the vegetarian finch, and then by two sister groups, the ground and the tree finches. The Cocos finch is related to the tree finches of the Galápagos Islands. The traditional classification of ground finches into six species and tree finches into five species is not reflected in the molecular data. In these two groups, ancestral polymorphisms have not, as yet, been sorted out among the cross-hybridizing species.

  17. Are all GMOs the same? Consumer acceptance of cisgenic rice in India.

    PubMed

    Shew, Aaron M; Nalley, Lawton L; Danforth, Diana M; Dixon, Bruce L; Nayga, Rodolfo M; Delwaide, Anne-Cecile; Valent, Barbara

    2016-01-01

    India has more than 215 million food-insecure people, many of whom are farmers. Genetically modified (GM) crops have the potential to alleviate this problem by increasing food supplies and strengthening farmer livelihoods. For this to occur, two factors are critical: (i) a change in the regulatory status of GM crops, and (ii) consumer acceptance of GM foods. There are generally two classifications of GM crops based on how they are bred: cisgenically bred, containing only DNA sequences from sexually compatible organisms; and transgenically bred, including DNA sequences from sexually incompatible organisms. Consumers may view cisgenic foods as more natural than those produced via transgenesis, thus influencing consumer acceptance. This premise was the catalyst for our study--would Indian consumers accept cisgenically bred rice and if so, how would they value cisgenics compared to conventionally bred rice, GM-labelled rice and 'no fungicide' rice? In this willingness-to-pay study, respondents did not view cisgenic and GM rice differently. However, participants were willing-to-pay a premium for any aforementioned rice with a 'no fungicide' attribute, which cisgenics and GM could provide. Although not significantly different (P = 0.16), 76% and 73% of respondents stated a willingness-to-consume GM and cisgenic foods, respectively. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.

  18. Hardware Acceleration Of Multi-Deme Genetic Algorithm for DNA Codeword Searching

    DTIC Science & Technology

    2008-01-01

    C and G are complementary to each other. A Watson-Crick complement of a DNA sequence is another DNA sequence which replaces all the A with T or vice...versa and replaces all the T with A or vice versa, and also switches the 5’ and 3’ ends. A DNA sequence binds most stably with its Watson-Crick ...bind with 5 Watson-Crick pairs. The length of the longest complementary sequence between two flexible DNA strands, A and B, is the same as the
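    The Watson-Crick complement described in this excerpt (swap A with T and C with G, then switch the 5' and 3' ends) is commonly computed as a reverse complement. A minimal Python sketch follows; the function name is an assumption, and only the four unambiguous bases are handled.

```python
# Translation table for the four unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the Watson-Crick complement of `sequence`, read 5' to 3'."""
    complemented = sequence.upper().translate(_COMPLEMENT)
    # Reversing the string switches the 5' and 3' ends.
    return complemented[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```

    Applying the function twice returns the original sequence, which is a quick sanity check on any implementation.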

  19. Combined subtraction hybridization and polymerase chain reaction amplification procedure for isolation of strain-specific Rhizobium DNA sequences.

    PubMed Central

    Bjourson, A J; Stone, C E; Cooper, J E

    1992-01-01

    A novel subtraction hybridization procedure, incorporating a combination of four separation strategies, was developed to isolate unique DNA sequences from a strain of Rhizobium leguminosarum bv. trifolii. Sau3A-digested DNA from this strain, i.e., the probe strain, was ligated to a linker and hybridized in solution with an excess of pooled subtracter DNA from seven other strains of the same biovar which had been restricted, ligated to a different, biotinylated, subtracter-specific linker, and amplified by polymerase chain reaction to incorporate dUTP. Subtracter DNA and subtracter-probe hybrids were removed by phenol-chloroform extraction of a streptavidin-biotin-DNA complex. NENSORB chromatography of the sequences remaining in the aqueous layer captured biotinylated subtracter DNA which may have escaped removal by phenol-chloroform treatment. Any traces of contaminating subtracter DNA were removed by digestion with uracil DNA glycosylase. Finally, remaining sequences were amplified by polymerase chain reaction with a probe strain-specific primer, labelled with 32P, and tested for specificity in dot blot hybridizations against total genomic target DNA from each strain in the subtracter pool. Two rounds of subtraction-amplification were sufficient to remove cross-hybridizing sequences and to give a probe which hybridized only with homologous target DNA. The method is applicable to the isolation of DNA and RNA sequences from both procaryotic and eucaryotic cells. Images PMID:1637166

  20. Sequence Dependent Interactions Between DNA and Single-Walled Carbon Nanotubes

    NASA Astrophysics Data System (ADS)

    Roxbury, Daniel

    It is known that single-stranded DNA adopts a helical wrap around a single-walled carbon nanotube (SWCNT), forming a water-dispersible hybrid molecule. The ability to sort mixtures of SWCNTs based on chirality (electronic species) has recently been demonstrated using special short DNA sequences that recognize certain matching SWCNTs of specific chirality. This thesis investigates the intricacies of DNA-SWCNT sequence-specific interactions through both experimental and molecular simulation studies. The DNA-SWCNT binding strengths were experimentally quantified by studying the kinetics of DNA replacement by a surfactant on the surface of particular SWCNTs. Recognition ability was found to correlate strongly with measured binding strength, e.g. DNA sequence (TAT)4 was found to bind 20 times stronger to the (6,5)-SWCNT than sequence (TAT)4T. Next, using replica exchange molecular dynamics (REMD) simulations, equilibrium structures formed by (a) single-strands and (b) multiple-strands of 12-mer oligonucleotides adsorbed on various SWCNTs were explored. A number of structural motifs were discovered in which the DNA strand wraps around the SWCNT and 'stitches' to itself via hydrogen bonding. Great variability among equilibrium structures was observed and shown to be directly influenced by DNA sequence and SWCNT type. For example, the (6,5)-SWCNT DNA recognition sequence, (TAT)4, was found to wrap in a tight single-stranded right-handed helical conformation. In contrast, DNA sequence T12 forms a beta-barrel left-handed structure on the same SWCNT. These are the first theoretical indications that DNA-based SWCNT selectivity can arise on a molecular level. In a biomedical collaboration with the Mayo Clinic, pathways for DNA-SWCNT internalization into healthy human endothelial cells were explored. 
    Through absorbance spectroscopy, TEM imaging, and confocal fluorescence microscopy, we showed that intracellular concentrations of SWCNTs far exceeded those of the incubation solution, which suggested an energy-dependent pathway. Additionally, by means of pharmacological inhibition and vector-induced gene knockout studies, the DNA-SWCNTs were shown to enter the cells via Rac1-mediated macropinocytosis.

  1. Development of a Novel Technology for Label Free DNA Sequencing

    DTIC Science & Technology

    2012-05-21

    of the C-H bond stretch vibrations in the planes of the corresponding DNA bases, and in the higher-frequency side, sequence-identifier region is...composed of the N-H bond stretch vibrations in the planes of the corresponding DNA bases. In addition, the sequence-identifier dividing region almost...regions are localized at the corresponding DNA bases and exhibit a definable dependence on the sequence form of the codons under study. Final

  2. Phylogenetic diversity and biogeography of the Mamiellophyceae lineage of eukaryotic phytoplankton across the oceans.

    PubMed

    Monier, Adam; Worden, Alexandra Z; Richards, Thomas A

    2016-08-01

    High-throughput diversity amplicon sequencing of marine microbial samples has revealed that members of the Mamiellophyceae lineage are successful phytoplankton in many oceanic habitats. Indeed, these eukaryotic green algae can dominate the picoplanktonic biomass; however, given the broad expanses of the oceans, their geographical distributions and the phylogenetic diversity of some groups remain poorly characterized. As these algae play a foundational role in marine food webs, it is crucial to assess their global distribution in order to better predict potential changes in abundance and community structure. To this end, we analyzed the V9-18S small subunit rDNA sequences deposited from the Tara Oceans expedition to evaluate the diversity and biogeography of these phytoplankton. Our results show that the phylogenetic composition of Mamiellophyceae communities is in part determined by geographical provenance, and does not appear to be influenced - in the samples recovered - by water depth, at least at the resolution possible with the V9-18S. Phylogenetic classification of Mamiellophyceae sequences revealed that the Dolichomastigales order encompasses more sequence diversity than other orders in this lineage. These results indicate that a large fraction of the Mamiellophyceae diversity has been hitherto overlooked, likely because of a combination of size fraction, sequencing and geographical limitations. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.

  3. Flow cytometry for enrichment and titration in massively parallel DNA sequencing

    PubMed Central

    Sandberg, Julia; Ståhl, Patrik L.; Ahmadian, Afshin; Bjursell, Magnus K.; Lundeberg, Joakim

    2009-01-01

    Massively parallel DNA sequencing is revolutionizing genomics research throughout the life sciences. However, the reagent costs and labor requirements in current sequencing protocols are still substantial, although improvements are continuously being made. Here, we demonstrate an effective alternative to existing sample titration protocols for the Roche/454 system using Fluorescence Activated Cell Sorting (FACS) technology to determine the optimal DNA-to-bead ratio prior to large-scale sequencing. Our method, which eliminates the need for the costly pilot sequencing of samples during titration, is capable of rapidly providing accurate DNA-to-bead ratios that are not biased by the quantification and sedimentation steps included in current protocols. Moreover, we demonstrate that FACS sorting can be readily used to highly enrich fractions of beads carrying template DNA, with near total elimination of empty beads and no downstream sacrifice of DNA sequencing quality. Automated enrichment by FACS is a simple approach to obtain pure samples for bead-based sequencing systems, and offers an efficient, low-cost alternative to current enrichment protocols. PMID:19304748

  4. A DNA sequence obtained by replacement of the dopamine RNA aptamer bases is not an aptamer.

    PubMed

    Álvarez-Martos, Isabel; Ferapontova, Elena E

    2017-08-05

    The unique specificity of aptamer-ligand biorecognition and binding facilitates bioanalysis and biosensor development, contributing to discrimination of structurally related molecules, such as dopamine and other catecholamine neurotransmitters. The aptamer sequence capable of specific binding of dopamine is a 57-nucleotide-long RNA sequence reported in 1997 (Biochemistry, 1997, 36, 9726). Later, it was suggested that the DNA homologue of the RNA aptamer retains the specificity of dopamine binding (Biochem. Biophys. Res. Commun., 2009, 388, 732). Here, we show that the DNA sequence obtained by the replacement of the RNA aptamer bases with their DNA analogues is not capable of specific biorecognition of dopamine, in contrast to the original RNA aptamer sequence. This DNA sequence binds dopamine and structurally related catecholamine neurotransmitters non-specifically, as any DNA sequence does, and thus is not an aptamer and cannot be used for either in vivo or in situ analysis of dopamine in the presence of structurally related neurotransmitters. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Method for sequencing DNA base pairs

    DOEpatents

    Sessler, Andrew M.; Dawson, John

    1993-01-01

    The base pairs of a DNA structure are sequenced with the use of a scanning tunneling microscope (STM). The DNA structure is scanned by the STM probe tip, and, as it is being scanned, the DNA structure is separately subjected to a sequence of infrared radiation from four different sources, each source being selected to preferentially excite one of the four different bases in the DNA structure. Each particular base being scanned is subjected to such sequence of infrared radiation from the four different sources as that particular base is being scanned. The DNA structure as a whole is separately imaged for each subjection thereof to radiation from one only of each source.

  6. Nuclear counterparts of the cytoplasmic mitochondrial 12S rRNA gene: a problem of ancient DNA and molecular phylogenies.

    PubMed

    van der Kuyl, A C; Kuiken, C L; Dekker, J T; Perizonius, W R; Goudsmit, J

    1995-06-01

    Monkey mummy bones and teeth originating from the North Saqqara Baboon Galleries (Egypt), soft tissue from a mummified baboon in a museum collection, and nineteenth/twentieth-century skin fragments from mangabeys were used for DNA extraction and PCR amplification of part of the mitochondrial 12S rRNA gene. Sequences aligning with the 12S rRNA gene were recovered but were only distantly related to contemporary monkey mitochondrial 12S rRNA sequences. However, many of these sequences were identical or closely related to human nuclear DNA sequences resembling mitochondrial 12S rRNA (isolated from a cell line depleted in mitochondria) and therefore have to be considered contamination. Subsequently, in a separate study, we were able to recover genuine mitochondrial 12S rRNA sequences from many extant species of nonhuman Old World primates and sequences closely resembling the human nuclear integrations. Analysis of all sequences by the neighbor-joining (NJ) method indicated that mitochondrial DNA sequences and their nuclear counterparts can be divided into two distinct clusters. One cluster contained all contemporary cytoplasmic mitochondrial DNA sequences and approximately half of the monkey nuclear mitochondrial-like sequences. A second cluster contained most human nuclear sequences and the other half of monkey nuclear sequences with a separate branch leading to human and gorilla mitochondrial and nuclear sequences. Sequences recovered from ancient materials were equally divided between the two clusters. These results constitute a warning for those working with ancient DNA or performing phylogenetic analysis using mitochondrial DNA as a target sequence: Nuclear counterparts of mitochondrial genes may lead to faulty interpretation of results.

  7. Redescription of Hepatozoon felis (Apicomplexa: Hepatozoidae) based on phylogenetic analysis, tissue and blood form morphology, and possible transplacental transmission.

    PubMed

    Baneth, Gad; Sheiner, Alina; Eyal, Osnat; Hahn, Shelley; Beaufils, Jean-Pierre; Anug, Yigal; Talmi-Frank, Dalit

    2013-04-15

    A Hepatozoon parasite was initially reported from a cat in India in 1908 and named Leucocytozoon felis domestici. Although domestic feline hepatozoonosis has since been recorded from Europe, Africa, Asia and America, its description, classification and pathogenesis have remained vague and the distinction between different species of Hepatozoon infecting domestic and wild carnivores has been unclear. The aim of this study was to carry out a survey on domestic feline hepatozoonosis and characterize it morphologically and genetically. Hepatozoon sp. DNA was amplified by PCR from the blood of 55 of 152 (36%) surveyed cats in Israel and from all blood samples of an additional 19 cats detected as parasitemic by microscopy during routine hematologic examinations. Hepatozoon sp. forms were also characterized from tissues of naturally infected cats. DNA sequencing determined that all cats were infected with Hepatozoon felis except for two infected by Hepatozoon canis. A significant association (p = 0.00001) was found between outdoor access and H. felis infection. H. felis meronts containing merozoites were characterized morphologically from skeletal muscles, myocardium and lungs of H. felis PCR-positive cat tissues and development from early to mature meront was described. Distinctly-shaped gamonts were observed and measured from the blood of these H. felis infected cats. Two fetuses from H. felis PCR-positive queens were positive by PCR from fetal tissue including the lung and amniotic fluid, suggesting possible transplacental transmission. Genetic analysis indicated that H. felis DNA sequences from Israeli cats clustered together with the H. felis Spain 1 and Spain 2 sequences. These cat H. felis sequences clustered separately from the feline H. canis sequences, which grouped with Israeli and foreign dog H. canis sequences. H. felis clustered distinctly from Hepatozoon spp. of other mammals. Feline hepatozoonosis caused by H. felis is mostly sub-clinical as a high proportion of the population is infected with no apparent overt clinical manifestations. This study aimed to integrate new histopathologic, hematologic, clinical, epidemiological and genetic findings on feline hepatozoonosis and promote the understanding of this infection. The results indicate that feline infection is primarily caused by a morphologically and genetically distinct species, H. felis, which has predilection to infecting muscular tissues, and is highly prevalent in the cat population studied. The lack of previous comprehensively integrated data merits the redescription of this parasite elucidating its parasitological characteristics.

  8. High-throughput sequencing of microbial community diversity in soil, grapes, leaves, grape juice and wine of grapevine from China.

    PubMed

    Wei, Yu-Jie; Wu, Yun; Yan, Yin-Zhuo; Zou, Wan; Xue, Jie; Ma, Wen-Rui; Wang, Wei; Tian, Ge; Wang, Li-Ye

    2018-01-01

    In this study, Illumina MiSeq sequencing was performed to investigate microbial diversity in soil, leaves, grape, grape juice and wine. A total of 1,043,102 fungal Internal Transcribed Spacer (ITS) reads and 2,422,188 high quality bacterial 16S rDNA sequences were used for taxonomic classification, revealing five fungal and eight bacterial phyla. At the genus level, the dominant fungi were Ascomycota, Sordariales, Tetracladium and Geomyces in soil, Aureobasidium and Pleosporaceae in grape leaves, Aureobasidium in grape and grape juice. The dominant bacteria were Kaistobacter, Arthrobacter, Skermanella and Sphingomonas in soil, Pseudomonas, Acinetobacter and Kaistobacter in grape and grape leaves, and Oenococcus in grape juice and wine. Principal coordinate analysis showed structural separation between the composition of fungi and bacteria in all samples. This is the first study to characterize the microbiome population in soil, grape, grape leaves, grape juice and wine in Xinjiang through high-throughput sequencing and to identify microorganisms like Saccharomyces cerevisiae and Oenococcus spp. that may contribute to the quality and flavor of wine.

  9. The complete genome sequence and genetic analysis of ΦCA82 a novel uncultured microphage from the turkey gastrointestinal system

    PubMed Central

    2011-01-01

    The genomic DNA sequence of a novel enteric uncultured microphage, ΦCA82 from a turkey gastrointestinal system was determined utilizing metagenomics techniques. The entire circular, single-stranded nucleotide sequence of the genome was 5,514 nucleotides. The ΦCA82 genome is quite different from other microviruses as indicated by comparisons of nucleotide similarity, predicted protein similarity, and functional classifications. Only three genes showed significant similarity to microviral proteins as determined by local alignments using BLAST analysis. ORF1 encoded a predicted phage F capsid protein that was phylogenetically most similar to the Microviridae ΦMH2K member's major coat protein. The ΦCA82 genome also encoded a predicted minor capsid protein (ORF2) and putative replication initiation protein (ORF3) most similar to the microviral bacteriophage SpV4. The distant evolutionary relationship of ΦCA82 suggests that the divergence of this novel turkey microvirus from other microviruses may reflect unique evolutionary pressures encountered within the turkey gastrointestinal system. PMID:21714899

  10. Population Abundance of Potentially Pathogenic Organisms in Intestinal Microbiome of Jungle Crow (Corvus macrorhynchos) Shown with 16S rRNA Gene-Based Microbial Community Analysis

    PubMed Central

    Maeda, Isamu; Siddiki, Mohammad Shohel Rana; Nozawa-Takeda, Tsutomu; Tsukahara, Naoki; Tani, Yuri; Naito, Taki; Sugita, Shoei

    2013-01-01

    Jungle Crows (Corvus macrorhynchos) prefer human habitats because of their versatile feeding habits, which include consuming human food. Therefore, it is important from a public health viewpoint to characterize their intestinal microbiota. However, no studies have characterized the microbiota at the molecular level on the basis of a large and reliable data set. In this study, 16S rRNA gene-based microbial community analysis coupled with next-generation DNA sequencing techniques was applied to the taxonomic classification of the intestinal microbiome of three jungle crows. Clustering of the reads into 130 operational taxonomic units showed that at least 70% of analyzed sequences for each crow were highly homologous to Eimeria sp., which belongs to the protozoan phylum Apicomplexa. The microbiotas of the three crows also contained potentially pathogenic bacteria in significant percentages, such as the genera Campylobacter and Brachyspira. Thus, the profiling of a large number of 16S rRNA gene sequences in crow intestinal microbiomes revealed the high-frequency existence or vestige of potentially pathogenic microorganisms. PMID:24058905

  11. Population abundance of potentially pathogenic organisms in intestinal microbiome of jungle crow (Corvus macrorhynchos) shown with 16S rRNA gene-based microbial community analysis.

    PubMed

    Maeda, Isamu; Siddiki, Mohammad Shohel Rana; Nozawa-Takeda, Tsutomu; Tsukahara, Naoki; Tani, Yuri; Naito, Taki; Sugita, Shoei

    2013-01-01

    Jungle Crows (Corvus macrorhynchos) prefer human habitats because of their versatile feeding habits, which include consuming human food. Therefore, it is important from a public health viewpoint to characterize their intestinal microbiota. However, no studies have characterized the microbiota at the molecular level on the basis of a large and reliable data set. In this study, 16S rRNA gene-based microbial community analysis coupled with next-generation DNA sequencing techniques was applied to the taxonomic classification of the intestinal microbiome of three jungle crows. Clustering of the reads into 130 operational taxonomic units showed that at least 70% of analyzed sequences for each crow were highly homologous to Eimeria sp., which belongs to the protozoan phylum Apicomplexa. The microbiotas of the three crows also contained potentially pathogenic bacteria in significant percentages, such as the genera Campylobacter and Brachyspira. Thus, the profiling of a large number of 16S rRNA gene sequences in crow intestinal microbiomes revealed the high-frequency existence or vestige of potentially pathogenic microorganisms.

  12. High-throughput sequencing of microbial community diversity in soil, grapes, leaves, grape juice and wine of grapevine from China

    PubMed Central

    Yan, Yin-zhuo; Zou, Wan; Ma, Wen-rui; Wang, Wei; Tian, Ge; Wang, Li-ye

    2018-01-01

    In this study, Illumina MiSeq sequencing was performed to investigate microbial diversity in soil, leaves, grape, grape juice and wine. A total of 1,043,102 fungal Internal Transcribed Spacer (ITS) reads and 2,422,188 high quality bacterial 16S rDNA sequences were used for taxonomic classification, revealing five fungal and eight bacterial phyla. At the genus level, the dominant fungi were Ascomycota, Sordariales, Tetracladium and Geomyces in soil, Aureobasidium and Pleosporaceae in grape leaves, Aureobasidium in grape and grape juice. The dominant bacteria were Kaistobacter, Arthrobacter, Skermanella and Sphingomonas in soil, Pseudomonas, Acinetobacter and Kaistobacter in grape and grape leaves, and Oenococcus in grape juice and wine. Principal coordinate analysis showed structural separation between the composition of fungi and bacteria in all samples. This is the first study to characterize the microbiome population in soil, grape, grape leaves, grape juice and wine in Xinjiang through high-throughput sequencing and to identify microorganisms like Saccharomyces cerevisiae and Oenococcus spp. that may contribute to the quality and flavor of wine. PMID:29565999

  13. Sequence independent amplification of DNA

    DOEpatents

    Bohlander, S.K.

    1998-03-24

    The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example, the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei. 25 figs.

  14. Sequence independent amplification of DNA

    DOEpatents

    Bohlander, Stefan K.

    1998-01-01

    The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example, the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei.

  15. UV-Visible Spectroscopy-Based Quantification of Unlabeled DNA Bound to Gold Nanoparticles.

    PubMed

    Baldock, Brandi L; Hutchison, James E

    2016-12-20

    DNA-functionalized gold nanoparticles have been increasingly applied as sensitive and selective analytical probes and biosensors. The DNA ligands bound to a nanoparticle dictate its reactivity, making it essential to know the type and number of DNA strands bound to the nanoparticle surface. Existing methods used to determine the number of DNA strands per gold nanoparticle (AuNP) require that the sequences be fluorophore-labeled, which may affect the DNA surface coverage and reactivity of the nanoparticle and/or require specialized equipment and other fluorophore-containing reagents. We report a UV-visible-based method to conveniently and inexpensively determine the number of DNA strands attached to AuNPs of different core sizes. When this method is used in tandem with a fluorescence dye assay, it is possible to determine the ratio of two unlabeled sequences of different lengths bound to AuNPs. Two sizes of citrate-stabilized AuNPs (5 and 12 nm) were functionalized with mixtures of short (5 base) and long (32 base) disulfide-terminated DNA sequences, and the ratios of sequences bound to the AuNPs were determined using the new method. The long DNA sequence was present as a lower proportion of the ligand shell than in the ligand exchange mixture, suggesting it had a lower propensity to bind the AuNPs than the short DNA sequence. The ratio of DNA sequences bound to the AuNPs was not the same for the large and small AuNPs, which suggests that the radius of curvature had a significant influence on the assembly of DNA strands onto the AuNPs.

  16. Ancient DNA sequence revealed by error-correcting codes.

    PubMed

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-07-10

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.

  17. Ancient DNA sequence revealed by error-correcting codes

    PubMed Central

    Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  18. Integrated genomic classification of melanocytic tumors of the central nervous system using mutation analysis, copy number alterations and DNA methylation profiling.

    PubMed

    Griewank, Klaus; Koelsche, Christian; van de Nes, Johannes A P; Schrimpf, Daniel; Gessi, Marco; Möller, Inga; Sucker, Antje; Scolyer, Richard A; Buckland, Michael E; Murali, Rajmohan; Pietsch, Torsten; von Deimling, Andreas; Schadendorf, Dirk

    2018-06-11

    In the central nervous system, distinguishing primary leptomeningeal melanocytic tumors from melanoma metastases and predicting their biological behavior solely using histopathologic criteria can be challenging. We aimed to assess the diagnostic and prognostic value of integrated molecular analysis. Targeted next-generation sequencing, array-based genome-wide methylation analysis and BAP1 immunohistochemistry were performed on the largest cohort of central nervous system melanocytic tumors analyzed to date, including 47 primary tumors of the central nervous system, 16 uveal melanomas, 13 cutaneous melanoma metastases and 2 blue nevus-like melanomas. Gene mutation, DNA-methylation and copy-number profiles were correlated with clinicopathological features. Combining mutation, copy-number and DNA-methylation profiles clearly distinguished cutaneous melanoma metastases from other melanocytic tumors. Primary leptomeningeal melanocytic tumors, uveal melanomas and blue nevus-like melanoma showed common DNA-methylation, copy-number alteration and gene mutation signatures. Notably, tumors demonstrating chromosome 3 monosomy and BAP1 alterations formed a homogeneous subset within this group. Integrated molecular profiling aids in distinguishing primary from metastatic melanocytic tumors of the central nervous system. Primary leptomeningeal melanocytic tumors, uveal melanoma and blue nevus-like melanoma share molecular similarity, with chromosome 3 and BAP1 alterations being markers of poor prognosis. Copyright ©2018, American Association for Cancer Research.

  19. [DNA barcoding and its utility in commonly-used medicinal snakes].

    PubMed

    Huang, Yong; Zhang, Yue-yun; Zhao, Cheng-jian; Xu, Yong-li; Gu, Ying-le; Huang, Wen-qi; Lin, Kui; Li, Li

    2015-03-01

    Accurate identification of traditional Chinese medicine materials is crucial for their research, production and application. DNA barcoding based on the mitochondrial gene coding for cytochrome c oxidase subunit I (COI) is increasingly used for the identification of traditional Chinese medicine. Using universal barcoding primers for sequencing, we assessed the feasibility of the DNA barcoding method for identifying commonly used medicinal snakes (a total of 109 samples belonging to 19 species, 15 genera and 6 families). Phylogenetic trees were constructed using the neighbor-joining method. The results indicated that the mean G + C content (46.5%) was lower than the mean A + T content (53.5%). As calculated by the Kimura 2-parameter model, the mean intraspecific genetic distance of Trimeresurus albolabris, Ptyas dhumnades and Lycodon rufozonatus was greater than 2%. Further phylogenetic analysis suggested that the identification of one sample of T. albolabris was erroneous. The identification of some samples of P. dhumnades was also incorrect: specimens of P. korros had originally been identified as P. dhumnades. The factors influencing the intraspecific genetic distance of L. rufozonatus need to be studied further. Therefore, DNA barcoding for the identification of medicinal snakes is feasible and greatly complements the morphological classification method. Further study of its use in the identification of traditional Chinese medicine is necessary.
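
    The two quantities reported in this abstract, base composition and the Kimura 2-parameter (K2P) distance, can be illustrated with a short sketch. This is not the authors' pipeline; the function names and the toy sequences are assumptions made for illustration, and the input sequences are assumed to be already aligned to equal length. The K2P distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example (made-up sequences): one transition in ten aligned sites.
print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
print(gc_content("GGCCAATT"))
```

    A mean intraspecific distance above 2%, as reported for three of the species, corresponds to averaging k2p_distance over all pairs of sequences carrying the same species label and comparing the result with 0.02.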

  20. Multiplex picoliter-droplet digital PCR for quantitative assessment of DNA integrity in clinical samples.

    PubMed

    Didelot, Audrey; Kotsopoulos, Steve K; Lupo, Audrey; Pekin, Deniz; Li, Xinyu; Atochin, Ivan; Srinivasan, Preethi; Zhong, Qun; Olson, Jeff; Link, Darren R; Laurent-Puig, Pierre; Blons, Hélène; Hutchison, J Brian; Taly, Valerie

    2013-05-01

    Assessment of DNA integrity and quantity remains a bottleneck for high-throughput molecular genotyping technologies, including next-generation sequencing. In particular, DNA extracted from paraffin-embedded tissues, a major potential source of tumor DNA, varies widely in quality, leading to unpredictable sequencing data. We describe a picoliter droplet-based digital PCR method that enables simultaneous detection of DNA integrity and the quantity of amplifiable DNA. Using a multiplex assay, we detected 4 different target lengths (78, 159, 197, and 550 bp). Assays were validated with human genomic DNA fragmented to sizes of 170 bp to 3000 bp. The technique was validated with DNA quantities as low as 1 ng. We evaluated 12 DNA samples extracted from paraffin-embedded lung adenocarcinoma tissues. One sample contained no amplifiable DNA. The fractions of amplifiable DNA for the 11 other samples were between 0.05% and 10.1% for 78-bp fragments and ≤1% for longer fragments. Four samples were chosen for enrichment and next-generation sequencing. The quality of the sequencing data was in agreement with the results of the DNA-integrity test. Specifically, DNA with low integrity yielded sequencing results with lower levels of coverage and uniformity and had higher levels of false-positive variants. The development of DNA-quality assays will enable researchers to downselect samples or process more DNA to achieve reliable genome sequencing with the highest possible efficiency of cost and effort, as well as minimize the waste of precious samples. © 2013 American Association for Clinical Chemistry.

  1. Detecting and Estimating Contamination of Human DNA Samples in Sequencing and Array-Based Genotype Data

    PubMed Central

    Jun, Goo; Flickinger, Matthew; Hetrick, Kurt N.; Romm, Jane M.; Doheny, Kimberly F.; Abecasis, Gonçalo R.; Boehnke, Michael; Kang, Hyun Min

    2012-01-01

    DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads alone, and (3) array-based genotype data alone. Analysis of sequencing reads allows contamination detection after sequence data is generated but prior to variant calling; analysis of array-based genotype data allows contamination detection prior to generation of costly sequence data. Through a combination of analysis of in silico and experimentally contaminated samples, we show that our methods can reliably detect and estimate levels of contamination as low as 1%. We evaluate the impact of DNA contamination on genotype accuracy and propose effective strategies to screen for and prevent DNA contamination in sequencing studies. PMID:23103226

  2. Integrated sequencing of exome and mRNA of large-sized single cells.

    PubMed

    Wang, Lily Yan; Guo, Jiajie; Cao, Wei; Zhang, Meng; He, Jiankui; Li, Zhoufang

    2018-01-10

    With current approaches to single-cell DNA-RNA integrated sequencing, SNP calling is difficult because a large amount of DNA and RNA is lost during DNA-RNA separation. Here, we performed simultaneous single-cell exome and transcriptome sequencing on individual mouse oocytes. Using microinjection, we kept the nuclei intact to avoid DNA loss, while retaining the cytoplasm inside the cell membrane, to maximize the amount of DNA and RNA captured from the single cell. We then conducted exome sequencing on the isolated nuclei and mRNA sequencing on the enucleated cytoplasm. For single oocytes, exome-seq can cover up to 92% of the exome region with an average sequencing depth of 10+, while mRNA sequencing reveals more than 10,000 expressed genes in enucleated cytoplasm, with similar performance for intact oocytes. This approach provides unprecedented opportunities to study DNA-RNA regulation, such as RNA editing at the single-nucleotide level in oocytes. In the future, this method can also be applied to other large cells, including neurons, large dendritic cells and large tumour cells, for integrated exome and transcriptome sequencing.

  3. A simple method for semi-random DNA amplicon fragmentation using the methylation-dependent restriction enzyme MspJI.

    PubMed

    Shinozuka, Hiroshi; Cogan, Noel O I; Shinozuka, Maiko; Marshall, Alexis; Kay, Pippa; Lin, Yi-Han; Spangenberg, German C; Forster, John W

    2015-04-11

    Fragmentation at random nucleotide locations is an essential process for preparation of DNA libraries to be used on massively parallel short-read DNA sequencing platforms. Although instruments for physical shearing, such as the Covaris S2 focused-ultrasonicator system, and products for enzymatic shearing, such as the Nextera technology and NEBNext dsDNA Fragmentase kit, are commercially available, a simple and inexpensive method is desirable for high-throughput sequencing library preparation. MspJI is a recently characterised restriction enzyme which recognises the sequence motif CNNR (where R = G or A) when the first base is modified to 5-methylcytosine or 5-hydroxymethylcytosine. A semi-random enzymatic DNA amplicon fragmentation method was developed based on the unique cleavage properties of MspJI. In this method, random incorporation of 5-methyl-2'-deoxycytidine-5'-triphosphate is achieved through DNA amplification with DNA polymerase, followed by DNA digestion with MspJI. Due to the recognition sequence of the enzyme, DNA amplicons are fragmented in a relatively sequence-independent manner. The size range of the resulting fragments was capable of control through optimisation of 5-methyl-2'-deoxycytidine-5'-triphosphate concentration in the reaction mixture. A library suitable for sequencing using the Illumina MiSeq platform was prepared and processed using the proposed method. Alignment of generated short reads to a reference sequence demonstrated a relatively high level of random fragmentation. The proposed method may be performed with standard laboratory equipment. 
    Although the uniformity of coverage was slightly inferior to the Covaris physical shearing procedure, due to efficiencies of cost and labour, the method may be more suitable than existing approaches for implementation in large-scale sequencing activities, such as bacterial artificial chromosome (BAC)-based genome sequence assembly, pan-genomic studies and locus-targeted genotyping-by-sequencing.
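
    To make the recognition motif concrete, the following sketch lists every position on a given strand where the CNNR pattern (R = G or A) occurs. It is only an illustration: the function name is hypothetical, the methylation state of the leading cytosine and the actual cleavage positions of MspJI are not modelled, and the opposite strand is not scanned.

```python
import re

# C, any two bases, then a purine (G or A). The lookahead allows
# overlapping occurrences of the motif to be reported.
CNNR = re.compile(r"(?=C[ACGT]{2}[AG])")


def cnnr_sites(seq):
    """Return 0-based start positions of CNNR motifs on the given strand."""
    return [m.start() for m in CNNR.finditer(seq.upper())]


# Toy example (made-up sequence).
print(cnnr_sites("ACGTACCAGG"))
```

    Because the motif is short and degenerate, candidate sites are frequent in most sequences, which is consistent with the abstract's description of fragmentation as relatively sequence-independent; in the described method, which sites are actually cut depends on where methylated cytosine was incorporated.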

  4. Winnowing DNA for Rare Sequences: Highly Specific Sequence and Methylation Based Enrichment

    PubMed Central

    Thompson, Jason D.; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre

    2012-01-01

    Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue. PMID:22355378

  5. Low-Energy Electron-Induced Strand Breaks in Telomere-Derived DNA Sequences-Influence of DNA Sequence and Topology.

    PubMed

    Rackwitz, Jenny; Bald, Ilko

    2018-03-26

    During cancer radiation therapy high-energy radiation is used to reduce tumour tissue. The irradiation produces a shower of secondary low-energy (<20 eV) electrons, which are able to damage DNA very efficiently by dissociative electron attachment. Recently, it was suggested that low-energy electron-induced DNA strand breaks strongly depend on the specific DNA sequence with a high sensitivity of G-rich sequences. Here, we use DNA origami platforms to expose G-rich telomere sequences to low-energy (8.8 eV) electrons to determine absolute cross sections for strand breakage and to study the influence of sequence modifications and topology of telomeric DNA on the strand breakage. We find that the telomeric DNA 5'-(TTA GGG)2 is more sensitive to low-energy electrons than an intermixed sequence 5'-(TGT GTG A)2, confirming the unique electronic properties resulting from G-stacking. With increasing length of the oligonucleotide (i.e., going from 5'-(GGG ATT)2 to 5'-(GGG ATT)4), both the variety of topology and the electron-induced strand break cross sections increase. Addition of K+ ions decreases the strand break cross section for all sequences that are able to fold G-quadruplexes or G-intermediates, whereas the strand break cross section for the intermixed sequence remains unchanged. These results indicate that telomeric DNA is rather sensitive towards low-energy electron-induced strand breakage suggesting significant telomere shortening that can also occur during cancer radiation therapy. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Unrealistic phylogenetic trees may improve phylogenetic footprinting.

    PubMed

    Nettling, Martin; Treutler, Hendrik; Cerquides, Jesus; Grosse, Ivo

    2017-06-01

    The computational investigation of DNA binding motifs from binding sites is one of the classic tasks in bioinformatics and a prerequisite for understanding gene regulation as a whole. Due to the development of sequencing technologies and the increasing number of available genomes, approaches based on phylogenetic footprinting become increasingly attractive. Phylogenetic footprinting requires phylogenetic trees with attached substitution probabilities for quantifying the evolution of binding sites, but these trees and substitution probabilities are typically not known and cannot be estimated easily. Here, we investigate the influence of phylogenetic trees with different substitution probabilities on the classification performance of phylogenetic footprinting using synthetic and real data. For synthetic data we find that the classification performance is highest when the substitution probability used for phylogenetic footprinting is similar to that used for data generation. For real data, however, we typically find that the classification performance of phylogenetic footprinting surprisingly increases with increasing substitution probabilities and is often highest for unrealistically high substitution probabilities close to one. This finding suggests that choosing realistic model assumptions might not always yield optimal predictions in general and that choosing unrealistically high substitution probabilities close to one might actually improve the classification performance of phylogenetic footprinting. The proposed PF is implemented in JAVA and can be downloaded from https://github.com/mgledi/PhyFoo. Contact: martin.nettling@informatik.uni-halle.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  7. Improved multiple displacement amplification (iMDA) and ultraclean reagents.

    PubMed

    Motley, S Timothy; Picuri, John M; Crowder, Chris D; Minich, Jeremiah J; Hofstadler, Steven A; Eshoo, Mark W

    2014-06-06

    Next-generation sequencing sample preparation requires nanogram to microgram quantities of DNA; however, many relevant samples are comprised of only a few cells. Genomic analysis of these samples requires a whole genome amplification method that is unbiased and free of exogenous DNA contamination. To address these challenges we have developed protocols for the production of DNA-free consumables including reagents and have improved upon multiple displacement amplification (iMDA). A specialized ethylene oxide treatment was developed that renders free DNA and DNA present within Gram positive bacterial cells undetectable by qPCR. To reduce DNA contamination in amplification reagents, a combination of ion exchange chromatography, filtration, and lot testing protocols were developed. Our multiple displacement amplification protocol employs a second strand-displacing DNA polymerase, improved buffers, improved reaction conditions and DNA free reagents. The iMDA protocol, when used in combination with DNA-free laboratory consumables and reagents, significantly improved efficiency and accuracy of amplification and sequencing of specimens with moderate to low levels of DNA. The sensitivity and specificity of sequencing of amplified DNA prepared using iMDA was compared to that of DNA obtained with two commercial whole genome amplification kits using 10 fg (~1-2 bacterial cells worth) of bacterial genomic DNA as a template. Analysis showed >99% of the iMDA reads mapped to the template organism whereas only 0.02% of the reads from the commercial kits mapped to the template. To assess the ability of iMDA to achieve balanced genomic coverage, a non-stochastic amount of bacterial genomic DNA (1 pg) was amplified and sequenced, and data obtained were compared to sequencing data obtained directly from genomic DNA. 
    The iMDA DNA and genomic DNA sequencing had comparable coverage: 99.98% of the reference genome at ≥1X coverage and 99.9% at ≥5X coverage, while maintaining both balance and representation of the genome. The iMDA protocol, in combination with DNA-free laboratory consumables, significantly improved the ability to sequence specimens with low levels of DNA. iMDA has broad utility in metagenomics, diagnostics, ancient DNA analysis, pre-implantation embryo screening, single-cell genomics, whole genome sequencing of unculturable organisms, and forensic applications for both human and microbial targets.

  8. Strong minor groove base conservation in sequence logos implies DNA distortion or base flipping during replication and transcription initiation.

    PubMed

    Schneider, T D

    2001-12-01

    The sequence logo for DNA binding sites of the bacteriophage P1 replication protein RepA shows unusually high sequence conservation (approximately 2 bits) at a minor groove that faces RepA. However, B-form DNA can support only 1 bit of sequence conservation via contacts into the minor groove. The high conservation in RepA sites therefore implies a distorted DNA helix with direct or indirect contacts to the protein. Here I show that a high minor groove conservation signature also appears in sequence logos of sites for other replication origin binding proteins (Rts1, DnaA, P4 alpha, EBNA1, ORC) and promoter binding proteins (sigma(70), sigma(D) factors). This finding implies that DNA binding proteins generally use non-B-form DNA distortion such as base flipping to initiate replication and transcription.
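
    The "bits" in this abstract are the per-position information content plotted in a sequence logo. A minimal sketch of that calculation follows, assuming a set of aligned binding sites of equal length and omitting the small-sample correction that logo software normally applies; the function name and the toy sites are illustrative, not taken from the paper. For DNA, the information at a position is 2 minus the Shannon entropy of the observed base frequencies, so a fully conserved position carries 2 bits and a position where two bases are equally acceptable carries 1 bit.

```python
import math
from collections import Counter


def information_content(sites):
    """Per-position information (bits) for aligned DNA binding sites.

    Uses R = 2 - H, where H is the Shannon entropy of the base
    frequencies in each column. No small-sample correction is applied.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    bits = []
    for column in zip(*sites):
        counts = Counter(column)
        total = sum(counts.values())
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        bits.append(2.0 - entropy)
    return bits


# Toy alignment (made-up sites): position 0 is fully conserved,
# position 1 is split evenly between two bases, position 2 is uniform.
print(information_content(["AAA", "AAC", "ATG", "ATT"]))
```

    In this picture, the abstract's argument is that a position read through the minor groove of B-form DNA should not exceed about 1 bit, so a value near 2 bits at such a position points to contacts that B-form geometry alone cannot provide.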

  9. Molecular design of sequence specific DNA alkylating agents.

    PubMed

    Minoshima, Masafumi; Bando, Toshikazu; Shinohara, Ken-ichi; Sugiyama, Hiroshi

    2009-01-01

    Sequence-specific DNA alkylating agents are of great interest as a novel approach to cancer chemotherapy. We designed conjugates between pyrrole (Py)-imidazole (Im) polyamides and a DNA-alkylating chlorambucil moiety attached at different positions. The sequence-specific DNA alkylation by the conjugates was investigated using high-resolution denaturing polyacrylamide gel electrophoresis (PAGE). The results showed that polyamide-chlorambucil conjugates alkylate DNA at flanking adenines in the recognition sequences of the Py-Im polyamides; however, the reactivities and alkylation sites were influenced by the positions of conjugation. In addition, we synthesized a conjugate between a Py-Im polyamide and another alkylating agent, 1-(chloromethyl)-5-hydroxy-1,2-dihydro-3H-benz[e]indole (seco-CBI). The DNA alkylation reactivities of both alkylating polyamides were almost comparable. In contrast, cytotoxicities against cell lines differed greatly. These comparative studies would promote the development of appropriate sequence-specific DNA alkylating polyamides against specific cancer cells.

  10. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

    PubMed

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

    2007-02-14

    The invention of the Genome Sequencer 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequencer 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5'-tag analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate <0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5'-primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
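
    The tag-based assignment described in this abstract can be sketched as follows. This is an illustrative reconstruction rather than the authors' software: the function name, the tag-to-specimen table, the primer sequence and the reads are all made up, and real data would also need handling for sequencing errors within the tag or primer. Each read is expected to begin with a short tag followed by the primer; reads that do not fit that pattern are set aside.

```python
def assign_reads(reads, tag_to_sample, primer):
    """Sort reads by their 5' tag and strip the tag and primer.

    All tags are assumed to have the same length. A read is assigned
    only if it starts with a known tag immediately followed by the primer.
    """
    tag_len = len(next(iter(tag_to_sample)))
    if any(len(tag) != tag_len for tag in tag_to_sample):
        raise ValueError("all tags must have the same length")
    assigned = {sample: [] for sample in tag_to_sample.values()}
    unassigned = []
    for read in reads:
        tag, rest = read[:tag_len], read[tag_len:]
        sample = tag_to_sample.get(tag)
        if sample is not None and rest.startswith(primer):
            assigned[sample].append(rest[len(primer):])
        else:
            unassigned.append(read)
    return assigned, unassigned


# Hypothetical dinucleotide tags, primer and reads.
tags = {"AC": "specimen_1", "TG": "specimen_2"}
primer = "GGTCA"
reads = ["ACGGTCATTTT", "TGGGTCACCCC", "GGGGTCAAAAA", "ACTTTTTTTTT"]
by_sample, leftovers = assign_reads(reads, tags, primer)
print(by_sample)
print(leftovers)
```

    Counting the reads per tag in the output of such a function is also how a tag-dependent representation bias, like the one the abstract reports for the 5' nucleotide, would become visible.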

  11. Utility of 16S rDNA Sequencing for Identification of Rare Pathogenic Bacteria.

    PubMed

    Loong, Shih Keng; Khor, Chee Sieng; Jafar, Faizatul Lela; AbuBakar, Sazaly

    2016-11-01

    Phenotypic identification systems are established methods for laboratory identification of bacteria causing human infections. Here, the utility of phenotypic identification systems was compared against 16S rDNA identification method on clinical isolates obtained during a 5-year study period, with special emphasis on isolates that gave unsatisfactory identification. One hundred and eighty-seven clinical bacteria isolates were tested with commercial phenotypic identification systems and 16S rDNA sequencing. Isolate identities determined using phenotypic identification systems and 16S rDNA sequencing were compared for similarity at genus and species level, with 16S rDNA sequencing as the reference method. Phenotypic identification systems identified ~46% (86/187) of the isolates with identity similar to that identified using 16S rDNA sequencing. Approximately 39% (73/187) and ~15% (28/187) of the isolates showed different genus identity and could not be identified using the phenotypic identification systems, respectively. Both methods succeeded in determining the species identities of 55 isolates; however, only ~69% (38/55) of the isolates matched at species level. 16S rDNA sequencing could not determine the species of ~20% (37/187) of the isolates. The 16S rDNA sequencing is a useful method over the phenotypic identification systems for the identification of rare and difficult to identify bacteria species. The 16S rDNA sequencing method, however, does have limitation for species-level identification of some bacteria highlighting the need for better bacterial pathogen identification tools. © 2016 Wiley Periodicals, Inc.

  12. Towards optimized methods to study viral impacts on soil microbial carbon cycling

    NASA Astrophysics Data System (ADS)

    Trubl, G. G.; Roux, S.; Jang, H. B.; Solonenko, N.; Sullivan, M. B.; Rich, V. I.

    2016-12-01

    Permafrost contains 50% of global soil carbon and is rapidly thawing. While the fate of this carbon is currently unknown, it will undoubtedly be shaped by microbes and their associated viruses, which modulate host activities via mortality and metabolic control. However, little is known about soil viruses generally and their impact on terrestrial biogeochemistry; this is partially due to the presence of inhibitory substances (e.g. humic acids) in soils that interfere with sample processing and sequence-based metagenomics surveys. To address this problem, we examined viral populations in three different peat soils along a permafrost thaw gradient. These samples yielded low viral DNA recoveries, and shallow metagenomic sequencing, but still resulted in the recovery of 40 viral genome fragments. Genome- and network-based classification suggested that these new references represented 11 viral clusters, and ecological patterns (based upon non-redundant fragment recruitment) showed that viral populations were distinct in each habitat. Although only 31% of the genes could be functionally classified, pairwise genome comparisons classified 63% of the viruses taxonomically. Additionally, comparison of the 40 viral genome fragments to 53 previously recovered fragments from the same site showed no overlap, suggesting only a small portion of the resident viral community has been sampled. A follow-up experiment was performed to remove more humics during extraction and thereby obtain better viral metagenomes. Three DNA extraction protocols were tested (CTAB, PowerSoil, and Wizard columns) and the DNA was further purified with an AMPure clean-up. The PowerSoil kit maximized DNA yield (3x CTAB and 6x Wizard), and yielded the purest DNA (based on NanoDrop 260:230 ratio). 
    Given the important roles of viruses in biogeochemical cycles in better-studied systems, further research and humic-removal optimization on these thawing permafrost-associated viral communities is needed to clarify their involvement in carbon cycle feedbacks.

  13. DNA barcoding for molecular identification of Demodex based on mitochondrial genes.

    PubMed

    Hu, Li; Yang, YuanJun; Zhao, YaE; Niu, DongLing; Yang, Rui; Wang, RuiLing; Lu, Zhaohui; Li, XiaoQi

    2017-12-01

    There has been no widely accepted DNA barcode for species identification of Demodex. In this study, we attempted to solve this issue. First, mitochondrial cox1-5' and 12S gene fragments of Demodex folliculorum, D. brevis, D. canis, and D. caprae were amplified, cloned, and sequenced for the first time; intra/interspecific divergences were computed and phylogenetic trees were reconstructed. Then, divergence frequency distribution plots of those two gene fragments were drawn together with mtDNA cox1-middle region and 16S obtained in previous studies. Finally, their identification efficiency was evaluated by comparing barcoding gaps. Results indicated that 12S had the higher identification efficiency. Specifically, for the cox1-5' region of the four Demodex species, intraspecific divergences were less than 2.0%, and interspecific divergences were 21.1-31.0%; for 12S, intraspecific divergences were less than 1.4%, and interspecific divergences were 20.8-26.9%. The phylogenetic trees demonstrated that the four Demodex species clustered separately, and the divergence frequency distribution plot showed that the largest intraspecific divergence of 12S (1.4%) was less than that of the cox1-5' region (2.0%), cox1-middle region (3.1%), and 16S (2.8%). The barcoding gap of 12S was 19.4%, larger than that of the cox1-5' region (19.1%), cox1-middle region (11.3%), and 16S (13.0%); the interspecific divergence span of 12S was 6.2%, smaller than that of the cox1-5' region (10.0%), cox1-middle region (14.1%), and 16S (11.4%). Moreover, 12S has a moderate length (517 bp) for sequencing at once. Therefore, we proposed that mtDNA 12S was more suitable than cox1 and 16S as a DNA barcode for classification and identification of Demodex at lower category level.
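
    The barcoding-gap comparison in this abstract reduces to simple arithmetic, sketched below using only the divergence bounds quoted in the abstract for the two markers where all three bounds are given. The function names are illustrative. The gap is taken as the smallest interspecific divergence minus the largest intraspecific divergence, and the span as the largest minus the smallest interspecific divergence.

```python
def barcoding_gap(max_intraspecific, min_interspecific):
    """Gap between within-species and between-species divergence (%)."""
    return min_interspecific - max_intraspecific


def interspecific_span(min_interspecific, max_interspecific):
    """Width of the interspecific divergence range (%)."""
    return max_interspecific - min_interspecific


# Percent divergences as quoted in the abstract.
markers = {
    "12S": {"max_intra": 1.4, "min_inter": 20.8, "max_inter": 26.9},
    "cox1-5'": {"max_intra": 2.0, "min_inter": 21.1, "max_inter": 31.0},
}

for name, m in markers.items():
    gap = barcoding_gap(m["max_intra"], m["min_inter"])
    span = interspecific_span(m["min_inter"], m["max_inter"])
    print(f"{name}: gap = {gap:.1f}%, span = {span:.1f}%")
```

    The gaps computed this way match the reported 19.4% and 19.1%. The spans computed from the rounded bounds come out at 6.1% and 9.9%, slightly below the reported 6.2% and 10.0%, which presumably were calculated from unrounded divergences.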

  14. Nudiviruses and other large, double-stranded circular DNA viruses of invertebrates: new insights on an old topic.

    PubMed

    Wang, Yongjie; Jehle, Johannes A

    2009-07-01

    Nudiviruses (NVs) are a highly diverse group of large, circular dsDNA viruses pathogenic for invertebrates. They have rod-shaped and enveloped nucleocapsids, replicate in the nucleus of infected host cells, and possess interesting biological and molecular properties. The unassigned viral genus Nudivirus has been proposed for classification of nudiviruses. Currently, the nudiviruses comprise five different viruses: the palm rhinoceros beetle virus (Oryctes rhinoceros NV, OrNV), the Hz-1 virus (Heliothis zea NV-1, HzNV-1), the cricket virus (Gryllus bimaculatus NV, GbNV), the corn earworm moth Hz-2 virus (HzNV-2), and the occluded shrimp Monodon Baculovirus reassigned as Penaeus monodon NV (PmNV). Thus far, the genomes of OrNV, GbNV, HzNV-1 and HzNV-2 have been completely sequenced. They vary between 97 and 230 kbp in size and encode between 98 and 160 open reading frames (ORFs). All sequenced nudiviruses have 33 ORFs in common. Strikingly, 20 of them are homologous to baculovirus core genes involved in RNA transcription, DNA replication, virion structural components and other functions. Another nine conserved ORFs are likely associated with DNA replication, repair and recombination, and nucleotide metabolism; one is homologous to baculovirus iap-3 gene; two are nudivirus-specific ORFs of unknown function. Interestingly, one nudivirus ORF is similar to polh/gran gene, encoding occlusion body protein matrix and being conserved in Alpha-, Beta- and Gammabaculoviruses. Members of nudiviruses are closely related and form a monophyletic group consisting of two sister clades of OrNV/GbNV and HzNVs/PmNV. It is proposed that nudiviruses and baculoviruses derived from a common ancestor and are evolutionarily related to other large DNA viruses such as the insect-specific salivary gland hypertrophy virus (SGHV) and the marine white spot syndrome virus (WSSV).

  15. Transcriptome landscape of Lactococcus lactis reveals many novel RNAs including a small regulatory RNA involved in carbon uptake and metabolism.

    PubMed

    van der Meulen, Sjoerd B; de Jong, Anne; Kok, Jan

    2016-01-01

    RNA sequencing has revolutionized genome-wide transcriptome analyses, and the identification of non-coding regulatory RNAs in bacteria has thus increased concurrently. Here we reveal the transcriptome map of the lactic acid bacterial paradigm Lactococcus lactis MG1363 by employing differential RNA sequencing (dRNA-seq) and a combination of manual and automated transcriptome mining. This resulted in a high-resolution genome annotation of L. lactis and the identification of 60 cis-encoded antisense RNAs (asRNAs), 186 trans-encoded putative regulatory RNAs (sRNAs) and 134 novel small ORFs. Based on the putative targets of asRNAs, a novel classification is proposed. Several transcription factor DNA binding motifs were identified in the promoter sequences of (a)sRNAs, providing insight into the interplay between lactococcal regulatory RNAs and transcription factors. The presence and lengths of 14 putative sRNAs were experimentally confirmed by differential Northern hybridization, including the abundant RNA 6S that is differentially expressed depending on the available carbon source. For another sRNA, LLMGnc_147, functional analysis revealed that it is involved in carbon uptake and metabolism. L. lactis contains 13% leaderless mRNAs (lmRNAs) that, from an analysis of overrepresentation in GO classes, seem predominantly involved in nucleotide metabolism and DNA/RNA binding. Moreover, an A-rich sequence motif immediately following the start codon was uncovered, which could provide novel insight into the translation of lmRNAs. Altogether, this first experimental genome-wide assessment of the transcriptome landscape of L. lactis and subsequent sRNA studies provide an extensive basis for the investigation of regulatory RNAs in L. lactis and related lactococcal species.

  16. Challenging a bioinformatic tool’s ability to detect microbial contaminants using in silico whole genome sequencing data

    PubMed Central

    Zook, Justin M.; Morrow, Jayne B.; Lin, Nancy J.

    2017-01-01

    High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus, Escherichia, and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods. PMID:28924496
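
The "equivalent of 1 in 1,000 cells" wording can be made concrete with a little arithmetic. The sketch below is not the authors' simulation code; it assumes that reads are sampled in proportion to the DNA each organism contributes, so a cell ratio turns into a read fraction once genome sizes are taken into account. The function name and the genome sizes in the usage lines are hypothetical.

```python
def contaminant_read_fraction(cell_ratio, contaminant_genome_bp, target_genome_bp):
    """Expected fraction of reads that come from the contaminant.

    Assumption (not from the abstract): reads are drawn in proportion to
    the amount of DNA each organism contributes to the mixture.
    cell_ratio is the fraction of cells that are contaminant, e.g. 0.001
    for 1 contaminant cell in 1,000 cells.
    """
    if not 0.0 <= cell_ratio <= 1.0:
        raise ValueError("cell_ratio must be between 0 and 1")
    contaminant_dna = cell_ratio * contaminant_genome_bp
    target_dna = (1.0 - cell_ratio) * target_genome_bp
    return contaminant_dna / (contaminant_dna + target_dna)


# Hypothetical genome sizes, chosen only for illustration.
same_size = contaminant_read_fraction(0.001, 5_000_000, 5_000_000)
small_contaminant = contaminant_read_fraction(0.1, 2_000_000, 5_000_000)
```

Under this assumption, a contaminant with a smaller genome than the target yields proportionally fewer reads than its cell ratio alone would suggest, which is one reason a detection limit expressed in cells need not be the same for every organism pair.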

  17. Modestobacter caceresii sp. nov., novel actinobacteria with an insight into their adaptive mechanisms for survival in extreme hyper-arid Atacama Desert soils.

    PubMed

    Busarakam, Kanungnid; Bull, Alan T; Trujillo, Martha E; Riesco, Raul; Sangal, Vartul; van Wezel, Gilles P; Goodfellow, Michael

    2016-06-01

    A polyphasic study was designed to determine the taxonomic provenance of three Modestobacter strains isolated from an extreme hyper-arid Atacama Desert soil. The strains, isolates KNN 45-1a, KNN 45-2b(T) and KNN 45-3b, were shown to have chemotaxonomic and morphological properties in line with their classification in the genus Modestobacter. The isolates had identical 16S rRNA gene sequences and formed a branch in the Modestobacter gene tree that was most closely related to the type strain of Modestobacter marinus (99.6% similarity). All three isolates were distinguished readily from Modestobacter type strains by a broad range of phenotypic properties, by qualitative and quantitative differences in fatty acid profiles and by BOX fingerprint patterns. The whole genome sequence of isolate KNN 45-2b(T) showed 89.3% average nucleotide identity, 90.1% (SD: 10.97%) average amino acid identity and a digital DNA-DNA hybridization value of 42.4±3.1 against the genome sequence of M. marinus DSM 45201(T), values consistent with its assignment to a separate species. On the basis of all of these data, it is proposed that the isolates be assigned to the genus Modestobacter as Modestobacter caceresii sp. nov. with isolate KNN 45-2b(T) (CECT 9023(T)=DSM 101691(T)) as the type strain. Analysis of the whole-genome sequence of M. caceresii KNN 45-2b(T), with 4683 open reading frames and a genome size of ~4.96 Mb, revealed the presence of genes and gene clusters that encode properties relevant to its adaptability to the harsh environmental conditions prevalent in extreme hyper-arid Atacama Desert soils. Copyright © 2016. Published by Elsevier GmbH.

  18. Novel numerical and graphical representation of DNA sequences and proteins.

    PubMed

    Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D

    2006-12-01

    We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.

  19. Capillary electrophoresis of Big-Dye terminator sequencing reactions for human mtDNA Control Region haplotyping in the identification of human remains.

    PubMed

    Montesino, Marta; Prieto, Lourdes

    2012-01-01

    The cycle sequencing reaction with Big-Dye terminators provides the methodology to analyze mtDNA Control Region amplicons by means of capillary electrophoresis. DNA sequencing with ddNTPs or terminators was developed by (1). The progressive automation of the method by combining the use of fluorescent-dye terminators with cycle sequencing has made it possible to increase the sensitivity and efficiency of the method and hence has allowed its introduction into the forensic field. PCR-generated mitochondrial DNA products are the templates for sequencing reactions. Different sets of primers can be used to generate amplicons of different sizes according to the quality and quantity of the DNA extract, providing sequence data for different ranges inside the Control Region.

  20. Gene Identification Algorithms Using Exploratory Statistical Analysis of Periodicity

    NASA Astrophysics Data System (ADS)

    Mukherjee, Shashi Bajaj; Sen, Pradip Kumar

    2010-10-01

    Studying periodic patterns would be expected to be a standard line of attack for recognizing DNA sequences in gene identification and similar problems. Surprisingly, however, very little significant work has been done in this direction. This paper studies the statistical properties of DNA sequences of complete genomes using a new technique. A DNA sequence is converted to a numeric sequence using various types of mappings, and standard Fourier techniques are applied to study the periodicity. Distinct statistical behaviour of the periodicity parameters is found in coding and non-coding sequences, which can be used to distinguish between these parts. Here, DNA sequences of Drosophila melanogaster were analyzed with significant accuracy.
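
The general idea of mapping a DNA sequence to numbers and inspecting its Fourier spectrum can be sketched as follows. The abstract does not say which mappings the authors used, so this example assumes one common choice, four binary indicator sequences (one per base), and evaluates the discrete Fourier transform at a single period. The function names are hypothetical and this is an illustration of the technique, not the paper's method.

```python
import cmath


def indicator_sequences(seq):
    """Map a DNA string to four 0/1 sequences, one per base (assumed mapping)."""
    seq = seq.upper()
    return {base: [1.0 if ch == base else 0.0 for ch in seq] for base in "ACGT"}


def spectral_power(seq, period):
    """Fourier power at frequency 1/period, summed over the four
    indicator sequences and divided by the sequence length."""
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    total = 0.0
    for signal in indicator_sequences(seq).values():
        # Single DFT coefficient at the frequency corresponding to `period`.
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * i / period)
            for i, x in enumerate(signal)
        )
        total += abs(coeff) ** 2
    return total / n


# Toy inputs: a perfectly 3-periodic string and a 4-periodic string.
codon_like = spectral_power("ATG" * 30, 3)
no_period3 = spectral_power("ACGT" * 24, 3)
```

A pronounced peak at period 3 is widely reported for protein-coding DNA, so comparing this power between regions is one plausible way to separate coding from non-coding sequence; real data would need windowing and a statistical threshold rather than the toy strings used here.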
