DNA as a Binary Code: How the Physical Structure of Nucleotide Bases Carries Information
ERIC Educational Resources Information Center
McCallister, Gary
2005-01-01
The DNA triplet code also functions as a binary code. Because double-ring compounds cannot bind to double-ring compounds in the DNA code, the sequence of bases classified simply as purines or pyrimidines can encode for smaller groups of possible amino acids. This is an intuitive approach to teaching the DNA code. (Contains 6 figures.)
Schmidt-Chanasit, Jonas; Bialonski, Alexandra; Heinemann, Patrick; Ulrich, Rainer G; Günther, Stephan; Rabenau, Holger F; Doerr, Hans Wilhelm
2010-07-01
Recently two different herpes simplex virus type 2 (HSV-2) clades (A and B) were described on DNA sequence data of the glycoprotein E (gE), G (gG) and I (gI) genes. To type the circulating HSV-2 wild-type strains in Germany by a novel approach and to monitor potential changes in the molecular epidemiology between 1997 and 2008. A total of 64 clinical HSV-2 isolates were analyzed by a novel approach using the DNA sequences of the complete open reading frames of glycoprotein B (gB) and gG. Recombination analysis of the gB and gG gene sequences was performed to reveal intragenic recombinants. Based on the phylogenetic analysis of the gB coding DNA sequence 8 of 64 (12%) isolates were classified as clade A strains and 56 of 64 (88%) isolates were classified as clade B strains. Analysis of the gG coding DNA sequence classified 4 (6%) isolates as clade A strains and 60 (94%) isolates as clade B strains. In comparison, the 8 isolates classified as clade A strains using the gB sequence data were classified as clade B strains when using the gG coding DNA sequence, suggesting intergenic recombination events. Intragenic recombination events were not detected. The first molecular survey of clinical HSV-2 isolates from Germany demonstrated the circulation of clade A and B strains and of intergenic recombinants over a period of 12 years. Copyright (c) 2010 Elsevier B.V. All rights reserved.
Mitochondrial DNA haplogroup phylogeny of the dog: Proposal for a cladistic nomenclature.
Fregel, Rosa; Suárez, Nicolás M; Betancor, Eva; González, Ana M; Cabrera, Vicente M; Pestano, José
2015-05-01
Canis lupus familiaris mitochondrial DNA analysis has increased in recent years, not only for the purpose of deciphering dog domestication but also for forensic genetic studies or breed characterization. The resultant accumulation of data has increased the need for a normalized and phylogenetic-based nomenclature like those provided for human maternal lineages. Although a standardized classification has been proposed, haplotype names within clades have been assigned gradually without considering the evolutionary history of dog mtDNA. Moreover, this classification is based only on the D-loop region, proven to be insufficient for phylogenetic purposes due to its high number of recurrent mutations and the lack of relevant information present in the coding region. In this study, we design 1) a refined mtDNA cladistic nomenclature from a phylogenetic tree based on complete sequences, classifying dog maternal lineages into haplogroups defined by specific diagnostic mutations, and 2) a coding region SNP analysis that allows a more accurate classification into haplogroups when combined with D-loop sequencing, thus improving the phylogenetic information obtained in dog mitochondrial DNA studies. Copyright © 2015 Elsevier B.V. All rights reserved.
DNA fingerprinting of Chinese melon provides evidentiary support of seed quality appraisal.
Gao, Peng; Ma, Hongyan; Luan, Feishi; Song, Haibin
2012-01-01
Melon, Cucumis melo L. is an important vegetable crop worldwide. At present, there are phenomena of homonyms and synonyms present in the melon seed markets of China, which could cause variety authenticity issues influencing the process of melon breeding, production, marketing and other aspects. Molecular markers, especially microsatellites or simple sequence repeats (SSRs) are playing increasingly important roles for cultivar identification. The aim of this study was to construct a DNA fingerprinting database of major melon cultivars, which could provide a possibility for the establishment of a technical standard system for purity and authenticity identification of melon seeds. In this study, to develop the core set SSR markers, 470 polymorphic SSRs were selected as the candidate markers from 1219 SSRs using 20 representative melon varieties (lines). Eighteen SSR markers, evenly distributed across the genome and with the highest contents of polymorphism information (PIC) were identified as the core marker set for melon DNA fingerprinting analysis. Fingerprint codes for 471 melon varieties (lines) were established. There were 51 materials which were classified into17 groups based on sharing the same fingerprint code, while field traits survey results showed that these plants in the same group were synonyms because of the same or similar field characters. Furthermore, DNA fingerprinting quick response (QR) codes of 471 melon varieties (lines) were constructed. Due to its fast readability and large storage capacity, QR coding melon DNA fingerprinting is in favor of read convenience and commercial applications.
DNA Fingerprinting of Chinese Melon Provides Evidentiary Support of Seed Quality Appraisal
Gao, Peng; Ma, Hongyan; Luan, Feishi; Song, Haibin
2012-01-01
Melon, Cucumis melo L. is an important vegetable crop worldwide. At present, there are phenomena of homonyms and synonyms present in the melon seed markets of China, which could cause variety authenticity issues influencing the process of melon breeding, production, marketing and other aspects. Molecular markers, especially microsatellites or simple sequence repeats (SSRs) are playing increasingly important roles for cultivar identification. The aim of this study was to construct a DNA fingerprinting database of major melon cultivars, which could provide a possibility for the establishment of a technical standard system for purity and authenticity identification of melon seeds. In this study, to develop the core set SSR markers, 470 polymorphic SSRs were selected as the candidate markers from 1219 SSRs using 20 representative melon varieties (lines). Eighteen SSR markers, evenly distributed across the genome and with the highest contents of polymorphism information (PIC) were identified as the core marker set for melon DNA fingerprinting analysis. Fingerprint codes for 471 melon varieties (lines) were established. There were 51 materials which were classified into17 groups based on sharing the same fingerprint code, while field traits survey results showed that these plants in the same group were synonyms because of the same or similar field characters. Furthermore, DNA fingerprinting quick response (QR) codes of 471 melon varieties (lines) were constructed. Due to its fast readability and large storage capacity, QR coding melon DNA fingerprinting is in favor of read convenience and commercial applications. PMID:23285039
Discrete Ramanujan transform for distinguishing the protein coding regions from other regions.
Hua, Wei; Wang, Jiasong; Zhao, Jian
2014-01-01
Based on the study of Ramanujan sum and Ramanujan coefficient, this paper suggests the concepts of discrete Ramanujan transform and spectrum. Using Voss numerical representation, one maps a symbolic DNA strand as a numerical DNA sequence, and deduces the discrete Ramanujan spectrum of the numerical DNA sequence. It is well known that of discrete Fourier power spectrum of protein coding sequence has an important feature of 3-base periodicity, which is widely used for DNA sequence analysis by the technique of discrete Fourier transform. It is performed by testing the signal-to-noise ratio at frequency N/3 as a criterion for the analysis, where N is the length of the sequence. The results presented in this paper show that the property of 3-base periodicity can be only identified as a prominent spike of the discrete Ramanujan spectrum at period 3 for the protein coding regions. The signal-to-noise ratio for discrete Ramanujan spectrum is defined for numerical measurement. Therefore, the discrete Ramanujan spectrum and the signal-to-noise ratio of a DNA sequence can be used for distinguishing the protein coding regions from the noncoding regions. All the exon and intron sequences in whole chromosomes 1, 2, 3 and 4 of Caenorhabditis elegans have been tested and the histograms and tables from the computational results illustrate the reliability of our method. In addition, we have analyzed theoretically and gotten the conclusion that the algorithm for calculating discrete Ramanujan spectrum owns the lower computational complexity and higher computational accuracy. The computational experiments show that the technique by using discrete Ramanujan spectrum for classifying different DNA sequences is a fast and effective method. Copyright © 2014 Elsevier Ltd. All rights reserved.
Machine learning for classifying tuberculosis drug-resistance from DNA sequencing data.
Yang, Yang; Niehaus, Katherine E; Walker, Timothy M; Iqbal, Zamin; Walker, A Sarah; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W; Smith, E Grace; Zhu, Tingting; Clifton, David A
2018-05-15
Correct and rapid determination of Mycobacterium tuberculosis (MTB) resistance against available tuberculosis (TB) drugs is essential for the control and management of TB. Conventional molecular diagnostic test assumes that the presence of any well-studied single nucleotide polymorphisms is sufficient to cause resistance, which yields low sensitivity for resistance classification. Given the availability of DNA sequencing data from MTB, we developed machine learning models for a cohort of 1839 UK bacterial isolates to classify MTB resistance against eight anti-TB drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, ciprofloxacin, moxifloxacin, ofloxacin, streptomycin) and to classify multi-drug resistance. Compared to previous rules-based approach, the sensitivities from the best-performing models increased by 2-4% for isoniazid, rifampicin and ethambutol to 97% (P < 0.01), respectively; for ciprofloxacin and multi-drug resistant TB, they increased to 96%. For moxifloxacin and ofloxacin, sensitivities increased by 12 and 15% from 83 and 81% based on existing known resistance alleles to 95% and 96% (P < 0.01), respectively. Particularly, our models improved sensitivities compared to the previous rules-based approach by 15 and 24% to 84 and 87% for pyrazinamide and streptomycin (P < 0.01), respectively. The best-performing models increase the area-under-the-ROC curve by 10% for pyrazinamide and streptomycin (P < 0.01), and 4-8% for other drugs (P < 0.01). The details of source code are provided at http://www.robots.ox.ac.uk/~davidc/code.php. david.clifton@eng.ox.ac.uk. Supplementary data are available at Bioinformatics online.
Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin
2013-01-01
DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem, namely, that most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they merely focus on binary-class problem. In this paper, we dealt with multiclass imbalanced classification problem, as encountered in cancer DNA microarray, by using ensemble learning. We utilized one-against-all coding strategy to transform multiclass to multiple binary classes, each of them carrying out feature subspace, which is an evolving version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two different correction technologies, namely, decision threshold adjustment or random undersampling, into each training subset to alleviate the damage of class imbalance. Specifically, support vector machine was used as base classifier, and a novel voting rule called counter voting was presented for making a final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that unlike many traditional classification approaches, our methods are insensitive to class imbalance.
Machine learning for classifying tuberculosis drug-resistance from DNA sequencing data
Yang, Yang; Niehaus, Katherine E; Walker, Timothy M; Iqbal, Zamin; Walker, A Sarah; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W; Smith, E Grace; Zhu, Tingting; Clifton, David A
2018-01-01
Abstract Motivation Correct and rapid determination of Mycobacterium tuberculosis (MTB) resistance against available tuberculosis (TB) drugs is essential for the control and management of TB. Conventional molecular diagnostic test assumes that the presence of any well-studied single nucleotide polymorphisms is sufficient to cause resistance, which yields low sensitivity for resistance classification. Summary Given the availability of DNA sequencing data from MTB, we developed machine learning models for a cohort of 1839 UK bacterial isolates to classify MTB resistance against eight anti-TB drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, ciprofloxacin, moxifloxacin, ofloxacin, streptomycin) and to classify multi-drug resistance. Results Compared to previous rules-based approach, the sensitivities from the best-performing models increased by 2-4% for isoniazid, rifampicin and ethambutol to 97% (P < 0.01), respectively; for ciprofloxacin and multi-drug resistant TB, they increased to 96%. For moxifloxacin and ofloxacin, sensitivities increased by 12 and 15% from 83 and 81% based on existing known resistance alleles to 95% and 96% (P < 0.01), respectively. Particularly, our models improved sensitivities compared to the previous rules-based approach by 15 and 24% to 84 and 87% for pyrazinamide and streptomycin (P < 0.01), respectively. The best-performing models increase the area-under-the-ROC curve by 10% for pyrazinamide and streptomycin (P < 0.01), and 4–8% for other drugs (P < 0.01). Availability and implementation The details of source code are provided at http://www.robots.ox.ac.uk/~davidc/code.php. Contact david.clifton@eng.ox.ac.uk Supplementary information Supplementary data are available at Bioinformatics online. PMID:29240876
The changing epitome of species identification – DNA barcoding
Ajmal Ali, M.; Gyulai, Gábor; Hidvégi, Norbert; Kerti, Balázs; Al Hemaid, Fahad M.A.; Pandey, Arun K.; Lee, Joongku
2014-01-01
The discipline taxonomy (the science of naming and classifying organisms, the original bioinformatics and a basis for all biology) is fundamentally important in ensuring the quality of life of future human generation on the earth; yet over the past few decades, the teaching and research funding in taxonomy have declined because of its classical way of practice which lead the discipline many a times to a subject of opinion, and this ultimately gave birth to several problems and challenges, and therefore the taxonomist became an endangered race in the era of genomics. Now taxonomy suddenly became fashionable again due to revolutionary approaches in taxonomy called DNA barcoding (a novel technology to provide rapid, accurate, and automated species identifications using short orthologous DNA sequences). In DNA barcoding, complete data set can be obtained from a single specimen irrespective to morphological or life stage characters. The core idea of DNA barcoding is based on the fact that the highly conserved stretches of DNA, either coding or non coding regions, vary at very minor degree during the evolution within the species. Sequences suggested to be useful in DNA barcoding include cytoplasmic mitochondrial DNA (e.g. cox1) and chloroplast DNA (e.g. rbcL, trnL-F, matK, ndhF, and atpB rbcL), and nuclear DNA (ITS, and house keeping genes e.g. gapdh). The plant DNA barcoding is now transitioning the epitome of species identification; and thus, ultimately helping in the molecularization of taxonomy, a need of the hour. The ‘DNA barcodes’ show promise in providing a practical, standardized, species-level identification tool that can be used for biodiversity assessment, life history and ecological studies, forensic analysis, and many more. PMID:24955007
OnTheFly: a database of Drosophila melanogaster transcription factors and their binding sites.
Shazman, Shula; Lee, Hunjoong; Socol, Yakov; Mann, Richard S; Honig, Barry
2014-01-01
We present OnTheFly (http://bhapp.c2b2.columbia.edu/OnTheFly/index.php), a database comprising a systematic collection of transcription factors (TFs) of Drosophila melanogaster and their DNA-binding sites. TFs predicted in the Drosophila melanogaster genome are annotated and classified and their structures, obtained via experiment or homology models, are provided. All known preferred TF DNA-binding sites obtained from the B1H, DNase I and SELEX methodologies are presented. DNA shape parameters predicted for these sites are obtained from a high throughput server or from crystal structures of protein-DNA complexes where available. An important feature of the database is that all DNA-binding domains and their binding sites are fully annotated in a eukaryote using structural criteria and evolutionary homology. OnTheFly thus provides a comprehensive view of TFs and their binding sites that will be a valuable resource for deciphering non-coding regulatory DNA.
Koi, Minoru; Tseng-Rogenski, Stephanie S; Carethers, John M
2018-01-15
Microsatellite alterations within genomic DNA frameshift as a result of defective DNA mismatch repair (MMR). About 15% of sporadic colorectal cancers (CRCs) manifest hypermethylation of the DNA MMR gene MLH1 , resulting in mono- and di-nucleotide frameshifts to classify it as microsatellite instability-high (MSI-H) and hypermutated, and due to frameshifts at coding microsatellites generating neo-antigens, produce a robust protective immune response that can be enhanced with immune checkpoint blockade. More commonly, approximately 50% of sporadic non-MSI-H CRCs demonstrate frameshifts at di- and tetra-nucleotide microsatellites to classify it as MSI-low/elevated microsatellite alterations at selected tetranucleotide repeats (EMAST) as a result of functional somatic inactivation of the DNA MMR protein MSH3 via a nuclear-to-cytosolic displacement. The trigger for MSH3 displacement appears to be inflammation and/or oxidative stress, and unlike MSI-H CRC patients, patients with MSI-L/EMAST CRCs show poor prognosis. These inflammatory-associated microsatellite alterations are a consequence of the local tumor microenvironment, and in theory, if the microenvironment is manipulated to lower inflammation, the microsatellite alterations and MSH3 dysfunction should be corrected. Here we describe the mechanisms and significance of inflammatory-associated microsatellite alterations, and propose three areas to deeply explore the consequences and prevention of inflammation's effect upon the DNA MMR system.
Koi, Minoru; Tseng-Rogenski, Stephanie S; Carethers, John M
2018-01-01
Microsatellite alterations within genomic DNA frameshift as a result of defective DNA mismatch repair (MMR). About 15% of sporadic colorectal cancers (CRCs) manifest hypermethylation of the DNA MMR gene MLH1, resulting in mono- and di-nucleotide frameshifts to classify it as microsatellite instability-high (MSI-H) and hypermutated, and due to frameshifts at coding microsatellites generating neo-antigens, produce a robust protective immune response that can be enhanced with immune checkpoint blockade. More commonly, approximately 50% of sporadic non-MSI-H CRCs demonstrate frameshifts at di- and tetra-nucleotide microsatellites to classify it as MSI-low/elevated microsatellite alterations at selected tetranucleotide repeats (EMAST) as a result of functional somatic inactivation of the DNA MMR protein MSH3 via a nuclear-to-cytosolic displacement. The trigger for MSH3 displacement appears to be inflammation and/or oxidative stress, and unlike MSI-H CRC patients, patients with MSI-L/EMAST CRCs show poor prognosis. These inflammatory-associated microsatellite alterations are a consequence of the local tumor microenvironment, and in theory, if the microenvironment is manipulated to lower inflammation, the microsatellite alterations and MSH3 dysfunction should be corrected. Here we describe the mechanisms and significance of inflammatory-associated microsatellite alterations, and propose three areas to deeply explore the consequences and prevention of inflammation’s effect upon the DNA MMR system. PMID:29375743
Robles, Ana I.; Arai, Eri; Mathé, Ewy A.; Okayama, Hirokazu; Schetter, Aaron J.; Brown, Derek; Petersen, David; Bowman, Elise D.; Noro, Rintaro; Welsh, Judith A.; Edelman, Daniel C.; Stevenson, Holly S.; Wang, Yonghong; Tsuchiya, Naoto; Kohno, Takashi; Skaug, Vidar; Mollerup, Steen; Haugen, Aage; Meltzer, Paul S.; Yokota, Jun; Kanai, Yae
2015-01-01
Introduction Up to 30% Stage I lung cancer patients suffer recurrence within 5 years of curative surgery. We sought to improve existing protein-coding gene and microRNA expression prognostic classifiers by incorporating epigenetic biomarkers. Methods Genome-wide screening of DNA methylation and pyrosequencing analysis of HOXA9 promoter methylation were performed in two independently collected cohorts of Stage I lung adenocarcinoma. The prognostic value of HOXA9 promoter methylation alone and in combination with mRNA and miRNA biomarkers was assessed by Cox regression and Kaplan-Meier survival analysis in both cohorts. Results Promoters of genes marked by Polycomb in Embryonic Stem Cells were methylated de novo in tumors and identified patients with poor prognosis. The HOXA9 locus was methylated de novo in Stage I tumors (P < 0.0005). High HOXA9 promoter methylation was associated with worse cancer-specific survival (Hazard Ratio [HR], 2.6; P = 0.02) and recurrence-free survival (HR, 3.0; P = 0.01), and identified high-risk patients in stratified analysis of Stage IA and IB. Four protein-coding gene (XPO1, BRCA1, HIF1α, DLC1), miR-21 expression and HOXA9 promoter methylation were each independently associated with outcome (HR, 2.8; P = 0.002; HR, 2.3; P = 0.01; and HR, 2.4; P = 0.005, respectively), and, when combined, identified high-risk, therapy naïve, Stage I patients (HR, 10.2; P = 3x10−5). All associations were confirmed in two independently collected cohorts. Conclusion A prognostic classifier comprising three types of genomic and epigenomic data may help guide the postoperative management of Stage I lung cancer patients at high risk of recurrence. PMID:26134223
Pompei, Fiorenza; Ciminelli, Bianca Maria; Bombieri, Cristina; Ciccacci, Cinzia; Koudova, Monika; Giorgi, Silvia; Belpinati, Francesca; Begnini, Angela; Cerny, Milos; Des Georges, Marie; Claustres, Mireille; Ferec, Claude; Macek, Milan; Modiano, Guido; Pignatti, Pier Franco
2006-01-01
An average of about 1700 CFTR (cystic fibrosis transmembrane conductance regulator) alleles from normal individuals from different European populations were extensively screened for DNA sequence variation. A total of 80 variants were observed: 61 coding SNSs (results already published), 13 noncoding SNSs, three STRs, two short deletions, and one nucleotide insertion. Eight DNA variants were classified as non-CF causing due to their high frequency of occurrence. Through this survey the CFTR has become the most exhaustively studied gene for its coding sequence variability and, though to a lesser extent, for its noncoding sequence variability as well. Interestingly, most variation was associated with the M470 allele, while the V470 allele showed an 'extended haplotype homozygosity' (EHH). These findings make us suggest a role for selection acting either on the M470V itself or through an hitchhiking mechanism involving a second site. The possible ancient origin of the V allele in an 'out of Africa' time frame is discussed.
Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun
2013-01-01
In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild Animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 106 pfu/mL and 1.56 × 109 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bps were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coded for the TAT-Ferritin fusion protein with two 6× His-tags in N and C-terminal. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library attained to the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105
Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun
2013-05-24
In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild Animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 106 pfu/mL and 1.56 × 109 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bps were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coded for the TAT-Ferritin fusion protein with two 6× His-tags in N and C-terminal. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library attained to the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.
The Diversity Present in 5140 Human Mitochondrial Genomes
Pereira, Luísa; Freitas, Fernando; Fernandes, Verónica; Pereira, Joana B.; Costa, Marta D.; Costa, Stephanie; Máximo, Valdemar; Macaulay, Vincent; Rocha, Ricardo; Samuels, David C.
2009-01-01
We analyzed the current status (as of the end of August 2008) of human mitochondrial genomes deposited in GenBank, amounting to 5140 complete or coding-region sequences, in order to present an overall picture of the diversity present in the mitochondrial DNA of the global human population. To perform this task, we developed mtDNA-GeneSyn, a computer tool that identifies and exhaustedly classifies the diversity present in large genetic data sets. The diversity observed in the 5140 human mitochondrial genomes was compared with all possible transitions and transversions from the standard human mitochondrial reference genome. This comparison showed that tRNA and rRNA secondary structures have a large effect in limiting the diversity of the human mitochondrial sequences, whereas for the protein-coding genes there is a bias toward less variation at the second codon positions. The analysis of the observed amino acid variations showed a tolerance of variations that convert between the amino acids V, I, A, M, and T. This defines a group of amino acids with similar chemical properties that can interconvert by a single transition. PMID:19426953
Statistical approaches to account for false-positive errors in environmental DNA samples.
Lahoz-Monfort, José J; Guillera-Arroita, Gurutzeta; Tingley, Reid
2016-05-01
Environmental DNA (eDNA) sampling is prone to both false-positive and false-negative errors. We review statistical methods to account for such errors in the analysis of eDNA data and use simulations to compare the performance of different modelling approaches. Our simulations illustrate that even low false-positive rates can produce biased estimates of occupancy and detectability. We further show that removing or classifying single PCR detections in an ad hoc manner under the suspicion that such records represent false positives, as sometimes advocated in the eDNA literature, also results in biased estimation of occupancy, detectability and false-positive rates. We advocate alternative approaches to account for false-positive errors that rely on prior information, or the collection of ancillary detection data at a subset of sites using a sampling method that is not prone to false-positive errors. We illustrate the advantages of these approaches over ad hoc classifications of detections and provide practical advice and code for fitting these models in maximum likelihood and Bayesian frameworks. Given the severe bias induced by false-negative and false-positive errors, the methods presented here should be more routinely adopted in eDNA studies. © 2015 John Wiley & Sons Ltd.
DNA barcode goes two-dimensions: DNA QR code web server.
Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin
2012-01-01
The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.
DNA Barcode Goes Two-Dimensions: DNA QR Code Web Server
Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin
2012-01-01
The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, “DNA barcode” actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications. PMID:22574113
Informative priors based on transcription factor structural class improve de novo motif discovery.
Narlikar, Leelavati; Gordân, Raluca; Ohler, Uwe; Hartemink, Alexander J
2006-07-15
An important problem in molecular biology is to identify the locations at which a transcription factor (TF) binds to DNA, given a set of DNA sequences believed to be bound by that TF. In previous work, we showed that information in the DNA sequence of a binding site is sufficient to predict the structural class of the TF that binds it. In particular, this suggests that we can predict which locations in any DNA sequence are more likely to be bound by certain classes of TFs than others. Here, we argue that traditional methods for de novo motif finding can be significantly improved by adopting an informative prior probability that a TF binding site occurs at each sequence location. To demonstrate the utility of such an approach, we present priority, a powerful new de novo motif finding algorithm. Using data from TRANSFAC, we train three classifiers to recognize binding sites of basic leucine zipper, forkhead, and basic helix loop helix TFs. These classifiers are used to equip priority with three class-specific priors, in addition to a default prior to handle TFs of other classes. We apply priority and a number of popular motif finding programs to sets of yeast intergenic regions that are reported by ChIP-chip to be bound by particular TFs. priority identifies motifs the other methods fail to identify, and correctly predicts the structural class of the TF recognizing the identified binding sites. Supplementary material and code can be found at http://www.cs.duke.edu/~amink/.
Superimposed Code Theoretic Analysis of DNA Codes and DNA Computing
2008-01-01
complements of one another and the DNA duplex formed is a Watson - Crick (WC) duplex. However, there are many instances when the formation of non-WC...that the user’s requirements for probe selection are met based on the Watson - Crick probe locality within a target. The second type, called...AFRL-RI-RS-TR-2007-288 Final Technical Report January 2008 SUPERIMPOSED CODE THEORETIC ANALYSIS OF DNA CODES AND DNA COMPUTING
Genes and Pathways Involved in Adult Onset Disorders Featuring Muscle Mitochondrial DNA Instability
Ahmed, Naghia; Ronchi, Dario; Comi, Giacomo Pietro
2015-01-01
Replication and maintenance of mtDNA entirely relies on a set of proteins encoded by the nuclear genome, which include members of the core replicative machinery, proteins involved in the homeostasis of mitochondrial dNTPs pools or deputed to the control of mitochondrial dynamics and morphology. Mutations in their coding genes have been observed in familial and sporadic forms of pediatric and adult-onset clinical phenotypes featuring mtDNA instability. The list of defects involved in these disorders has recently expanded, including mutations in the exo-/endo-nuclease flap-processing proteins MGME1 and DNA2, supporting the notion that an enzymatic DNA repair system actively takes place in mitochondria. The results obtained in the last few years acknowledge the contribution of next-generation sequencing methods in the identification of new disease loci in small groups of patients and even single probands. Although heterogeneous, these genes can be conveniently classified according to the pathway to which they belong. The definition of the molecular and biochemical features of these pathways might be helpful for fundamental knowledge of these disorders, to accelerate genetic diagnosis of patients and the development of rational therapies. In this review, we discuss the molecular findings disclosed in adult patients with muscle pathology hallmarked by mtDNA instability. PMID:26251896
Statistical properties of DNA sequences
NASA Technical Reports Server (NTRS)
Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.
1995-01-01
We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
Ancient DNA sequence revealed by error-correcting codes.
Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo
2015-07-10
A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.
Ancient DNA sequence revealed by error-correcting codes
Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo
2015-01-01
A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228
Qiu, Guo-Hua
2016-01-01
In this review, the protective function of the abundant non-coding DNA in the eukaryotic genome is discussed from the perspective of genome defense against exogenous nucleic acids. Peripheral non-coding DNA has been proposed to act as a bodyguard that protects the genome and the central protein-coding sequences from ionizing radiation-induced DNA damage. In the proposed mechanism of protection, the radicals generated by water radiolysis in the cytosol and IR energy are absorbed, blocked and/or reduced by peripheral heterochromatin; then, the DNA damage sites in the heterochromatin are removed and expelled from the nucleus to the cytoplasm through nuclear pore complexes, most likely through the formation of extrachromosomal circular DNA. To strengthen this hypothesis, this review summarizes the experimental evidence supporting the protective function of non-coding DNA against exogenous nucleic acids. Based on these data, I hypothesize herein about the presence of an additional line of defense formed by small RNAs in the cytosol in addition to their bodyguard protection mechanism in the nucleus. Therefore, exogenous nucleic acids may be initially inactivated in the cytosol by small RNAs generated from non-coding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. Exogenous nucleic acids may enter the nucleus, where some are absorbed and/or blocked by heterochromatin and others integrate into chromosomes. The integrated fragments and the sites of DNA damage are removed by repetitive non-coding DNA elements in the heterochromatin and excluded from the nucleus. Therefore, the normal eukaryotic genome and the central protein-coding sequences are triply protected by non-coding DNA against invasion by exogenous nucleic acids. This review provides evidence supporting the protective role of non-coding DNA in genome defense. Copyright © 2016 Elsevier B.V. All rights reserved.
An overview of the structures of protein-DNA complexes
Luscombe, Nicholas M; Austin, Susan E; Berman , Helen M; Thornton, Janet M
2000-01-01
On the basis of a structural analysis of 240 protein-DNA complexes contained in the Protein Data Bank (PDB), we have classified the DNA-binding proteins involved into eight different structural/functional groups, which are further classified into 54 structural families. Here we present this classification and review the functions, structures and binding interactions of these protein-DNA complexes. PMID:11104519
ANN modeling of DNA sequences: new strategies using DNA shape code.
Parbhane, R V; Tambe, S S; Kulkarni, B D
2000-09-01
Two new encoding strategies, namely, wedge and twist codes, which are based on the DNA helical parameters, are introduced to represent DNA sequences in artificial neural network (ANN)-based modeling of biological systems. The performance of the new coding strategies has been evaluated by conducting three case studies involving mapping (modeling) and classification applications of ANNs. The proposed coding schemes have been compared rigorously and shown to outperform the existing coding strategies especially in situations wherein limited data are available for building the ANN models.
DNA rearrangements directed by non-coding RNAs in ciliates
Mochizuki, Kazufumi
2013-01-01
Extensive programmed rearrangement of DNA, including DNA elimination, chromosome fragmentation, and DNA descrambling, takes place in the newly developed macronucleus during the sexual reproduction of ciliated protozoa. Recent studies have revealed that two distant classes of ciliates use distinct types of non-coding RNAs to regulate such DNA rearrangement events. DNA elimination in Tetrahymena is regulated by small non-coding RNAs that are produced and utilized in an RNAi-related process. It has been proposed that the small RNAs produced from the micronuclear genome are used to identify eliminated DNA sequences by whole-genome comparison between the parental macronucleus and the micronucleus. In contrast, DNA descrambling in Oxytricha is guided by long non-coding RNAs that are produced from the parental macronuclear genome. These long RNAs are proposed to act as templates for the direct descrambling events that occur in the developing macronucleus. Both cases provide useful examples to study epigenetic chromatin regulation by non-coding RNAs. PMID:21956937
Yukinawa, Naoto; Oba, Shigeyuki; Kato, Kikuya; Ishii, Shin
2009-01-01
Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. There have been many studies of aggregating binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, as well as some comparison studies between them. However, the studies found that the best coding depends on each situation. Therefore, a new problem, which we call the "optimal coding problem," has arisen: how can we determine which coding is the optimal one in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can be a consistent answer to the problem. We apply this method to various classification problems including a synthesized data set and some cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method can improve classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
Computational fishing of new DNA methyltransferase inhibitors from natural products.
Maldonado-Rojas, Wilson; Olivero-Verbel, Jesus; Marrero-Ponce, Yovani
2015-07-01
DNA methyltransferase inhibitors (DNMTis) have become an alternative for cancer therapies. However, only two DNMTis have been approved as anticancer drugs, although with some restrictions. Natural products (NPs) are a promising source of drugs. In order to find NPs with novel chemotypes as DNMTis, 47 compounds with known activity against these enzymes were used to build a LDA-based QSAR model for active/inactive molecules (93% accuracy) based on molecular descriptors. This classifier was employed to identify potential DNMTis on 800 NPs from NatProd Collection. 447 selected compounds were docked on two human DNA methyltransferase (DNMT) structures (PDB codes: 3SWR and 2QRV) using AutoDock Vina and Surflex-Dock, prioritizing according to their score values, contact patterns at 4 Å and molecular diversity. Six consensus NPs were identified as virtual hits against DNMTs, including 9,10-dihydro-12-hydroxygambogic, phloridzin, 2',4'-dihydroxychalcone 4'-glucoside, daunorubicin, pyrromycin and centaurein. This method is an innovative computational strategy for identifying DNMTis, useful in the identification of potent and selective anticancer drugs. Copyright © 2015 Elsevier Inc. All rights reserved.
2013-01-01
Background Significant efforts have been made to address the problem of identifying short genes in prokaryotic genomes. However, most known methods are not effective in detecting short genes. Because of the limited information contained in short DNA sequences, it is very difficult to accurately distinguish between protein coding and non-coding sequences in prokaryotic genomes. We have developed a new Iteratively Adaptive Sparse Partial Least Squares (IASPLS) algorithm as the classifier to improve the accuracy of the identification process. Results For testing, we chose the short coding and non-coding sequences from seven prokaryotic organisms. We used seven feature sets (including GC content, Z-curve, etc.) of short genes. In comparison with GeneMarkS, Metagene, Orphelia, and Heuristic Approachs methods, our model achieved the best prediction performance in identification of short prokaryotic genes. Even when we focused on the very short length group ([60–100 nt)), our model provided sensitivity as high as 83.44% and specificity as high as 92.8%. These values are two or three times higher than three of the other methods while Metagene fails to recognize genes in this length range. The experiments also proved that the IASPLS can improve the identification accuracy in comparison with other widely used classifiers, i.e. Logistic, Random Forest (RF) and K nearest neighbors (KNN). The accuracy in using IASPLS was improved 5.90% or more in comparison with the other methods. In addition to the improvements in accuracy, IASPLS required ten times less computer time than using KNN or RF. Conclusions It is conclusive that our method is preferable for application as an automated method of short gene classification. Its linearity and easily optimized parameters make it practicable for predicting short genes of newly-sequenced or under-studied species. Reviewers This article was reviewed by Alexey Kondrashov, Rajeev Azad (nominated by Dr J.Peter Gogarten) and Yuriy Fofanov (nominated by Dr Janet Siefert). PMID:24067167
Introduction to the Natural Anticipator and the Artificial Anticipator
NASA Astrophysics Data System (ADS)
Dubois, Daniel M.
2010-11-01
This short communication deals with the introduction of the concept of anticipator, which is one who anticipates, in the framework of computing anticipatory systems. The definition of anticipation deals with the concept of program. Indeed, the word program, comes from "pro-gram" meaning "to write before" by anticipation, and means a plan for the programming of a mechanism, or a sequence of coded instructions that can be inserted into a mechanism, or a sequence of coded instructions, as genes or behavioural responses, that is part of an organism. Any natural or artificial programs are thus related to anticipatory rewriting systems, as shown in this paper. All the cells in the body, and the neurons in the brain, are programmed by the anticipatory genetic code, DNA, in a low-level language with four signs. The programs in computers are also computing anticipatory systems. It will be shown, at one hand, that the genetic code DNA is a natural anticipator. As demonstrated by Nobel laureate McClintock [8], genomes are programmed. The fundamental program deals with the DNA genetic code. The properties of the DNA consist in self-replication and self-modification. The self-replicating process leads to reproduction of the species, while the self-modifying process leads to new species or evolution and adaptation in existing ones. The genetic code DNA keeps its instructions in memory in the DNA coding molecule. The genetic code DNA is a rewriting system, from DNA coding to DNA template molecule. The DNA template molecule is a rewriting system to the Messenger RNA molecule. The information is not destroyed during the execution of the rewriting program. On the other hand, it will be demonstrated that Turing machine is an artificial anticipator. The Turing machine is a rewriting system. The head reads and writes, modifying the content of the tape. The information is destroyed during the execution of the program. This is an irreversible process. The input data are lost.
Fricova, Dominika; Valach, Matus; Farkas, Zoltan; Pfeiffer, Ilona; Kucsera, Judit; Tomaska, Lubomir; Nosek, Jozef
2010-01-01
As a part of our initiative aimed at a large-scale comparative analysis of fungal mitochondrial genomes, we determined the complete DNA sequence of the mitochondrial genome of the yeast Candida subhashii and found that it exhibits a number of peculiar features. First, the mitochondrial genome is represented by linear dsDNA molecules of uniform length (29 795 bp), with an unusually high content of guanine and cytosine residues (52.7 %). Second, the coding sequences lack introns; thus, the genome has a relatively compact organization. Third, the termini of the linear molecules consist of long inverted repeats and seem to contain a protein covalently bound to terminal nucleotides at the 5′ ends. This architecture resembles the telomeres in a number of linear viral and plasmid DNA genomes classified as invertrons, in which the terminal proteins serve as specific primers for the initiation of DNA synthesis. Finally, although the mitochondrial genome of C. subhashii contains essentially the same set of genes as other closely related pathogenic Candida species, we identified additional ORFs encoding two homologues of the family B protein-priming DNA polymerases and an unknown protein. The terminal structures and the genes for DNA polymerases are reminiscent of linear mitochondrial plasmids, indicating that this genome architecture might have emerged from fortuitous recombination between an ancestral, presumably circular, mitochondrial genome and an invertron-like element. PMID:20395267
NASA Astrophysics Data System (ADS)
Markman, Adam; Carnicer, Artur; Javidi, Bahram
2017-05-01
We overview our recent work [1] on utilizing three-dimensional (3D) optical phase codes for object authentication using the random forest classifier. A simple 3D optical phase code (OPC) is generated by combining multiple diffusers and glass slides. This tag is then placed on a quick-response (QR) code, which is a barcode capable of storing information and can be scanned under non-uniform illumination conditions, rotation, and slight degradation. A coherent light source illuminates the OPC and the transmitted light is captured by a CCD to record the unique signature. Feature extraction on the signature is performed and inputted into a pre-trained random-forest classifier for authentication.
Franc, M A; Cohen, N; Warner, A W; Shaw, P M; Groenen, P; Snapir, A
2011-04-01
DNA samples collected in clinical trials and stored for future research are valuable to pharmaceutical drug development. Given the perceived higher risk associated with genetic research, industry has implemented complex coding methods for DNA. Following years of experience with these methods and with addressing questions from institutional review boards (IRBs), ethics committees (ECs) and health authorities, the industry has started reexamining the extent of the added value offered by these methods. With the goal of harmonization, the Industry Pharmacogenomics Working Group (I-PWG) conducted a survey to gain an understanding of company practices for DNA coding and to solicit opinions on their effectiveness at protecting privacy. The results of the survey and the limitations of the coding methods are described. The I-PWG recommends dialogue with key stakeholders regarding coding practices such that equal standards are applied to DNA and non-DNA samples. The I-PWG believes that industry standards for privacy protection should provide adequate safeguards for DNA and non-DNA samples/data and suggests a need for more universal standards for samples stored for future research.
Regoui, Chaouki; Durand, Guillaume; Belliveau, Luc; Léger, Serge
2013-01-01
This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and the cyclic permutations applied on the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach. PMID:23984392
Decoding DNA labels by melting curve analysis using real-time PCR.
Balog, József A; Fehér, Liliána Z; Puskás, László G
2017-12-01
Synthetic DNA has been used as an authentication code for a diverse number of applications. However, existing decoding approaches are based on either DNA sequencing or the determination of DNA length variations. Here, we present a simple alternative protocol for labeling different objects using a small number of short DNA sequences that differ in their melting points. Code amplification and decoding can be done in two steps using quantitative PCR (qPCR). To obtain a DNA barcode with high complexity, we defined 8 template groups, each having 4 different DNA templates, yielding 158 (>2.5 billion) combinations of different individual melting temperature (Tm) values and corresponding ID codes. The reproducibility and specificity of the decoding was confirmed by using the most complex template mixture, which had 32 different products in 8 groups with different Tm values. The industrial applicability of our protocol was also demonstrated by labeling a drone with an oil-based paint containing a predefined DNA code, which was then successfully decoded. The method presented here consists of a simple code system based on a small number of synthetic DNA sequences and a cost-effective, rapid decoding protocol using a few qPCR reactions, enabling a wide range of authentication applications.
Converting Panax ginseng DNA and chemical fingerprints into two-dimensional barcode.
Cai, Yong; Li, Peng; Li, Xi-Wen; Zhao, Jing; Chen, Hai; Yang, Qing; Hu, Hao
2017-07-01
In this study, we investigated how to convert the Panax ginseng DNA sequence code and chemical fingerprints into a two-dimensional code. In order to improve the compression efficiency, GATC2Bytes and digital merger compression algorithms are proposed. HPLC chemical fingerprint data of 10 groups of P. ginseng from Northeast China and the internal transcribed spacer 2 (ITS2) sequence code as the DNA sequence code were ready for conversion. In order to convert such data into a two-dimensional code, the following six steps were performed: First, the chemical fingerprint characteristic data sets were obtained through the inflection filtering algorithm. Second, precompression processing of such data sets is undertaken. Third, precompression processing was undertaken with the P. ginseng DNA (ITS2) sequence codes. Fourth, the precompressed chemical fingerprint data and the DNA (ITS2) sequence code were combined in accordance with the set data format. Such combined data can be compressed by Zlib, an open source data compression algorithm. Finally, the compressed data generated a two-dimensional code called a quick response code (QR code). Through the abovementioned converting process, it can be found that the number of bytes needed for storing P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can be greatly reduced. After GTCA2Bytes algorithm processing, the ITS2 compression rate reaches 75% and the chemical fingerprint compression rate exceeds 99.65% via filtration and digital merger compression algorithm processing. Therefore, the overall compression ratio even exceeds 99.36%. The capacity of the formed QR code is around 0.5k, which can easily and successfully be read and identified by any smartphone. P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can form a QR code after data processing, and therefore the QR code can be a perfect carrier of the authenticity and quality of P. ginseng information. This study provides a theoretical basis for the development of a quality traceability system of traditional Chinese medicine based on a two-dimensional code.
Prediction of plant lncRNA by ensemble machine learning classifiers.
Simopoulos, Caitlin M A; Weretilnyk, Elizabeth A; Golding, G Brian
2018-05-02
In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein. Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.
Isolation of a DNA Probe for Lactobacillus curvatus
Petrick, Hendrik A. R.; Ambrosio, Riccardo E.; Holzapfel, Wilhelm H.
1988-01-01
A genomic library of Lactobacillus curvatus DSM 20019 was constructed in bacteriophage λ gt11. A 1.2-kilobase DNA probe specific for L. curvatus was isolated from this library. When this probe was hybridized to DNA from Lactobacillus isolates from different sources classified by conventional techniques, differing degrees of hybridization were obtained. This could imply that these isolates may have been incorrectly classified. Images PMID:16347554
What Information is Stored in DNA: Does it Contain Digital Error Correcting Codes?
NASA Astrophysics Data System (ADS)
Liebovitch, Larry
1998-03-01
The longest term correlations in living systems are the information stored in DNA which reflects the evolutionary history of an organism. The 4 bases (A,T,G,C) encode sequences of amino acids as well as locations of binding sites for proteins that regulate DNA. The fidelity of this important information is maintained by ANALOG error check mechanisms. When a single strand of DNA is replicated the complementary base is inserted in the new strand. Sometimes the wrong base is inserted that sticks out disrupting the phosphate backbone. The new base is not yet methylated, so repair enzymes, that slide along the DNA, can tear out the wrong base and replace it with the right one. The bases in DNA form a sequence of 4 different symbols and so the information is encoded in a DIGITAL form. All the digital codes in our society (ISBN book numbers, UPC product codes, bank account numbers, airline ticket numbers) use error checking code, where some digits are functions of other digits to maintain the fidelity of transmitted informaiton. Does DNA also utitlize a DIGITAL error chekcing code to maintain the fidelity of its information and increase the accuracy of replication? That is, are some bases in DNA functions of other bases upstream or downstream? This raises the interesting mathematical problem: How does one determine whether some symbols in a sequence of symbols are a function of other symbols. It also bears on the issue of determining algorithmic complexity: What is the function that generates the shortest algorithm for reproducing the symbol sequence. The error checking codes most used in our technology are linear block codes. We developed an efficient method to test for the presence of such codes in DNA. We coded the 4 bases as (0,1,2,3) and used Gaussian elimination, modified for modulus 4, to test if some bases are linear combinations of other bases. We used this method to analyze the base sequence in the genes from the lac operon and cytochrome C. We did not find evidence for such error correcting codes in these genes. However, we analyzed only a small amount of DNA and if digitial error correcting schemes are present in DNA, they may be more subtle than such simple linear block codes. The basic issue we raise here, is how information is stored in DNA and an appreciation that digital symbol sequences, such as DNA, admit of interesting schemes to store and protect the fidelity of their information content. Liebovitch, Tao, Todorov, Levine. 1996. Biophys. J. 71:1539-1544. Supported by NIH grant EY6234.
Conrad, Cheyenne C; Gilroyed, Brandon H; McAllister, Tim A; Reuter, Tim
2012-10-01
Non-O157 Shiga toxin producing Escherichia coli (STEC) are gaining recognition as human pathogens, but no standardized method exists to identify them. Sequence analysis revealed that STEC can be classified on the base of variable O antigen regions into different O serotypes. Polymerase chain reaction is a powerful technique for thorough screening and complex diagnosis for these pathogens, but requires a positive control to verify qualitative and/or quantitative DNA-fragment amplification. Due to the pathogenic nature of STEC, controls are not readily available and cell culturing of STEC reference strains requires biosafety conditions of level 2 or higher. In order to bypass this limitation, controls of stacked O-type specific DNA-fragments coding for primer recognition sites were designed to screen for nine STEC serotypes frequently associated with human infection. The synthetic controls were amplified by PCR, cloned into a plasmid vector and transferred into bacteria host cells. Plasmids amplified by bacterial expression were purified, serially diluted and tested as standards for real-time PCR using SYBR Green and TaqMan assays. Utility of synthetic DNA controls was demonstrated in conventional and real-time PCR assays and validated with DNA from natural STEC strains. Copyright © 2012 Elsevier B.V. All rights reserved.
Data Compression Techniques for Maps
1989-01-01
Lempel - Ziv compression is applied to the classified and unclassified images as also to the output of the compression algorithms . The algorithms ...resulted in a compression of 7:1. The output of the quadtree coding algorithm was then compressed using Lempel - Ziv coding. The compression ratio achieved...using Lempel - Ziv coding. The unclassified image gave a compression ratio of only 1.4:1. The K means classified image
Characterization and machine learning prediction of allele-specific DNA methylation.
He, Jianlin; Sun, Ming-an; Wang, Zhong; Wang, Qianfei; Li, Qing; Xie, Hehuang
2015-12-01
A large collection of Single Nucleotide Polymorphisms (SNPs) has been identified in the human genome. Currently, the epigenetic influences of SNPs on their neighboring CpG sites remain elusive. A growing body of evidence suggests that locus-specific information, including genomic features and local epigenetic state, may play important roles in the epigenetic readout of SNPs. In this study, we made use of mouse methylomes with known SNPs to develop statistical models for the prediction of SNP associated allele-specific DNA methylation (ASM). ASM has been classified into parent-of-origin dependent ASM (P-ASM) and sequence-dependent ASM (S-ASM), which comprises scattered-S-ASM (sS-ASM) and clustered-S-ASM (cS-ASM). We found that P-ASM and cS-ASM CpG sites are both enriched in CpG rich regions, promoters and exons, while sS-ASM CpG sites are enriched in simple repeat and regions with high frequent SNP occurrence. Using Lasso-grouped Logistic Regression (LGLR), we selected 21 out of 282 genomic and methylation related features that are powerful in distinguishing cS-ASM CpG sites and trained the classifiers with machine learning techniques. Based on 5-fold cross-validation, the logistic regression classifier was found to be the best for cS-ASM prediction with an ACC of 0.77, an AUC of 0.84 and an MCC of 0.54. Lastly, we applied the logistic regression classifier on human brain methylome and predicted 608 genes associated with cS-ASM. Gene ontology term enrichment analysis indicated that these cS-ASM associated genes are significantly enriched in the category coding for transcripts with alternative splicing forms. In summary, this study provided an analytical procedure for cS-ASM prediction and shed new light on the understanding of different types of ASM events. Published by Elsevier Inc.
Utilization of an Academic Nursing Center.
ERIC Educational Resources Information Center
Cole, Frank L.; Mackey, Thomas
1996-01-01
Using data from an academic nursing center that cared for 3,263 patients over eight months, diseases were classified using International Classification of Diseases codes, and procedures were classified using Current Procedural Terminology codes. Patterns of health care emerged, with implications for clinical teaching. (SK)
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae).
Pan, Hong-Chun; Fang, Hong-Yan; Li, Shi-Wei; Liu, Jun-Hong; Wang, Ying; Wang, An-Tai
2014-12-01
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae) is composed of two linear DNA molecules. The mitochondrial DNA (mtDNA) molecule 1 is 8010 bp long and contains six protein-coding genes, large subunit rRNA, methionine and tryptophan tRNAs, two pseudogenes consisting respectively of a partial copy of COI, and terminal sequences at two ends of the linear mtDNA, while the mtDNA molecule 2 is 7576 bp long and contains seven protein-coding genes, small subunit rRNA, methionine tRNA, a pseudogene consisting of a partial copy of COI and terminal sequences at two ends of the linear mtDNA. COI gene begins with GTG as start codon, whereas other 12 protein-coding genes start with a typical ATG initiation codon. In addition, all protein-coding genes are terminated with TAA as stop codon.
Kawano, Tomonori
2013-03-01
There have been a wide variety of approaches for handling the pieces of DNA as the "unplugged" tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, water marks and cryptography. In the present article, novel designs of artificial genes as the media for storing the digitally compressed data for images are proposed for bio-computing purpose while natural genes principally encode for proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given "passwords" and/or secret numbers using DNA sequences. The "passwords" of interest were decomposed into single letters and translated into the font image coded on the separate DNA chains with both the coding regions in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original "passwords." The latter processes require the molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting at the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interests over the image-coding DNA are also discussed.
Informational structure of genetic sequences and nature of gene splicing
NASA Astrophysics Data System (ADS)
Trifonov, E. N.
1991-10-01
Only about 1/20 of DNA of higher organisms codes for proteins, by means of classical triplet code. The rest of DNA sequences is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence ``talks'' to various bimolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus, being a composite structure in which one had the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of the Gnomic, language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their ``ontogenetic'' way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications is to resolve such conflicts. New data are presented strongly indicating that the gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
An algebraic hypothesis about the primeval genetic code architecture.
Sánchez, Robersy; Grau, Ricardo
2009-09-01
A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairing G identical with C and A=U and the non-specific base pairing of the hypothetical ancestral base D used to define the sum and product operations are enough features to determine the coding constraints of the primeval and the modern genetic code, as well as, the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Besides, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the called period-3 property of the present coding DNA sequences could also exist in the ancient coding DNA sequences. The phylogenetic analyses achieved with metrics defined in the N-dimensional vector space (B(3))(N) of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.
Dong, Chen; Hu, Huigang; Xie, Jianghui
2016-12-01
DNA-binding with one finger (Dof) domain proteins are a multigene family of plant-specific transcription factors involved in numerous aspects of plant growth and development. In this study, we report a genome-wide search for Musa acuminata Dof (MaDof) genes and their expression profiles at different developmental stages and in response to various abiotic stresses. In addition, a complete overview of the Dof gene family in bananas is presented, including the gene structures, chromosomal locations, cis-regulatory elements, conserved protein domains, and phylogenetic inferences. Based on the genome-wide analysis, we identified 74 full-length protein-coding MaDof genes unevenly distributed on 11 chromosomes. Phylogenetic analysis with Dof members from diverse plant species showed that MaDof genes can be classified into four subgroups (StDof I, II, III, and IV). The detailed genomic information of the MaDof gene homologs in the present study provides opportunities for functional analyses to unravel the exact role of the genes in plant growth and development.
Novel human CRYGD rare variant in a Brazilian family with congenital cataract
Giordano, Gabriel Gorgone; Tavares, Anderson; da Silva, Márcio José; de Vasconcellos, José Paulo Cabral; Arieta, Carlos Eduardo Leite; de Melo, Mônica Barbosa
2011-01-01
Purpose To describe a novel polymorphism in the γD-crystallin (CRYGD) gene in a Brazilian family with congenital cataract. Methods A Brazilian four-generation family was analyzed. The proband had bilateral lamellar cataract and the phenotypes were classified by slit lamp examination. Genomic DNA was extracted from peripheral blood and coding regions and intron/exon boundaries of the αA-crystallin (CRYAA), γC-crystallin (CRYGC), and CRYGD genes were amplified by polymerase chain reaction and directly sequenced. Results Sequencing of the coding regions of CRYGD showed the presence of a heterozygous A→G transversion at c.401 position, which results in the substitution of a tyrosine to a cysteine (Y134C). The polymorphism was identified in three individuals, two affected and one unaffected. Conclusions A novel rare variant in CRYGD (Y134C) was detected in a Brazilian family with congenital cataract. Because there is no segregation between the substitution and the phenotypes in this family, other genetic alterations are likely to be present. PMID:21866214
Ahmad, Muneer; Jung, Low Tan; Bhuiyan, Al-Amin
2017-10-01
Digital signal processing techniques commonly employ fixed length window filters to process the signal contents. DNA signals differ in characteristics from common digital signals since they carry nucleotides as contents. The nucleotides own genetic code context and fuzzy behaviors due to their special structure and order in DNA strand. Employing conventional fixed length window filters for DNA signal processing produce spectral leakage and hence results in signal noise. A biological context aware adaptive window filter is required to process the DNA signals. This paper introduces a biological inspired fuzzy adaptive window median filter (FAWMF) which computes the fuzzy membership strength of nucleotides in each slide of window and filters nucleotides based on median filtering with a combination of s-shaped and z-shaped filters. Since coding regions cause 3-base periodicity by an unbalanced nucleotides' distribution producing a relatively high bias for nucleotides' usage, such fundamental characteristic of nucleotides has been exploited in FAWMF to suppress the signal noise. Along with adaptive response of FAWMF, a strong correlation between median nucleotides and the Π shaped filter was observed which produced enhanced discrimination between coding and non-coding regions contrary to fixed length conventional window filters. The proposed FAWMF attains a significant enhancement in coding regions identification i.e. 40% to 125% as compared to other conventional window filters tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms. This study proves that conventional fixed length window filters applied to DNA signals do not achieve significant results since the nucleotides carry genetic code context. The proposed FAWMF algorithm is adaptive and outperforms significantly to process DNA signal contents. The algorithm applied to variety of DNA datasets produced noteworthy discrimination between coding and non-coding regions contrary to fixed window length conventional filters. Copyright © 2017 Elsevier B.V. All rights reserved.
CRITICA: coding region identification tool invoking comparative analysis
NASA Technical Reports Server (NTRS)
Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)
1999-01-01
Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in/pub/critica) and on the World Wide Web (http:/(/)rdpwww.life.uiuc.edu).
Functional interrogation of non-coding DNA through CRISPR genome editing
Canver, Matthew C.; Bauer, Daniel E.; Orkin, Stuart H.
2017-01-01
Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. PMID:28288828
2012-01-01
Background Detecting the borders between coding and non-coding regions is an essential step in the genome annotation. And information entropy measures are useful for describing the signals in genome sequence. However, the accuracies of previous methods of finding borders based on entropy segmentation method still need to be improved. Methods In this study, we first applied a new recursive entropic segmentation method on DNA sequences to get preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets. Results Comparing with the previous segmentation methods, the experimental results on three bacteria genomes, Rickettsia prowazekii, Borrelia burgdorferi and E.coli, show that our approach improves the accuracy for finding the borders between coding and non-coding regions in DNA sequences. Conclusions This paper presents a new segmentation method in prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacteria genomes, comparing to A12_JR method, our method raised the accuracy of finding the borders between protein coding and non-coding regions in DNA sequences. PMID:23282225
Kawano, Tomonori
2013-01-01
There have been a wide variety of approaches for handling the pieces of DNA as the “unplugged” tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, water marks and cryptography. In the present article, novel designs of artificial genes as the media for storing the digitally compressed data for images are proposed for bio-computing purpose while natural genes principally encode for proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given “passwords” and/or secret numbers using DNA sequences. The “passwords” of interest were decomposed into single letters and translated into the font image coded on the separate DNA chains with both the coding regions in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original “passwords.” The latter processes require the molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting at the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interests over the image-coding DNA are also discussed. PMID:23750303
Stec, James; Wang, Jing; Coombes, Kevin; Ayers, Mark; Hoersch, Sebastian; Gold, David L.; Ross, Jeffrey S; Hess, Kenneth R.; Tirrell, Stephen; Linette, Gerald; Hortobagyi, Gabriel N.; Symmans, W. Fraser; Pusztai, Lajos
2005-01-01
We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by basic local alignment tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45 and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene. PMID:16049308
Phylogenetic Network for European mtDNA
Finnilä, Saara; Lehtonen, Mervi S.; Majamaa, Kari
2001-01-01
The sequence in the first hypervariable segment (HVS-I) of the control region has been used as a source of evolutionary information in most phylogenetic analyses of mtDNA. Population genetic inference would benefit from a better understanding of the variation in the mtDNA coding region, but, thus far, complete mtDNA sequences have been rare. We determined the nucleotide sequence in the coding region of mtDNA from 121 Finns, by conformation-sensitive gel electrophoresis and subsequent sequencing and by direct sequencing of the D loop. Furthermore, 71 sequences from our previous reports were included, so that the samples represented all the mtDNA haplogroups present in the Finnish population. We found a total of 297 variable sites in the coding region, which allowed the compilation of unambiguous phylogenetic networks. The D loop harbored 104 variable sites, and, in most cases, these could be localized within the coding-region networks, without discrepancies. Interestingly, many homoplasies were detected in the coding region. Nucleotide variation in the rRNA and tRNA genes was 6%, and that in the third nucleotide positions of structural genes amounted to 22% of that in the HVS-I. The complete networks enabled the relationships between the mtDNA haplogroups to be analyzed. Phylogenetic networks based on the entire coding-region sequence in mtDNA provide a rich source for further population genetic studies, and complete sequences make it easier to differentiate between disease-causing mutations and rare polymorphisms. PMID:11349229
New t-gap insertion-deletion-like metrics for DNA hybridization thermodynamic modeling.
D'yachkov, Arkadii G; Macula, Anthony J; Pogozelski, Wendy K; Renz, Thomas E; Rykov, Vyacheslav V; Torney, David C
2006-05-01
We discuss the concept of t-gap block isomorphic subsequences and use it to describe new abstract string metrics that are similar to the Levenshtein insertion-deletion metric. Some of the metrics that we define can be used to model a thermodynamic distance function on single-stranded DNA sequences. Our model captures a key aspect of the nearest neighbor thermodynamic model for hybridized DNA duplexes. One version of our metric gives the maximum number of stacked pairs of hydrogen bonded nucleotide base pairs that can be present in any secondary structure in a hybridized DNA duplex without pseudoknots. Thermodynamic distance functions are important components in the construction of DNA codes, and DNA codes are important components in biomolecular computing, nanotechnology, and other biotechnical applications that employ DNA hybridization assays. We show how our new distances can be calculated by using a dynamic programming method, and we derive a Varshamov-Gilbert-like lower bound on the size of some of codes using these distance functions as constraints. We also discuss software implementation of our DNA code design methods.
DOT National Transportation Integrated Search
2006-07-01
This report describes the development of a new coding scheme to classify potentially distracting secondary tasks performed while driving, such as eating and using a cell phone. Compared with prior schemes (Stutts et al., first-generation UMTRI scheme...
Tran-Nguyen, L. T. T.; Kube, M.; Schneider, B.; Reinhardt, R.; Gibb, K. S.
2008-01-01
The chromosome sequence of “Candidatus Phytoplasma australiense” (subgroup tuf-Australia I; rp-A), associated with dieback in papaya, Australian grapevine yellows in grapevine, and several other important plant diseases, was determined. The circular chromosome is represented by 879,324 nucleotides, a GC content of 27%, and 839 protein-coding genes. Five hundred two of these protein-coding genes were functionally assigned, while 337 genes were hypothetical proteins with unknown function. Potential mobile units (PMUs) containing clusters of DNA repeats comprised 12.1% of the genome. These PMUs encoded genes involved in DNA replication, repair, and recombination; nucleotide transport and metabolism; translation; and ribosomal structure. Elements with similarities to phage integrases found in these mobile units were difficult to classify, as they were similar to both insertion sequences and bacteriophages. Comparative analysis of “Ca. Phytoplasma australiense” with “Ca. Phytoplasma asteris” strains OY-M and AY-WB showed that the gene order was more conserved between the closely related “Ca. Phytoplasma asteris” strains than to “Ca. Phytoplasma australiense.” Differences observed between “Ca. Phytoplasma australiense” and “Ca. Phytoplasma asteris” strains included the chromosome size (18,693 bp larger than OY-M), a larger number of genes with assigned function, and hypothetical proteins with unknown function. PMID:18359806
Identification of DNA-Binding Proteins Using Structural, Electrostatic and Evolutionary Features
Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir
2009-01-01
Summary DNA binding proteins (DBPs) often take part in various crucial processes of the cell's life cycle. Therefore, the identification and characterization of these proteins are of great importance. We present here a random forests classifier for identifying DBPs among proteins with known three-dimensional structures. First, clusters of evolutionarily conserved regions (patches) on the protein's surface are detected using the PatchFinder algorithm; previous studies showed that these regions are typically the proteins' functionally important regions. Next, we train a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein including its dipole moment. Using 10-fold cross validation on a dataset of 138 DNA-binding proteins and 110 proteins which do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of previously published methods. Furthermore, when we tested 5 different methods on 11 new DBPs which did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA. PMID:19233205
Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon
2015-01-01
Previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the coupled use of analyses on coding region SNP markers and control region mutation motifs. In this study, we tried to see if the same triple multiplex analysis for coding regions SNPs could be also applicable to ancient samples from East Asia as the complementation for sequence analysis of mtDNA control region. By the study on Joseon skeleton samples, we know that mtDNA haplogroup determined by coding region SNP markers successfully falls within the same haplogroup that sequence analysis on control region can assign. Considering that ancient samples in previous studies make no small number of errors in control region mtDNA sequencing, coding region SNP analysis can be used as good complimentary to the conventional haplogroup determination, especially of archaeological human bone samples buried underground over long periods. PMID:26345190
Functional interrogation of non-coding DNA through CRISPR genome editing.
Canver, Matthew C; Bauer, Daniel E; Orkin, Stuart H
2017-05-15
Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. Copyright © 2017 Elsevier Inc. All rights reserved.
Transformable Rhodobacter strains, method for producing transformable Rhodobacter strains
Laible, Philip D.; Hanson, Deborah K.
2018-05-08
The invention provides an organism for expressing foreign DNA, the organism engineered to accept standard DNA carriers. The genome of the organism codes for intracytoplasmic membranes and features an interruption in at least one of the genes coding for restriction enzymes. Further provided is a system for producing biological materials comprising: selecting a vehicle to carry DNA which codes for the biological materials; determining sites on the vehicle's DNA sequence susceptible to restriction enzyme cleavage; choosing an organism to accept the vehicle based on that organism not acting upon at least one of said vehicle's sites; engineering said vehicle to contain said DNA; thereby creating a synthetic vector; and causing the synthetic vector to enter the organism so as cause expression of said DNA.
NASA Astrophysics Data System (ADS)
Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.
2017-07-01
DNA sequence can be defined as a succession of letters, representing the order of nucleotides within DNA, using a permutation of four DNA base codes including adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of the sequences is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and currently become highly developed, advanced and highly throughput sequencing technologies. So far, DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing could produce any ambiguous and not clear enough sequencing results that make them quite difficult to be determined whether these codes are A, T, G, or C. To solve these problems, in this study we can introduce other representation of DNA codes namely Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probability of A, T, G, C bases that could appear in Q and PA + PT + PG + PC = 1. Furthermore, using Quaternion representations we are able to construct the improved scoring matrix for global sequence alignment processes, by applying a dot product method. Moreover, this scoring matrix produces better and higher quality of the match and mismatch score between two DNA base codes. In implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave, to analyze our target sequence which contains some ambiguous sequence data. The subject sequences are the DNA sequences of Streptococcus pneumoniae families obtained from the Genebank, meanwhile the target DNA sequence are received from our collaborator database. As the results we found the Quaternion representations improve the quality of the sequence alignment score and we can conclude that DNA sequence target has maximum similarity with Streptococcus pneumoniae.
Zhang, Yuqin; Lin, Fanbo; Zhang, Youyu; Li, Haitao; Zeng, Yue; Tang, Hao; Yao, Shouzhuo
2011-01-01
A new method for the detection of point mutation in DNA based on the monobase-coded cadmium tellurium nanoprobes and the quartz crystal microbalance (QCM) technique was reported. A point mutation (single-base, adenine, thymine, cytosine, and guanine, namely, A, T, C and G, mutation in DNA strand, respectively) DNA QCM sensor was fabricated by immobilizing single-base mutation DNA modified magnetic beads onto the electrode surface with an external magnetic field near the electrode. The DNA-modified magnetic beads were obtained from the biotin-avidin affinity reaction of biotinylated DNA and streptavidin-functionalized core/shell Fe(3)O(4)/Au magnetic nanoparticles, followed by a DNA hybridization reaction. Single-base coded CdTe nanoprobes (A-CdTe, T-CdTe, C-CdTe and G-CdTe, respectively) were used as the detection probes. The mutation site in DNA was distinguished by detecting the decreases of the resonance frequency of the piezoelectric quartz crystal when the coded nanoprobe was added to the test system. This proposed detection strategy for point mutation in DNA is proved to be sensitive, simple, repeatable and low-cost, consequently, it has a great potential for single nucleotide polymorphism (SNP) detection. 2011 © The Japan Society for Analytical Chemistry
A Framework for Identifying and Classifying Undergraduate Student Proof Errors
ERIC Educational Resources Information Center
Strickland, S.; Rand, B.
2016-01-01
This paper describes a framework for identifying, classifying, and coding student proofs, modified from existing proof-grading rubrics. The framework includes 20 common errors, as well as categories for interpreting the severity of the error. The coding scheme is intended for use in a classroom context, for providing effective student feedback. In…
Correlation approach to identify coding regions in DNA sequences
NASA Technical Reports Server (NTRS)
Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.
1994-01-01
Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
Improved detection of DNA-binding proteins via compression technology on PSSM information.
Wang, Yubo; Ding, Yijie; Guo, Fei; Wei, Leyi; Tang, Jijun
2017-01-01
Since the importance of DNA-binding proteins in multiple biomolecular functions has been recognized, an increasing number of researchers are attempting to identify DNA-binding proteins. In recent years, the machine learning methods have become more and more compelling in the case of protein sequence data soaring, because of their favorable speed and accuracy. In this paper, we extract three features from the protein sequence, namely NMBAC (Normalized Moreau-Broto Autocorrelation), PSSM-DWT (Position-specific scoring matrix-Discrete Wavelet Transform), and PSSM-DCT (Position-specific scoring matrix-Discrete Cosine Transform). We also employ feature selection algorithm on these feature vectors. Then, these features are fed into the training SVM (support vector machine) model as classifier to predict DNA-binding proteins. Our method applys three datasets, namely PDB1075, PDB594 and PDB186, to evaluate the performance of our approach. The PDB1075 and PDB594 datasets are employed for Jackknife test and the PDB186 dataset is used for the independent test. Our method achieves the best accuracy in the Jacknife test, from 79.20% to 86.23% and 80.5% to 86.20% on PDB1075 and PDB594 datasets, respectively. In the independent test, the accuracy of our method comes to 76.3%. The performance of independent test also shows that our method has a certain ability to be effectively used for DNA-binding protein prediction. The data and source code are at https://doi.org/10.6084/m9.figshare.5104084.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mkaouar-Rebai, Emna, E-mail: emna.mkaouar@gmail.com; Felhi, Rahma; Tabebi, Mouna
Mitochondrial diseases are a heterogeneous group of disorders caused by the impairment of the mitochondrial oxidative phosphorylation system which have been associated with various mutations of the mitochondrial DNA (mtDNA) and nuclear gene mutations. The clinical phenotypes are very diverse and the spectrum is still expanding. As brain and muscle are highly dependent on OXPHOS, consequently, neurological disorders and myopathy are common features of mtDNA mutations. Mutations in mtDNA can be classified into three categories: large-scale rearrangements, point mutations in tRNA or rRNA genes and point mutations in protein coding genes. In the present report, we screened mitochondrial genes ofmore » complex I, III, IV and V in 2 patients with mitochondrial neuromuscular disorders. The results showed the presence the pathogenic heteroplasmic m.9157G>A variation (A211T) in the MT-ATP6 gene in the first patient. We also reported the first case of triplication of 9 bp in the mitochondrial NC7 region in Africa and Tunisia, in association with the novel m.14924T>C in the MT-CYB gene in the second patient with mitochondrial neuromuscular disorder. - Highlights: • We reported 2 patients with mitochondrial neuromuscular disorders. • The heteroplasmic MT-ATP6 9157G>A variation was reported. • A triplication of 9 bp in the mitochondrial NC7 region was detected. • The m.14924T>C transition (S60P) in the MT-CYB gene was found.« less
Recognising promoter sequences using an artificial immune system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cooke, D.E.; Hunt, J.E.
1995-12-31
We have developed an artificial immune system (AIS) which is based on the human immune system. The AIS possesses an adaptive learning mechanism which enables antibodies to emerge which can be used for classification tasks. In this paper, we describe how the AIS has been used to evolve antibodies which can classify promoter containing and promoter negative DNA sequences. The DNA sequences used for teaching were 57 nucleotides in length and contained procaryotic promoters. The system classified previously unseen DNA sequences with an accuracy of approximately 90%.
QueTAL: a suite of tools to classify and compare TAL effectors functionally and phylogenetically
Pérez-Quintero, Alvaro L.; Lamy, Léo; Gordon, Jonathan L.; Escalon, Aline; Cunnac, Sébastien; Szurek, Boris; Gagnevin, Lionel
2015-01-01
Transcription Activator-Like (TAL) effectors from Xanthomonas plant pathogenic bacteria can bind to the promoter region of plant genes and induce their expression. DNA-binding specificity is governed by a central domain made of nearly identical repeats, each determining the recognition of one base pair via two amino acid residues (a.k.a. Repeat Variable Di-residue, or RVD). Knowing how TAL effectors differ from each other within and between strains would be useful to infer functional and evolutionary relationships, but their repetitive nature precludes reliable use of traditional alignment methods. The suite QueTAL was therefore developed to offer tailored tools for comparison of TAL effector genes. The program DisTAL considers each repeat as a unit, transforms a TAL effector sequence into a sequence of coded repeats and makes pair-wise alignments between these coded sequences to construct trees. The program FuncTAL is aimed at finding TAL effectors with similar DNA-binding capabilities. It calculates correlations between position weight matrices of potential target DNA sequence predicted from the RVD sequence, and builds trees based on these correlations. The programs accurately represented phylogenetic and functional relationships between TAL effectors using either simulated or literature-curated data. When using the programs on a large set of TAL effector sequences, the DisTAL tree largely reflected the expected species phylogeny. In contrast, FuncTAL showed that TAL effectors with similar binding capabilities can be found between phylogenetically distant taxa. This suite will help users to rapidly analyse any TAL effector genes of interest and compare them to other available TAL genes and should improve our understanding of TAL effectors evolution. It is available at http://bioinfo-web.mpl.ird.fr/cgi-bin2/quetal/quetal.cgi. PMID:26284082
Computational Identification and Functional Predictions of Long Noncoding RNA in Zea mays
Boerner, Susan; McGinnis, Karen M.
2012-01-01
Background Computational analysis of cDNA sequences from multiple organisms suggests that a large portion of transcribed DNA does not code for a functional protein. In mammals, noncoding transcription is abundant, and often results in functional RNA molecules that do not appear to encode proteins. Many long noncoding RNAs (lncRNAs) appear to have epigenetic regulatory function in humans, including HOTAIR and XIST. While epigenetic gene regulation is clearly an essential mechanism in plants, relatively little is known about the presence or function of lncRNAs in plants. Methodology/Principal Findings To explore the connection between lncRNA and epigenetic regulation of gene expression in plants, a computational pipeline using the programming language Python has been developed and applied to maize full length cDNA sequences to identify, classify, and localize potential lncRNAs. The pipeline was used in parallel with an SVM tool for identifying ncRNAs to identify the maximal number of ncRNAs in the dataset. Although the available library of sequences was small and potentially biased toward protein coding transcripts, 15% of the sequences were predicted to be noncoding. Approximately 60% of these sequences appear to act as precursors for small RNA molecules and may function to regulate gene expression via a small RNA dependent mechanism. ncRNAs were predicted to originate from both genic and intergenic loci. Of the lncRNAs that originated from genic loci, ∼20% were antisense to the host gene loci. Conclusions/Significance Consistent with similar studies in other organisms, noncoding transcription appears to be widespread in the maize genome. Computational predictions indicate that maize lncRNAs may function to regulate expression of other genes through multiple RNA mediated mechanisms. PMID:22916204
Quantum biological channel modeling and capacity calculation.
Djordjevic, Ivan B
2012-12-10
Quantum mechanics has an important role in photosynthesis, magnetoreception, and evolution. There were many attempts in an effort to explain the structure of genetic code and transfer of information from DNA to protein by using the concepts of quantum mechanics. The existing biological quantum channel models are not sufficiently general to incorporate all relevant contributions responsible for imperfect protein synthesis. Moreover, the problem of determination of quantum biological channel capacity is still an open problem. To solve these problems, we construct the operator-sum representation of biological channel based on codon basekets (basis vectors), and determine the quantum channel model suitable for study of the quantum biological channel capacity and beyond. The transcription process, DNA point mutations, insertions, deletions, and translation are interpreted as the quantum noise processes. The various types of quantum errors are classified into several broad categories: (i) storage errors that occur in DNA itself as it represents an imperfect storage of genetic information, (ii) replication errors introduced during DNA replication process, (iii) transcription errors introduced during DNA to mRNA transcription, and (iv) translation errors introduced during the translation process. By using this model, we determine the biological quantum channel capacity and compare it against corresponding classical biological channel capacity. We demonstrate that the quantum biological channel capacity is higher than the classical one, for a coherent quantum channel model, suggesting that quantum effects have an important role in biological systems. The proposed model is of crucial importance towards future study of quantum DNA error correction, developing quantum mechanical model of aging, developing the quantum mechanical models for tumors/cancer, and study of intracellular dynamics in general.
Bharatan, V
2008-07-01
The therapeutic effects of the plant species used in homeopathy have never been subjected to systematic analysis. A survey of the various Materiae Medicae shows that over 800 plant species are the source of medicines in homeopathy. As these medicines are considered related to one another with respect to their therapeutic effects for treating similar symptoms, the aim is to classify and map them using the concept of homology. This involves placing the discipline of homeopathy into a comparative framework using these plant medicines as taxa, therapeutic effects as characters, and contemporary cladistic techniques to analyse these relationships. The results are compared using cladograms based on different data sets used in biology (e.g. morphological characters and DNA sequences) to test whether similar cladistic patterns exist among these medicines. By classifying the therapeutic actions, genuine homologies can be distinguished from homoplasies. As this is a comparative study it has been necessary first to update the existing nomenclature of the plant species in the homeopathic literature in line with the current International Code of Botanical Nomenclature.
Genomics dataset of unidentified disclosed isolates.
Rekadwad, Bhagwan N
2016-09-01
Analysis of DNA sequences is necessary for higher hierarchical classification of the organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset is chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. The quick response codes were generated. AT/GC content of the DNA sequences analysis was carried out. The QR is helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage code and enzyme code studied under the restriction digestion study, which helpful for performing studies using short DNA sequences was reported. The dataset disclosed here is the new revelatory data for exploration of unique DNA sequences for evaluation, identification, comparison and analysis.
Comparison of two methods of MMPI-2 profile classification.
Munley, P H; Germain, J M
2000-10-01
The present study investigated the extent of agreement of the highest scale method and the best-fit method in matching MMPI-2 profiles to database code-type profiles and considered profile characteristics that may relate to agreement or disagreement of code-type matches by these two methods. A sample of 519 MMPI-2 profiles that had been classified into database profile code types by these two methods was studied. Resulting code-type matches were classified into three groups: identical (30%), similar (39%), and different (31%), and the profile characteristics of profile elevation, dispersion, and profile code-type definition were studied. Profile code-type definition was significantly different across the three groups with identical and similar match profile groups showing greater profile code-type definition and the different group consisting of profiles that were less well-defined.
Identification of DNA-binding proteins using structural, electrostatic and evolutionary features.
Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir
2009-04-10
DNA-binding proteins (DBPs) participate in various crucial processes in the life-cycle of the cells, and the identification and characterization of these proteins is of great importance. We present here a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the surface of proteins were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.
Is a Genome a Codeword of an Error-Correcting Code?
Kleinschmidt, João H.; Silva-Filho, Márcio C.; Bim, Edson; Herai, Roberto H.; Yamagishi, Michel E. B.; Palazzo, Reginaldo
2012-01-01
Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction. PMID:22649495
Russ, Daniel E.; Ho, Kwan-Yuet; Colt, Joanne S.; Armenti, Karla R.; Baris, Dalsu; Chow, Wong-Ho; Davis, Faith; Johnson, Alison; Purdue, Mark P.; Karagas, Margaret R.; Schwartz, Kendra; Schwenn, Molly; Silverman, Debra T.; Johnson, Calvin A.; Friesen, Melissa C.
2016-01-01
Background Mapping job titles to standardized occupation classification (SOC) codes is an important step in identifying occupational risk factors in epidemiologic studies. Because manual coding is time-consuming and has moderate reliability, we developed an algorithm called SOCcer (Standardized Occupation Coding for Computer-assisted Epidemiologic Research) to assign SOC-2010 codes based on free-text job description components. Methods Job title and task-based classifiers were developed by comparing job descriptions to multiple sources linking job and task descriptions to SOC codes. An industry-based classifier was developed based on the SOC prevalence within an industry. These classifiers were used in a logistic model trained using 14,983 jobs with expert-assigned SOC codes to obtain empirical weights for an algorithm that scored each SOC/job description. We assigned the highest scoring SOC code to each job. SOCcer was validated in two occupational data sources by comparing SOC codes obtained from SOCcer to expert assigned SOC codes and lead exposure estimates obtained by linking SOC codes to a job-exposure matrix. Results For 11,991 case-control study jobs, SOCcer-assigned codes agreed with 44.5% and 76.3% of manually assigned codes at the 6- and 2-digit level, respectively. Agreement increased with the score, providing a mechanism to identify assignments needing review. Good agreement was observed between lead estimates based on SOCcer and manual SOC assignments (kappa: 0.6–0.8). Poorer performance was observed for inspection job descriptions, which included abbreviations and worksite-specific terminology. Conclusions Although some manual coding will remain necessary, using SOCcer may improve the efficiency of incorporating occupation into large-scale epidemiologic studies. PMID:27102331
Derenko, Miroslava; Malyarchuk, Boris; Denisova, Galina; Perkova, Maria; Rogalla, Urszula; Grzybowski, Tomasz; Khusnutdinova, Elza; Dambueva, Irina; Zakharov, Ilia
2012-01-01
With the aim of uncovering all of the most basal variation in the northern Asian mitochondrial DNA (mtDNA) haplogroups, we have analyzed mtDNA control region and coding region sequence variation in 98 Altaian Kazakhs from southern Siberia and 149 Barghuts from Inner Mongolia, China. Both populations exhibit the prevalence of eastern Eurasian lineages accounting for 91.9% in Barghuts and 60.2% in Altaian Kazakhs. The strong affinity of Altaian Kazakhs and populations of northern and central Asia has been revealed, reflecting both influences of central Asian inhabitants and essential genetic interaction with the Altai region indigenous populations. Statistical analyses data demonstrate a close positioning of all Mongolic-speaking populations (Mongolians, Buryats, Khamnigans, Kalmyks as well as Barghuts studied here) and Turkic-speaking Sojots, thus suggesting their origin from a common maternal ancestral gene pool. In order to achieve a thorough coverage of DNA lineages revealed in the northern Asian matrilineal gene pool, we have completely sequenced the mtDNA of 55 samples representing haplogroups R11b, B4, B5, F2, M9, M10, M11, M13, N9a and R9c1, which were pinpointed from a massive collection (over 5000 individuals) of northern and eastern Asian, as well as European control region mtDNA sequences. Applying the newly updated mtDNA tree to the previously reported northern Asian and eastern Asian mtDNA data sets has resolved the status of the poorly classified mtDNA types and allowed us to obtain the coalescence age estimates of the nodes of interest using different calibrated rates. Our findings confirm our previous conclusion that northern Asian maternal gene pool consists of predominantly post-LGM components of eastern Asian ancestry, though some genetic lineages may have a pre-LGM/LGM origin. PMID:22363811
Context influences on TALE–DNA binding revealed by quantitative profiling
Rogers, Julia M.; Barrera, Luis A.; Reyon, Deepak; Sander, Jeffry D.; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L.
2015-01-01
Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE–DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000–20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE–DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design. PMID:26067805
Context influences on TALE-DNA binding revealed by quantitative profiling.
Rogers, Julia M; Barrera, Luis A; Reyon, Deepak; Sander, Jeffry D; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L
2015-06-11
Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE-DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000-20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE-DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design.
On fuzzy semantic similarity measure for DNA coding.
Ahmad, Muneer; Jung, Low Tang; Bhuiyan, Md Al-Amin
2016-02-01
A coding measure scheme numerically translates the DNA sequence to a time domain signal for protein coding regions identification. A number of coding measure schemes based on numerology, geometry, fixed mapping, statistical characteristics and chemical attributes of nucleotides have been proposed in recent decades. Such coding measure schemes lack the biologically meaningful aspects of nucleotide data and hence do not significantly discriminate coding regions from non-coding regions. This paper presents a novel fuzzy semantic similarity measure (FSSM) coding scheme centering on FSSM codons׳ clustering and genetic code context of nucleotides. Certain natural characteristics of nucleotides i.e. appearance as a unique combination of triplets, preserving special structure and occurrence, and ability to own and share density distributions in codons have been exploited in FSSM. The nucleotides׳ fuzzy behaviors, semantic similarities and defuzzification based on the center of gravity of nucleotides revealed a strong correlation between nucleotides in codons. The proposed FSSM coding scheme attains a significant enhancement in coding regions identification i.e. 36-133% as compared to other existing coding measure schemes tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms. Copyright © 2015 Elsevier Ltd. All rights reserved.
The agents of natural genome editing.
Witzany, Guenther
2011-06-01
The DNA serves as a stable information storage medium and every protein which is needed by the cell is produced from this blueprint via an RNA intermediate code. More recently it was found that an abundance of various RNA elements cooperate in a variety of steps and substeps as regulatory and catalytic units with multiple competencies to act on RNA transcripts. Natural genome editing on one side is the competent agent-driven generation and integration of meaningful DNA nucleotide sequences into pre-existing genomic content arrangements, and the ability to (re-)combine and (re-)regulate them according to context-dependent (i.e. adaptational) purposes of the host organism. Natural genome editing on the other side designates the integration of all RNA activities acting on RNA transcripts without altering DNA-encoded genes. If we take the genetic code seriously as a natural code, there must be agents that are competent to act on this code because no natural code codes itself as no natural language speaks itself. As code editing agents, viral and subviral agents have been suggested because there are several indicators that demonstrate viruses competent in both RNA and DNA natural genome editing.
DNA-based watermarks using the DNA-Crypt algorithm.
Heider, Dominik; Barnekow, Angelika
2007-05-29
The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in case of DNA-Crypt and the least significant bit in case of the image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations which can occur infrequently may destroy the encrypted information, however an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 in Saccharomyces cerevisiae shows that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences similar to all steganographic algorithms.
DNA-based watermarks using the DNA-Crypt algorithm
Heider, Dominik; Barnekow, Angelika
2007-01-01
Background The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. Results The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in case of DNA-Crypt and the least significant bit in case of the image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations which can occur infrequently may destroy the encrypted information, however an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 in Saccharomyces cerevisiae shows that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. Conclusion The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences similar to all steganographic algorithms. PMID:17535434
Hidden Markov models and other machine learning approaches in computational molecular biology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baldi, P.
1995-12-31
This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. Computational tools are increasingly needed to process the massive amounts of data, to organize and classify sequences, to detect weak similarities, to separate coding from non-coding regions, and reconstruct the underlying evolutionary history. The fundamental problem in machine learning is the same as in scientific reasoning in general, as well as statistical modeling: to come up with a good model for the data. In thismore » tutorial four classes of models are reviewed. They are: Hidden Markov models; artificial Neural Networks; Belief Networks; and Stochastic Grammars. When dealing with DNA and protein primary sequences, Hidden Markov models are one of the most flexible and powerful alignments and data base searches. In this tutorial, attention is focused on the theory of Hidden Markov Models, and how to apply them to problems in molecular biology.« less
Steel, Olivia; Kraberger, Simona; Sikorski, Alyssa; Young, Laura M; Catchpole, Ryan J; Stevens, Aaron J; Ladley, Jenny J; Coray, Dorien S; Stainton, Daisy; Dayaram, Anisha; Julian, Laurel; van Bysterveldt, Katherine; Varsani, Arvind
2016-09-01
In recent years, innovations in molecular techniques and sequencing technologies have resulted in a rapid expansion in the number of known viral sequences, in particular those with circular replication-associated protein (Rep)-encoding single-stranded (CRESS) DNA genomes. CRESS DNA viruses are present in the virome of many ecosystems and are known to infect a wide range of organisms. A large number of the recently identified CRESS DNA viruses cannot be classified into any known viral families, indicating that the current view of CRESS DNA viral sequence space is greatly underestimated. Animal faecal matter has proven to be a particularly useful source for sampling CRESS DNA viruses in an ecosystem, as it is cost-effective and non-invasive. In this study a viral metagenomic approach was used to explore the diversity of CRESS DNA viruses present in the faeces of domesticated and wild animals in New Zealand. Thirty-eight complete CRESS DNA viral genomes and two circular molecules (that may be defective molecules or single components of multicomponent genomes) were identified from forty-nine individual animal faecal samples. Based on shared genome organisations and sequence similarities, eighteen of the isolates were classified as gemycircularviruses and twelve isolates were classified as smacoviruses. The remaining eight isolates lack significant sequence similarity with any members of known CRESS DNA virus groups. This research adds significantly to our knowledge of CRESS DNA viral diversity in New Zealand, emphasising the prevalence of CRESS DNA viruses in nature, and reinforcing the suggestion that a large proportion of CRESS DNA viruses are yet to be identified. Copyright © 2016 Elsevier B.V. All rights reserved.
Automatic Residential/Commercial Classification of Parcels with Solar Panel Detections
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morton, April M; Omitaomu, Olufemi A; Kotikot, Susan
A computational method to automatically detect solar panels on rooftops to aid policy and financial assessment of solar distributed generation. The code automatically classifies parcels containing solar panels in the U.S. as residential or commercial. The code allows the user to specify an input dataset containing parcels and detected solar panels, and then uses information about the parcels and solar panels to automatically classify the rooftops as residential or commercial using machine learning techniques. The zip file containing the code includes sample input and output datasets for the Boston and DC areas.
Applications of statistical physics and information theory to the analysis of DNA sequences
NASA Astrophysics Data System (ADS)
Grosse, Ivo
2000-10-01
DNA carries the genetic information of most living organisms, and the of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question if there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search for such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
Kress, W John; Erickson, David L
2007-06-06
A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.
Entropic Profiler – detection of conservation in genomes using information theory
Fernandes, Francisco; Freitas, Ana T; Almeida, Jonas S; Vinga, Susana
2009-01-01
Background In the last decades, with the successive availability of whole genome sequences, many research efforts have been made to mathematically model DNA. Entropic Profiles (EP) were proposed recently as a new measure of continuous entropy of genome sequences. EP represent local information plots related to DNA randomness and are based on information theory and statistical concepts. They express the weighed relative abundance of motifs for each position in genomes. Their study is very relevant because under or over-representation segments are often associated with significant biological meaning. Findings The Entropic Profiler application here presented is a new tool designed to detect and extract under and over-represented DNA segments in genomes by using EP. It allows its computation in a very efficient way by recurring to improved algorithms and data structures, which include modified suffix trees. Available through a web interface and as downloadable source code, it allows to study positions and to search for motifs inside the whole sequence or within a specified range. DNA sequences can be entered from different sources, including FASTA files, pre-loaded examples or resuming a previously saved work. Besides the EP value plots, p-values and z-scores for each motif are also computed, along with the Chaos Game Representation of the sequence. Conclusion EP are directly related with the statistical significance of motifs and can be considered as a new method to extract and classify significant regions in genomes and estimate local scales in DNA. The present implementation establishes an efficient and useful tool for whole genome analysis. PMID:19416538
Shoura, Massa J; Gabdank, Idan; Hansen, Loren; Merker, Jason; Gotlib, Jason; Levene, Stephen D; Fire, Andrew Z
2017-10-05
Investigations aimed at defining the 3D configuration of eukaryotic chromosomes have consistently encountered an endogenous population of chromosome-derived circular genomic DNA, referred to as extrachromosomal circular DNA (eccDNA). While the production, distribution, and activities of eccDNAs remain understudied, eccDNA formation from specific regions of the linear genome has profound consequences on the regulatory and coding capabilities for these regions. Here, we define eccDNA distributions in Caenorhabditis elegans and in three human cell types, utilizing a set of DNA topology-dependent approaches for enrichment and characterization. The use of parallel biophysical, enzymatic, and informatic approaches provides a comprehensive profiling of eccDNA robust to isolation and analysis methodology. Results in human and nematode systems provide quantitative analysis of the eccDNA loci at both unique and repetitive regions. Our studies converge on and support a consistent picture, in which endogenous genomic DNA circles are present in normal physiological states, and in which the circles come from both coding and noncoding genomic regions. Prominent among the coding regions generating DNA circles are several genes known to produce a diversity of protein isoforms, with mucin proteins and titin as specific examples. Copyright © 2017 Shoura et al.
Genomics dataset on unclassified published organism (patent US 7547531).
Khan Shawan, Mohammad Mahfuz Ali; Hasan, Md Ashraful; Hossain, Md Mozammel; Hasan, Md Mahmudul; Parvin, Afroza; Akter, Salina; Uddin, Kazi Rasel; Banik, Subrata; Morshed, Mahbubul; Rahman, Md Nazibur; Rahman, S M Badier
2016-12-01
Nucleotide (DNA) sequence analysis provides important clues regarding the characteristics and taxonomic position of an organism. With the intention that, DNA sequence analysis is very crucial to learn about hierarchical classification of that particular organism. This dataset (patent US 7547531) is chosen to simplify all the complex raw data buried in undisclosed DNA sequences which help to open doors for new collaborations. In this data, a total of 48 unidentified DNA sequences from patent US 7547531 were selected and their complete sequences were retrieved from NCBI BioSample database. Quick response (QR) code of those DNA sequences was constructed by DNA BarID tool. QR code is useful for the identification and comparison of isolates with other organisms. AT/GC content of the DNA sequences was determined using ENDMEMO GC Content Calculator, which indicates their stability at different temperature. The highest GC content was observed in GP445188 (62.5%) which was followed by GP445198 (61.8%) and GP445189 (59.44%), while lowest was in GP445178 (24.39%). In addition, New England BioLabs (NEB) database was used to identify cleavage code indicating the 5, 3 and blunt end and enzyme code indicating the methylation site of the DNA sequences was also shown. These data will be helpful for the construction of the organisms' hierarchical classification, determination of their phylogenetic and taxonomic position and revelation of their molecular characteristics.
Chao, Tianle; Wang, Guizhi; Wang, Jianmin; Liu, Zhaohua; Ji, Zhibin; Hou, Lei; Zhang, Chunlan
2016-01-01
High-throughput mRNA sequencing enables the discovery of new transcripts and additional parts of incompletely annotated transcripts. Compared with the human and cow genomes, the reference annotation level of the sheep genome is still low. An investigation of new transcripts in sheep skeletal muscle will improve our understanding of muscle development. Therefore, applying high-throughput sequencing, two cDNA libraries from the biceps brachii of small-tailed Han sheep and Dorper sheep were constructed, and whole-transcriptome analysis was performed to determine the unknown transcript catalogue of this tissue. In this study, 40,129 transcripts were finally mapped to the sheep genome. Among them, 3,467 transcripts were determined to be unannotated in the current reference sheep genome and were defined as new transcripts. Based on protein-coding capacity prediction and comparative analysis of sequence similarity, 246 transcripts were classified as portions of unannotated genes or incompletely annotated genes. Another 1,520 transcripts were predicted with high confidence to be long non-coding RNAs. Our analysis also revealed 334 new transcripts that displayed specific expression in ruminants and uncovered a number of new transcripts without intergenus homology but with specific expression in sheep skeletal muscle. The results confirmed a complex transcript pattern of coding and non-coding RNA in sheep skeletal muscle. This study provided important information concerning the sheep genome and transcriptome annotation, which could provide a basis for further study.
Facile and High-Throughput Synthesis of Functional Microparticles with Quick Response Codes.
Ramirez, Lisa Marie S; He, Muhan; Mailloux, Shay; George, Justin; Wang, Jun
2016-06-01
Encoded microparticles are high demand in multiplexed assays and labeling. However, the current methods for the synthesis and coding of microparticles either lack robustness and reliability, or possess limited coding capacity. Here, a massive coding of dissociated elements (MiCODE) technology based on innovation of a chemically reactive off-stoichimetry thiol-allyl photocurable polymer and standard lithography to produce a large number of quick response (QR) code microparticles is introduced. The coding process is performed by photobleaching the QR code patterns on microparticles when fluorophores are incorporated into the prepolymer formulation. The fabricated encoded microparticles can be released from a substrate without changing their features. Excess thiol functionality on the microparticle surface allows for grafting of amine groups and further DNA probes. A multiplexed assay is demonstrated using the DNA-grafted QR code microparticles. The MiCODE technology is further characterized by showing the incorporation of BODIPY-maleimide (BDP-M) and Nile Red fluorophores for coding and the use of microcontact printing for immobilizing DNA probes on microparticle surfaces. This versatile technology leverages mature lithography facilities for fabrication and thus is amenable to scale-up in the future, with potential applications in bioassays and in labeling consumer products. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Kress, W. John; Erickson, David L.
2007-01-01
Background A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Methodology/Principal Findings Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. Conclusions/Significance A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination. PMID:17551588
Russ, Daniel E; Ho, Kwan-Yuet; Colt, Joanne S; Armenti, Karla R; Baris, Dalsu; Chow, Wong-Ho; Davis, Faith; Johnson, Alison; Purdue, Mark P; Karagas, Margaret R; Schwartz, Kendra; Schwenn, Molly; Silverman, Debra T; Johnson, Calvin A; Friesen, Melissa C
2016-06-01
Mapping job titles to standardised occupation classification (SOC) codes is an important step in identifying occupational risk factors in epidemiological studies. Because manual coding is time-consuming and has moderate reliability, we developed an algorithm called SOCcer (Standardized Occupation Coding for Computer-assisted Epidemiologic Research) to assign SOC-2010 codes based on free-text job description components. Job title and task-based classifiers were developed by comparing job descriptions to multiple sources linking job and task descriptions to SOC codes. An industry-based classifier was developed based on the SOC prevalence within an industry. These classifiers were used in a logistic model trained using 14 983 jobs with expert-assigned SOC codes to obtain empirical weights for an algorithm that scored each SOC/job description. We assigned the highest scoring SOC code to each job. SOCcer was validated in 2 occupational data sources by comparing SOC codes obtained from SOCcer to expert assigned SOC codes and lead exposure estimates obtained by linking SOC codes to a job-exposure matrix. For 11 991 case-control study jobs, SOCcer-assigned codes agreed with 44.5% and 76.3% of manually assigned codes at the 6-digit and 2-digit level, respectively. Agreement increased with the score, providing a mechanism to identify assignments needing review. Good agreement was observed between lead estimates based on SOCcer and manual SOC assignments (κ 0.6-0.8). Poorer performance was observed for inspection job descriptions, which included abbreviations and worksite-specific terminology. Although some manual coding will remain necessary, using SOCcer may improve the efficiency of incorporating occupation into large-scale epidemiological studies. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Brain cDNA clone for human cholinesterase
DOE Office of Scientific and Technical Information (OSTI.GOV)
McTiernan, C.; Adkins, S.; Chatonnet, A.
1987-10-01
A cDNA library from human basal ganglia was screened with oligonucleotide probes corresponding to portions of the amino acid sequence of human serum cholinesterase. Five overlapping clones, representing 2.4 kilobases, were isolated. The sequenced cDNA contained 207 base pairs of coding sequence 5' to the amino terminus of the mature protein in which there were four ATG translation start sites in the same reading frame as the protein. Only the ATG coding for Met-(-28) lay within a favorable consensus sequence for functional initiators. There were 1722 base pairs of coding sequence corresponding to the protein found circulating in human serum.more » The amino acid sequence deduced from the cDNA exactly matched the 574 amino acid sequence of human serum cholinesterase, as previously determined by Edman degradation. Therefore, our clones represented cholinesterase rather than acetylcholinesterase. It was concluded that the amino acid sequences of cholinesterase from two different tissues, human brain and human serum, were identical. Hybridization of genomic DNA blots suggested that a single gene, or very few genes coded for cholinesterase.« less
GUI to Facilitate Research on Biological Damage from Radiation
NASA Technical Reports Server (NTRS)
Cucinotta, Frances A.; Ponomarev, Artem Lvovich
2010-01-01
A graphical-user-interface (GUI) computer program has been developed to facilitate research on the damage caused by highly energetic particles and photons impinging on living organisms. The program brings together, into one computational workspace, computer codes that have been developed over the years, plus codes that will be developed during the foreseeable future, to address diverse aspects of radiation damage. These include codes that implement radiation-track models, codes for biophysical models of breakage of deoxyribonucleic acid (DNA) by radiation, pattern-recognition programs for extracting quantitative information from biological assays, and image-processing programs that aid visualization of DNA breaks. The radiation-track models are based on transport models of interactions of radiation with matter and solution of the Boltzmann transport equation by use of both theoretical and numerical models. The biophysical models of breakage of DNA by radiation include biopolymer coarse-grained and atomistic models of DNA, stochastic- process models of deposition of energy, and Markov-based probabilistic models of placement of double-strand breaks in DNA. The program is designed for use in the NT, 95, 98, 2000, ME, and XP variants of the Windows operating system.
DNA: Polymer and molecular code
NASA Astrophysics Data System (ADS)
Shivashankar, G. V.
1999-10-01
The thesis work focusses upon two aspects of DNA, the polymer and the molecular code. Our approach was to bring single molecule micromanipulation methods to the study of DNA. It included a home built optical microscope combined with an atomic force microscope and an optical tweezer. This combined approach led to a novel method to graft a single DNA molecule onto a force cantilever using the optical tweezer and local heating. With this method, a force versus extension assay of double stranded DNA was realized. The resolution was about 10 picoN. To improve on this force measurement resolution, a simple light backscattering technique was developed and used to probe the DNA polymer flexibility and its fluctuations. It combined the optical tweezer to trap a DNA tethered bead and the laser backscattering to detect the beads Brownian fluctuations. With this technique the resolution was about 0.1 picoN with a millisecond access time, and the whole entropic part of the DNA force-extension was measured. With this experimental strategy, we measured the polymerization of the protein RecA on an isolated double stranded DNA. We observed the progressive decoration of RecA on the l DNA molecule, which results in the extension of l , due to unwinding of the double helix. The dynamics of polymerization, the resulting change in the DNA entropic elasticity and the role of ATP hydrolysis were the main parts of the study. A simple model for RecA assembly on DNA was proposed. This work presents a first step in the study of genetic recombination. Recently we have started a study of equilibrium binding which utilizes fluorescence polarization methods to probe the polymerization of RecA on single stranded DNA. In addition to the study of material properties of DNA and DNA-RecA, we have developed experiments for which the code of the DNA is central. We studied one aspect of DNA as a molecular code, using different techniques. In particular the programmatic use of template specificity makes gene expression a prime example of a biological code. We developed a novel method of making DNA micro- arrays, the so-called DNA chip. Using the optical tweezer concept, we were able to pattern biomolecules on a solid substrate, developing a new type of sub-micron laser lithography. A laser beam is focused onto a thin gold film on a glass substrate. Laser ablation of gold results in local aggregation of nanometer scale beads conjugated with small DNA oligonucleotides, with sub-micron resolution. This leads to specific detection of cDNA and RNA molecules. We built a simple micro-array fabrication and detection in the laboratory, based on this method, to probe addressable pools (genes, proteins or antibodies). We have lately used molecular beacons (single stranded DNA with a stem-loop structure containing a fluorophore and quencher), for the direct detection of unlabelled mRNA. As a first step towards a study of the dynamics of the biological code, we have begun to examine the patterns of gene expression during virus (T7 phage) infection of E-coli bacteria.
A novel method for in silico identification of regulatory SNPs in human genome.
Li, Rong; Zhong, Dexing; Liu, Ruiling; Lv, Hongqiang; Zhang, Xinman; Liu, Jun; Han, Jiuqiang
2017-02-21
Regulatory single nucleotide polymorphisms (rSNPs), kind of functional noncoding genetic variants, can affect gene expression in a regulatory way, and they are thought to be associated with increased susceptibilities to complex diseases. Here a novel computational approach to identify potential rSNPs is presented. Different from most other rSNPs finding methods which based on hypothesis that SNPs causing large allele-specific changes in transcription factor binding affinities are more likely to play regulatory functions, we use a set of documented experimentally verified rSNPs and nonfunctional background SNPs to train classifiers, so the discriminating features are found. To characterize variants, an extensive range of characteristics, such as sequence context, DNA structure and evolutionary conservation etc. are analyzed. Support vector machine is adopted to build the classifier model together with an ensemble method to deal with unbalanced data. 10-fold cross-validation result shows that our method can achieve accuracy with sensitivity of ~78% and specificity of ~82%. Furthermore, our method performances better than some other algorithms based on aforementioned hypothesis in handling false positives. The original data and the source matlab codes involved are available at https://sourceforge.net/projects/rsnppredict/. Copyright © 2016 Elsevier Ltd. All rights reserved.
Henrich, Oliver; Gutiérrez Fosado, Yair Augusto; Curk, Tine; Ouldridge, Thomas E
2018-05-10
During the last decade coarse-grained nucleotide models have emerged that allow us to study DNA and RNA on unprecedented time and length scales. Among them is oxDNA, a coarse-grained, sequence-specific model that captures the hybridisation transition of DNA and many structural properties of single- and double-stranded DNA. oxDNA was previously only available as standalone software, but has now been implemented into the popular LAMMPS molecular dynamics code. This article describes the new implementation and analyses its parallel performance. Practical applications are presented that focus on single-stranded DNA, an area of research which has been so far under-investigated. The LAMMPS implementation of oxDNA lowers the entry barrier for using the oxDNA model significantly, facilitates future code development and interfacing with existing LAMMPS functionality as well as other coarse-grained and atomistic DNA models.
Zhu, Debin; Tang, Yabing; Xing, Da; Chen, Wei R
2008-05-15
A bio bar code assay based on oligonucleotide-modified gold nanoparticles (Au-NPs) provides a PCR-free method for quantitative detection of nucleic acid targets. However, the current bio bar code assay requires lengthy experimental procedures including the preparation and release of bar code DNA probes from the target-nanoparticle complex and immobilization and hybridization of the probes for quantification. Herein, we report a novel PCR-free electrochemiluminescence (ECL)-based bio bar code assay for the quantitative detection of genetically modified organism (GMO) from raw materials. It consists of tris-(2,2'-bipyridyl) ruthenium (TBR)-labeled bar code DNA, nucleic acid hybridization using Au-NPs and biotin-labeled probes, and selective capture of the hybridization complex by streptavidin-coated paramagnetic beads. The detection of target DNA is realized by direct measurement of ECL emission of TBR. It can quantitatively detect target nucleic acids with high speed and sensitivity. This method can be used to quantitatively detect GMO fragments from real GMO products.
Low-energy electron dose-point kernel simulations using new physics models implemented in Geant4-DNA
NASA Astrophysics Data System (ADS)
Bordes, Julien; Incerti, Sébastien; Lampe, Nathanael; Bardiès, Manuel; Bordage, Marie-Claude
2017-05-01
When low-energy electrons, such as Auger electrons, interact with liquid water, they induce highly localized ionizing energy depositions over ranges comparable to cell diameters. Monte Carlo track structure (MCTS) codes are suitable tools for performing dosimetry at this level. One of the main MCTS codes, Geant4-DNA, is equipped with only two sets of cross section models for low-energy electron interactions in liquid water (;option 2; and its improved version, ;option 4;). To provide Geant4-DNA users with new alternative physics models, a set of cross sections, extracted from CPA100 MCTS code, have been added to Geant4-DNA. This new version is hereafter referred to as ;Geant4-DNA-CPA100;. In this study, ;Geant4-DNA-CPA100; was used to calculate low-energy electron dose-point kernels (DPKs) between 1 keV and 200 keV. Such kernels represent the radial energy deposited by an isotropic point source, a parameter that is useful for dosimetry calculations in nuclear medicine. In order to assess the influence of different physics models on DPK calculations, DPKs were calculated using the existing Geant4-DNA models (;option 2; and ;option 4;), newly integrated CPA100 models, and the PENELOPE Monte Carlo code used in step-by-step mode for monoenergetic electrons. Additionally, a comparison was performed of two sets of DPKs that were simulated with ;Geant4-DNA-CPA100; - the first set using Geant4‧s default settings, and the second using CPA100‧s original code default settings. A maximum difference of 9.4% was found between the Geant4-DNA-CPA100 and PENELOPE DPKs. Between the two Geant4-DNA existing models, slight differences, between 1 keV and 10 keV were observed. It was highlighted that the DPKs simulated with the two Geant4-DNA's existing models were always broader than those generated with ;Geant4-DNA-CPA100;. The discrepancies observed between the DPKs generated using Geant4-DNA's existing models and ;Geant4-DNA-CPA100; were caused solely by their different cross sections. The different scoring and interpolation methods used in CPA100 and Geant4 to calculate DPKs showed differences close to 3.0% near the source.
Dynamics of the DNA repair proteins WRN and BLM in the nucleoplasm and nucleoli.
Bendtsen, Kristian Moss; Jensen, Martin Borch; May, Alfred; Rasmussen, Lene Juel; Trusina, Ala; Bohr, Vilhelm A; Jensen, Mogens H
2014-11-01
We have investigated the mobility of two EGFP-tagged DNA repair proteins, WRN and BLM. In particular, we focused on the dynamics in two locations, the nucleoli and the nucleoplasm. We found that both WRN and BLM use a "DNA-scanning" mechanism, with rapid binding-unbinding to DNA resulting in effective diffusion. In the nucleoplasm WRN and BLM have effective diffusion coefficients of 1.62 and 1.34 μm(2)/s, respectively. Likewise, the dynamics in the nucleoli are also best described by effective diffusion, but with diffusion coefficients a factor of ten lower than in the nucleoplasm. From this large reduction in diffusion coefficient we were able to classify WRN and BLM as DNA damage scanners. In addition to WRN and BLM we also classified other DNA damage proteins and found they all fall into one of two categories. Either they are scanners, similar to WRN and BLM, with very low diffusion coefficients, suggesting a scanning mechanism, or they are almost freely diffusing, suggesting that they interact with DNA only after initiation of a DNA damage response.
Mata-Cases, Manel; Mauricio, Dídac; Real, Jordi; Bolíbar, Bonaventura; Franch-Nadal, Josep
2016-11-01
To assess the prevalence of miscoding, misclassification, misdiagnosis and under-registration of diabetes mellitus (DM) in primary health care in Catalonia (Spain), and to explore use of automated algorithms to identify them. In this cross-sectional, retrospective study using an anonymized electronic general practice database, data were collected from patients or users with a diabetes-related code or from patients with no DM or prediabetes code but treated with antidiabetic drugs (unregistered DM). Decision algorithms were designed to classify the true diagnosis of type 1 DM (T1DM), type 2 DM (T2DM), and undetermined DM (UDM), and to classify unregistered DM patients treated with antidiabetic drugs. Data were collected from a total of 376,278 subjects with a DM ICD-10 code, and from 8707 patients with no DM or prediabetes code but treated with antidiabetic drugs. After application of the algorithms, 13.9% of patients with T1DM were identified as misclassified, and were probably T2DM; 80.9% of patients with UDM were reclassified as T2DM, and 19.1% of them were misdiagnosed as DM when they probably had prediabetes. The overall prevalence of miscoding (multiple codes or UDM) was 2.2%. Finally, 55.2% of subjects with unregistered DM were classified as prediabetes, 35.7% as T2DM, 8.5% as UDM treated with insulin, and 0.6% as T1DM. The prevalence of inappropriate codification or classification and under-registration of DM is relevant in primary care. Implementation of algorithms could automatically flag cases that need review and would substantially decrease the risk of inappropriate registration or coding. Copyright © 2016 SEEN. Publicado por Elsevier España, S.L.U. All rights reserved.
A combined Fuzzy and Naive Bayesian strategy can be used to assign event codes to injury narratives.
Marucci-Wellman, H; Lehto, M; Corns, H
2011-12-01
Bayesian methods show promise for classifying injury narratives from large administrative datasets into cause groups. This study examined a combined approach where two Bayesian models (Fuzzy and Naïve) were used to either classify a narrative or select it for manual review. Injury narratives were extracted from claims filed with a worker's compensation insurance provider between January 2002 and December 2004. Narratives were separated into a training set (n=11,000) and prediction set (n=3,000). Expert coders assigned two-digit Bureau of Labor Statistics Occupational Injury and Illness Classification event codes to each narrative. Fuzzy and Naïve Bayesian models were developed using manually classified cases in the training set. Two semi-automatic machine coding strategies were evaluated. The first strategy assigned cases for manual review if the Fuzzy and Naïve models disagreed on the classification. The second strategy selected additional cases for manual review from the Agree dataset using prediction strength to reach a level of 50% computer coding and 50% manual coding. When agreement alone was used as the filtering strategy, the majority were coded by the computer (n=1,928, 64%) leaving 36% for manual review. The overall combined (human plus computer) sensitivity was 0.90 and positive predictive value (PPV) was >0.90 for 11 of 18 2-digit event categories. Implementing the 2nd strategy improved results with an overall sensitivity of 0.95 and PPV >0.90 for 17 of 18 categories. A combined Naïve-Fuzzy Bayesian approach can classify some narratives with high accuracy and identify others most beneficial for manual review, reducing the burden on human coders.
Carbon Nanotube Growth Rate Regression using Support Vector Machines and Artificial Neural Networks
2014-03-27
intensity D peak. Reprinted with permission from [38]. The SVM classifier is trained using custom written Java code leveraging the Sequential Minimal...Society Encog is a machine learning framework for Java , C++ and .Net applications that supports Bayesian Networks, Hidden Markov Models, SVMs and ANNs [13...SVM classifiers are trained using Weka libraries and leveraging custom written Java code. The data set is created as an Attribute Relationship File
Toren, Dmitri; Barzilay, Thomer; Tacutu, Robi; Lehmann, Gilad; Muradian, Khachik K; Fraifeld, Vadim E
2016-01-04
Mitochondria are the only organelles in the animal cells that have their own genome. Due to a key role in energy production, generation of damaging factors (ROS, heat), and apoptosis, mitochondria and mtDNA in particular have long been considered one of the major players in the mechanisms of aging, longevity and age-related diseases. The rapidly increasing number of species with fully sequenced mtDNA, together with accumulated data on longevity records, provides a new fascinating basis for comparative analysis of the links between mtDNA features and animal longevity. To facilitate such analyses and to support the scientific community in carrying these out, we developed the MitoAge database containing calculated mtDNA compositional features of the entire mitochondrial genome, mtDNA coding (tRNA, rRNA, protein-coding genes) and non-coding (D-loop) regions, and codon usage/amino acids frequency for each protein-coding gene. MitoAge includes 922 species with fully sequenced mtDNA and maximum lifespan records. The database is available through the MitoAge website (www.mitoage.org or www.mitoage.info), which provides the necessary tools for searching, browsing, comparing and downloading the data sets of interest for selected taxonomic groups across the Kingdom Animalia. The MitoAge website assists in statistical analysis of different features of the mtDNA and their correlative links to longevity. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Novel variants of the 5S rRNA genes in Eruca sativa.
Singh, K; Bhatia, S; Lakshmikumaran, M
1994-02-01
The 5S ribosomal RNA (rRNA) genes of Eruca sativa were cloned and characterized. They are organized into clusters of tandemly repeated units. Each repeat unit consists of a 119-bp coding region followed by a noncoding spacer region that separates it from the coding region of the next repeat unit. Our study reports novel gene variants of the 5S rRNA genes in plants. Two families of the 5S rDNA, the 0.5-kb size family and the 1-kb size family, coexist in the E. sativa genome. The 0.5-kb size family consists of the 5S rRNA genes (S4) that have coding regions similar to those of other reported plant 5S rDNA sequences, whereas the 1-kb size family consists of the 5S rRNA gene variants (S1) that exist as 1-kb BamHI tandem repeats. S1 is made up of two variant units (V1 and V2) of 5S rDNA where the BamHI site between the two units is mutated. Sequence heterogeneity among S4, V1, and V2 units exists throughout the sequence and is not limited to the noncoding spacer region only. The coding regions of V1 and V2 show approximately 20% dissimilarity to the coding regions of S4 and other reported plant 5S rDNA sequences. Such a large variation in the coding regions of the 5S rDNA units within the same plant species has been observed for the first time. Restriction site variation is observed between the two size classes of 5S rDNA in E. sativa.(ABSTRACT TRUNCATED AT 250 WORDS)
Process of labeling specific chromosomes using recombinant repetitive DNA
Moyzis, R.K.; Meyne, J.
1988-02-12
Chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members and consensus sequences of the repetitive DNA families for the chromosome preferential sequences. The selected low homology regions are then hybridized with chromosomes to determine those low homology regions hybridized with a specific chromosome under normal stringency conditions.
Lata, Charu; Bhutty, Sarita; Bahadur, Ranjit Prasad; Majee, Manoj; Prasad, Manoj
2011-06-01
The DREB genes code for important plant transcription factors involved in the abiotic stress response and signal transduction. Characterization of DREB genes and development of functional markers for effective alleles is important for marker-assisted selection in foxtail millet. Here the characterization of a cDNA (SiDREB2) encoding a putative dehydration-responsive element-binding protein 2 from foxtail millet and the development of an allele-specific marker (ASM) for dehydration tolerance is reported. A cDNA clone (GenBank accession no. GT090998) coding for a putative DREB2 protein was isolated as a differentially expressed gene from a 6 h dehydration stress SSH library. A 5' RACE (rapid amplification of cDNA ends) was carried out to obtain the full-length cDNA, and sequence analysis showed that SiDREB2 encoded a polypeptide of 234 amino acids with a predicted mol. wt of 25.72 kDa and a theoretical pI of 5.14. A theoretical model of the tertiary structure shows that it has a highly conserved GCC-box-binding N-terminal domain, and an acidic C-terminus that acts as an activation domain for transcription. Based on its similarity to AP2 domains, SiDREB2 was classified into the A-2 subgroup of the DREB subfamily. Quantitative real-time PCR analysis showed significant up-regulation of SiDREB2 by dehydration (polyethylene glycol) and salinity (NaCl), while its expression was less affected by other stresses. A synonymous single nucleotide polymorphism (SNP) associated with dehydration tolerance was detected at the 558th base pair (an A/G transition) in the SiDREB2 gene in a core set of 45 foxtail millet accessions used. Based on the identified SNP, three primers were designed to develop an ASM for dehydration tolerance. The ASM produced a 261 bp fragment in all the tolerant accessions and produced no amplification in the sensitive accessions. The use of this ASM might be faster, cheaper, and more reproducible than other SNP genotyping methods, and thus will enable marker-aided breeding of foxtail millet for dehydration tolerance.
ELF-MF exposure affects the robustness of epigenetic programming during granulopoiesis
NASA Astrophysics Data System (ADS)
Manser, Melissa; Sater, Mohamad R. Abdul; Schmid, Christoph D.; Noreen, Faiza; Murbach, Manuel; Kuster, Niels; Schuermann, David; Schär, Primo
2017-03-01
Extremely-low-frequency magnetic fields (ELF-MF) have been classified as “possibly carcinogenic” to humans on the grounds of an epidemiological association of ELF-MF exposure with an increased risk of childhood leukaemia. Yet, underlying mechanisms have remained obscure. Genome instability seems an unlikely reason as the energy transmitted by ELF-MF is too low to damage DNA and induce cancer-promoting mutations. ELF-MF, however, may perturb the epigenetic code of genomes, which is well-known to be sensitive to environmental conditions and generally deranged in cancers, including leukaemia. We examined the potential of ELF-MF to influence key epigenetic modifications in leukaemic Jurkat cells and in human CD34+ haematopoietic stem cells undergoing in vitro differentiation into the neutrophilic lineage. During granulopoiesis, sensitive genome-wide profiling of multiple replicate experiments did not reveal any statistically significant, ELF-MF-dependent alterations in the patterns of active (H3K4me2) and repressive (H3K27me3) histone marks nor in DNA methylation. However, ELF-MF exposure showed consistent effects on the reproducibility of these histone and DNA modification profiles (replicate variability), which appear to be of a stochastic nature but show preferences for the genomic context. The data indicate that ELF-MF exposure stabilizes active chromatin, particularly during the transition from a repressive to an active state during cell differentiation.
The virophage as a unique parasite of the giant mimivirus.
La Scola, Bernard; Desnues, Christelle; Pagnier, Isabelle; Robert, Catherine; Barrassi, Lina; Fournous, Ghislain; Merchat, Michèle; Suzan-Monti, Marie; Forterre, Patrick; Koonin, Eugene; Raoult, Didier
2008-09-04
Viruses are obligate parasites of Eukarya, Archaea and Bacteria. Acanthamoeba polyphaga mimivirus (APMV) is the largest known virus; it grows only in amoeba and is visible under the optical microscope. Mimivirus possesses a 1,185-kilobase double-stranded linear chromosome whose coding capacity is greater than that of numerous bacteria and archaea1, 2, 3. Here we describe an icosahedral small virus, Sputnik, 50 nm in size, found associated with a new strain of APMV. Sputnik cannot multiply in Acanthamoeba castellanii but grows rapidly, after an eclipse phase, in the giant virus factory found in amoebae co-infected with APMV4. Sputnik growth is deleterious to APMV and results in the production of abortive forms and abnormal capsid assembly of the host virus. The Sputnik genome is an 18.343-kilobase circular double-stranded DNA and contains genes that are linked to viruses infecting each of the three domains of life Eukarya, Archaea and Bacteria. Of the 21 predicted protein-coding genes, eight encode proteins with detectable homologues, including three proteins apparently derived from APMV, a homologue of an archaeal virus integrase, a predicted primase-helicase, a packaging ATPase with homologues in bacteriophages and eukaryotic viruses, a distant homologue of bacterial insertion sequence transposase DNA-binding subunit, and a Zn-ribbon protein. The closest homologues of the last four of these proteins were detected in the Global Ocean Survey environmental data set5, suggesting that Sputnik represents a currently unknown family of viruses. Considering its functional analogy with bacteriophages, we classify this virus as a virophage. The virophage could be a vehicle mediating lateral gene transfer between giant viruses.
A cDNA microarray gene expression data classifier for clinical diagnostics based on graph theory.
Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco
2011-01-01
Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarrays' data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performances, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, are unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.
Hiding message into DNA sequence through DNA coding and chaotic maps.
Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman
2014-09-01
The paper proposes an improved reversible substitution method to hide data into deoxyribonucleic acid (DNA) sequence, and four measures have been taken to enhance the robustness and enlarge the hiding capacity, such as encode the secret message by DNA coding, encrypt it by pseudo-random sequence, generate the relative hiding locations by piecewise linear chaotic map, and embed the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method has a better performance compared with the competing methods with respect to robustness and capacity.
Liberek, K; Osipiuk, J; Zylicz, M; Ang, D; Skorko, J; Georgopoulos, C
1990-02-25
The process of initiation of lambda DNA replication requires the assembly of the proper nucleoprotein complex at the origin of replication, ori lambda. The complex is composed of both phage and host-coded proteins. The lambda O initiator protein binds specifically to ori lambda. The lambda P initiator protein binds to both lambda O and the host-coded dnaB helicase, giving rise to an ori lambda DNA.lambda O.lambda P.dnaB structure. The dnaK and dnaJ heat shock proteins have been shown capable of dissociating this complex. The thus freed dnaB helicase unwinds the duplex DNA template at the replication fork. In this report, through cross-linking, size chromatography, and protein affinity chromatography, we document some of the protein-protein interactions occurring at ori lambda. Our results show that the dnaK protein specifically interacts with both lambda O and lambda P, and that the dnaJ protein specifically interacts with the dnaB helicase.
Scaling features of noncoding DNA
NASA Technical Reports Server (NTRS)
Stanley, H. E.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.
1999-01-01
We review evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene, and utilize this fact to build a Coding Sequence Finder Algorithm, which uses statistical ideas to locate the coding regions of an unknown DNA sequence. Finally, we describe briefly some recent work adapting to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function, and reporting that noncoding regions in eukaryotes display a larger redundancy than coding regions. Specifically, we consider the possibility that this result is solely a consequence of nucleotide concentration differences as first noted by Bonhoeffer and his collaborators. We find that cytosine-guanine (CG) concentration does have a strong "background" effect on redundancy. However, we find that for the purine-pyrimidine binary mapping rule, which is not affected by the difference in CG concentration, the Shannon redundancy for the set of analyzed sequences is larger for noncoding regions compared to coding regions.
Privacy rules for DNA databanks. Protecting coded 'future diaries'.
Annas, G J
1993-11-17
In privacy terms, genetic information is like medical information. But the information contained in the DNA molecule itself is more sensitive because it contains an individual's probabilistic "future diary," is written in a code that has only partially been broken, and contains information about an individual's parents, siblings, and children. Current rules for protecting the privacy of medical information cannot protect either genetic information or identifiable DNA samples stored in DNA databanks. A review of the legal and public policy rationales for protecting genetic privacy suggests that specific enforceable privacy rules for DNA databanks are needed. Four preliminary rules are proposed to govern the creation of DNA databanks, the collection of DNA samples for storage, limits on the use of information derived from the samples, and continuing obligations to those whose DNA samples are in the databanks.
ERIC Educational Resources Information Center
Baker, Opal Ruth
Research on Spanish/English code switching is reviewed and the definitions and categories set up by the investigators are examined. Their methods of locating, limiting, and classifying true code switches, and the terms used and results obtained, are compared. It is found that in these studies, conversational (intra-discourse) code switching is…
Exploring the loblolly pine (Pinus taeda L.) genome by BAC sequencing and Cot analysis.
Perera, Dinum; Magbanua, Zenaida V; Thummasuwan, Supaphan; Mukherjee, Dipaloke; Arick, Mark; Chouvarine, Philippe; Nairn, Campbell J; Schmutz, Jeremy; Grimwood, Jane; Dean, Jeffrey F D; Peterson, Daniel G
2018-07-15
Loblolly pine (LP; Pinus taeda L.) is an economically and ecologically important tree in the southeastern U.S. To advance understanding of the loblolly pine (LP; Pinus taeda L.) genome, we sequenced and analyzed 100 BAC clones and performed a Cot analysis. The Cot analysis indicates that the genome is composed of 57, 24, and 10% highly-repetitive, moderately-repetitive, and single/low-copy sequences, respectively (the remaining 9% of the genome is a combination of fold back and damaged DNA). Although single/low-copy DNA only accounts for 10% of the LP genome, the amount of single/low-copy DNA in LP is still 14 times the size of the Arabidopsis genome. Since gene numbers in LP are similar to those in Arabidopsis, much of the single/low-copy DNA of LP would appear to be composed of DNA that is both gene- and repeat-poor. Macroarrays prepared from a LP bacterial artificial chromosome (BAC) library were hybridized with probes designed from cell wall synthesis/wood development cDNAs, and 50 of the "targeted" clones were selected for further analysis. An additional 25 clones were selected because they contained few repeats, while 25 more clones were selected at random. The 100 BAC clones were Sanger sequenced and assembled. Of the targeted BACs, 80% contained all or part of the cDNA used to target them. One targeted BAC was found to contain fungal DNA and was eliminated from further analysis. Combinations of similarity-based and ab initio gene prediction approaches were utilized to identify and characterize potential coding regions in the 99 BACs containing LP DNA. From this analysis, we identified 154 gene models (GMs) representing both putative protein-coding genes and likely pseudogenes. Ten of the GMs (all of which were specifically targeted) had enough support to be classified as intact genes. Interestingly, the 154 GMs had statistically indistinguishable (α = 0.05) distributions in the targeted and random BAC clones (15.18 and 12.61 GM/Mb, respectively), whereas the low-repeat BACs contained significantly fewer GMs (7.08 GM/Mb). However, when GM length was considered, the targeted BACs had a significantly greater percentage of their length in GMs (3.26%) when compared to random (1.63%) and low-repeat (0.62%) BACs. The results of our study provide insight into LP evolution and inform ongoing efforts to produce a reference genome sequence for LP, while characterization of genes involved in cell wall production highlights carbon metabolism pathways that can be leveraged for increasing wood production. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
CHARACTERIZATION OF EXTRAINTESTINAL PATHOGENIC ESCHERICHIA COLI FROM MEAT IN SOUTHERN THAILAND.
Sukkua, Kannika; Pomwised, Rattanaruji; Rattanachuay, Pattamarat; Khianngam, Saowapar; Sukhumungoon, Pharanai
2017-01-01
Extraintestinal pathogenic Escherichia coli (ExPEC) is an E. coli group, which causes diseases in systems outside human intestinal tract. ExPEC isolates were recovered from fresh chicken (25%) and pork (10%) meats, but not beef and shrimp, from markets in southern Thailand. Among the 14 ExPEC strains isolated, all carried iutA and fimH, coding for aerobactin and type 1 fimbriae, respectively. Two ExPEC strains from chicken meat possessed kpsMTK1 coding for K1 capsular antigen, responsible for neonatal meningitis. Antimicrobial susceptibility assay revealed that all ExPEC were resistant to streptomycin and carried blaTEM, but susceptible to imipenem. Phylogenetic group analysis showed that 4, 4, and 6 ExPEC strains belonged to group A, B1 and D, respectively. ExPEC strains were classified into four serotypes, namely, O8 (2 strains), O15 (2 strains), O25 (1 strain), and O127a (1 strain), with the remaining untypeable. DNA profiling analysis by BOX-PCR revealed clonality of strains with the same serotype. The existence of ExPEC in meat products should cause concern regarding food safety and public health not only in southern Thailand but also throughout the country.
Palmer, Cameron S; Franklyn, Melanie
2011-01-07
Trauma systems should consistently monitor a given trauma population over a period of time. The Abbreviated Injury Scale (AIS) and derived scores such as the Injury Severity Score (ISS) are commonly used to quantify injury severities in trauma registries. To reflect contemporary trauma management and treatment, the most recent version of the AIS (AIS08) contains many codes which differ in severity from their equivalents in the earlier 1998 version (AIS98). Consequently, the adoption of AIS08 may impede comparisons between data coded using different AIS versions. It may also affect the number of patients classified as major trauma. The entire AIS98-coded injury dataset of a large population based trauma registry was retrieved and mapped to AIS08 using the currently available AIS98-AIS08 dictionary map. The percentage of codes which had increased or decreased in severity, or could not be mapped, was examined in conjunction with the effect of these changes to the calculated ISS. The potential for free text information accompanying AIS coding to improve the quality of AIS mapping was explored. A total of 128280 AIS98-coded injuries were evaluated in 32134 patients, 15471 patients of whom were classified as major trauma. Although only 4.5% of dictionary codes decreased in severity from AIS98 to AIS08, this represented almost 13% of injuries in the registry. In 4.9% of patients, no injuries could be mapped. ISS was potentially unreliable in one-third of patients, as they had at least one AIS98 code which could not be mapped. Using AIS08, the number of patients classified as major trauma decreased by between 17.3% and 30.3%. Evaluation of free text descriptions for some injuries demonstrated the potential to improve mapping between AIS versions. Converting AIS98-coded data to AIS08 results in a significant decrease in the number of patients classified as major trauma. Many AIS98 codes are missing from the existing AIS map, and across a trauma population the AIS08 dataset estimates which it produces are of insufficient quality to be used in practice. However, it may be possible to improve AIS98 to AIS08 mapping to the point where it is useful to established registries.
2011-01-01
Background Trauma systems should consistently monitor a given trauma population over a period of time. The Abbreviated Injury Scale (AIS) and derived scores such as the Injury Severity Score (ISS) are commonly used to quantify injury severities in trauma registries. To reflect contemporary trauma management and treatment, the most recent version of the AIS (AIS08) contains many codes which differ in severity from their equivalents in the earlier 1998 version (AIS98). Consequently, the adoption of AIS08 may impede comparisons between data coded using different AIS versions. It may also affect the number of patients classified as major trauma. Methods The entire AIS98-coded injury dataset of a large population based trauma registry was retrieved and mapped to AIS08 using the currently available AIS98-AIS08 dictionary map. The percentage of codes which had increased or decreased in severity, or could not be mapped, was examined in conjunction with the effect of these changes to the calculated ISS. The potential for free text information accompanying AIS coding to improve the quality of AIS mapping was explored. Results A total of 128280 AIS98-coded injuries were evaluated in 32134 patients, 15471 patients of whom were classified as major trauma. Although only 4.5% of dictionary codes decreased in severity from AIS98 to AIS08, this represented almost 13% of injuries in the registry. In 4.9% of patients, no injuries could be mapped. ISS was potentially unreliable in one-third of patients, as they had at least one AIS98 code which could not be mapped. Using AIS08, the number of patients classified as major trauma decreased by between 17.3% and 30.3%. Evaluation of free text descriptions for some injuries demonstrated the potential to improve mapping between AIS versions. Conclusions Converting AIS98-coded data to AIS08 results in a significant decrease in the number of patients classified as major trauma. Many AIS98 codes are missing from the existing AIS map, and across a trauma population the AIS08 dataset estimates which it produces are of insufficient quality to be used in practice. However, it may be possible to improve AIS98 to AIS08 mapping to the point where it is useful to established registries. PMID:21214906
Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics
NASA Technical Reports Server (NTRS)
Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.
1995-01-01
We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) a n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, while (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
Lurie, Jon D.; Tosteson, Anna N.A.; Deyo, Richard A.; Tosteson, Tor; Weinstein, James; Mirza, Sohail K.
2014-01-01
Study Design Retrospective analysis of Medicare claims linked to a multi-center clinical trial. Objective The Spine Patient Outcomes Research Trial (SPORT) provided a unique opportunity to examine the validity of a claims-based algorithm for grouping patients by surgical indication. SPORT enrolled patients for lumbar disc herniation, spinal stenosis, and degenerative spondylolisthesis. We compared the surgical indication derived from Medicare claims to that provided by SPORT surgeons, the “gold standard”. Summary of Background Data Administrative data are frequently used to report procedure rates, surgical safety outcomes, and costs in the management of spinal surgery. However, the accuracy of using diagnosis codes to classify patients by surgical indication has not been examined. Methods Medicare claims were link to beneficiaries enrolled in SPORT. The sensitivity and specificity of three claims-based approaches to group patients based on surgical indications were examined: 1) using the first listed diagnosis; 2) using all diagnoses independently; and 3) using a diagnosis hierarchy based on the support for fusion surgery. Results Medicare claims were obtained from 376 SPORT participants, including 21 with disc herniation, 183 with spinal stenosis, and 172 with degenerative spondylolisthesis. The hierarchical coding algorithm was the most accurate approach for classifying patients by surgical indication, with sensitivities of 76.2%, 88.1%, and 84.3% for disc herniation, spinal stenosis, and degenerative spondylolisthesis cohorts, respectively. The specificity was 98.3% for disc herniation, 83.2% for spinal stenosis, and 90.7% for degenerative spondylolisthesis. Misclassifications were primarily due to codes attributing more complex pathology to the case. Conclusion Standardized approaches for using claims data to accurately group patients by surgical indications has widespread interest. We found that a hierarchical coding approach correctly classified over 90% of spine patients into their respective SPORT cohorts. Therefore, claims data appears to be a reasonably valid approach to classifying patients by surgical indication. PMID:24525995
Fiannaca, Antonino; La Rosa, Massimo; Rizzo, Riccardo; Urso, Alfonso
2015-07-01
In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed. In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, which also consider short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database". The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification is performed task with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method can reach the accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches the accuracy of 20.9%. Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments. Copyright © 2015 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Vargas, E. L.; Rivas, D. A.; Duot, A. C.; Hovey, R. T.; Andrianarijaona, V. M.
2015-03-01
DNA replication is the basis for all biological reproduction. A strand of DNA will ``unzip'' and bind with a complimentary strand, creating two identical strands. In this study, we are considering how this process is affected by Interatomic Coulombic Decay (ICD), specifically how ICD affects the individual coding proteins' ability to hold together. ICD mainly deals with how the electron returns to its original state after excitation and how this affects its immediate atomic environment, sometimes affecting the connectivity between interaction sites on proteins involved in the DNA coding process. Biological heredity is fundamentally controlled by DNA and its replication therefore it affects every living thing. The small nature of the proteins (within the range of nanometers) makes it a good candidate for research of this scale. Understanding how ICD affects DNA molecules can give us invaluable insight into the human genetic code and the processes behind cell mutations that can lead to cancer. Authors wish to give special thanks to Pacific Union College Student Senate in Angwin, California, for their financial support.
RPS8—a New Informative DNA Marker for Phylogeny of Babesia and Theileria Parasites in China
Tian, Zhan-Cheng; Liu, Guang-Yuan; Yin, Hong; Luo, Jian-Xun; Guan, Gui-Quan; Luo, Jin; Xie, Jun-Ren; Shen, Hui; Tian, Mei-Yuan; Zheng, Jin-feng; Yuan, Xiao-song; Wang, Fang-fang
2013-01-01
Piroplasmosis is a serious debilitating and sometimes fatal disease. Phylogenetic relationships within piroplasmida are complex and remain unclear. We compared the intron–exon structure and DNA sequences of the RPS8 gene from Babesia and Theileria spp. isolates in China. Similar to 18S rDNA, the 40S ribosomal protein S8 gene, RPS8, including both coding and non-coding regions is a useful and novel genetic marker for defining species boundaries and for inferring phylogenies because it tends to have little intra-specific variation but considerable inter-specific difference. However, more samples are needed to verify the usefulness of the RPS8 (coding and non-coding regions) gene as a marker for the phylogenetic position and detection of most Babesia and Theileria species, particularly for some closely related species. PMID:24244571
Transcription and DNA Damage: Holding Hands or Crossing Swords?
D'Alessandro, Giuseppina; d'Adda di Fagagna, Fabrizio
2017-10-27
Transcription has classically been considered a potential threat to genome integrity. Collision between transcription and DNA replication machinery, and retention of DNA:RNA hybrids, may result in genome instability. On the other hand, it has been proposed that active genes repair faster and preferentially via homologous recombination. Moreover, while canonical transcription is inhibited in the proximity of DNA double-strand breaks, a growing body of evidence supports active non-canonical transcription at DNA damage sites. Small non-coding RNAs accumulate at DNA double-strand break sites in mammals and other organisms, and are involved in DNA damage signaling and repair. Furthermore, RNA binding proteins are recruited to DNA damage sites and participate in the DNA damage response. Here, we discuss the impact of transcription on genome stability, the role of RNA binding proteins at DNA damage sites, and the function of small non-coding RNAs generated upon damage in the signaling and repair of DNA lesions. Copyright © 2016 Elsevier Ltd. All rights reserved.
Diversity of Bacteria at Healthy Human Conjunctiva
Dong, Qunfeng; Brulc, Jennifer M.; Iovieno, Alfonso; Bates, Brandon; Garoutte, Aaron; Miller, Darlene; Revanna, Kashi V.; Gao, Xiang; Antonopoulos, Dionysios A.; Slepak, Vladlen Z.
2011-01-01
Purpose. Ocular surface (OS) microbiota contributes to infectious and autoimmune diseases of the eye. Comprehensive analysis of microbial diversity at the OS has been impossible because of the limitations of conventional cultivation techniques. This pilot study aimed to explore true diversity of human OS microbiota using DNA sequencing-based detection and identification of bacteria. Methods. Composition of the bacterial community was characterized using deep sequencing of the 16S rRNA gene amplicon libraries generated from total conjunctival swab DNA. The DNA sequences were classified and the diversity parameters measured using bioinformatics software ESPRIT and MOTHUR and tools available through the Ribosomal Database Project-II (RDP-II). Results. Deep sequencing of conjunctival rDNA from four subjects yielded a total of 115,003 quality DNA reads, corresponding to 221 species-level phylotypes per subject. The combined bacterial community classified into 5 phyla and 59 distinct genera. However, 31% of all DNA reads belonged to unclassified or novel bacteria. The intersubject variability of individual OS microbiomes was very significant. Regardless, 12 genera—Pseudomonas, Propionibacterium, Bradyrhizobium, Corynebacterium, Acinetobacter, Brevundimonas, Staphylococci, Aquabacterium, Sphingomonas, Streptococcus, Streptophyta, and Methylobacterium—were ubiquitous among the analyzed cohort and represented the putative “core” of conjunctival microbiota. The other 47 genera accounted for <4% of the classified portion of this microbiome. Unexpectedly, healthy conjunctiva contained many genera that are commonly identified as ocular surface pathogens. Conclusions. The first DNA sequencing-based survey of bacterial population at the conjunctiva have revealed an unexpectedly diverse microbial community. All analyzed samples contained ubiquitous (core) genera that included commensal, environmental, and opportunistic pathogenic bacteria. PMID:21571682
2016-06-01
managed by teams organized by the four- digit Federal Supply Classification (FSC) code, which classifies a part by type of materiel. When the consumable...Command [NAVSUP], 2015a). The first four digits of the NSN comprise the FSC code, which categorizes the item being ordered; in the present example it...Table 3, requisitions are divided into three priority bins—high (TP 1), medium (TP 2), 15 and low (TP 3). A mission-critical requirement almost
Transcriptional mapping of the ribosomal RNA region of mouse L-cell mitochondrial DNA.
Nagley, P; Clayton, D A
1980-01-01
The map positions in mouse mitochondrial DNA of the two ribosomal RNA genes and adjacent genes coding several small transcripts have been determined precisely by application of a procedure in which DNA-RNA hybrids have been subjected to digestion by S1 nuclease under conditions of varying severity. Digestion of the DNA-RNA hybrids with S1 nuclease yielded a series of species which were shown to contain ribosomal RNA molecules together with adjacent transcripts hybridized conjointly to a continuous segment of mitochondrial DNA. There is one small transcript about 60 bases long whose gene adjoins the sequences coding the 5'-end of the small ribosomal RNA (950 bases) and which lies approximately 200 nucleotides from the D-loop origin of heavy strand mitochondrial DNA synthesis. An 80-base transcript lies between the small and large ribosomal RNA genes, and genes for two further short transcript (each about 80 bases in length) abut the sequences coding the 3'-end of the large ribosomal RNA (approximately 1500 bases). The ability to isolate a discrete DNA-RNA hybrid species approximately 2700 base pairs in length containing all these transcripts suggests that there can be few nucleotides in this region of mouse mitochondrial DNA which are not represented as stable RNA species. Images PMID:6253898
Statistical and linguistic features of DNA sequences
NASA Technical Reports Server (NTRS)
Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.
1995-01-01
We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Levy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
Strand, Janne M; Scheffler, Katja; Bjørås, Magnar; Eide, Lars
2014-06-01
The cellular genomes are continuously damaged by reactive oxygen species (ROS) from aerobic processes. The impact of DNA damage depends on the specific site as well as the cellular state. The steady-state level of DNA damage is the net result of continuous formation and subsequent repair, but it is unknown to what extent heterogeneous damage distribution is caused by variations in formation or repair of DNA damage. Here, we used a restriction enzyme/qPCR based method to analyze DNA damage in promoter and coding regions of four nuclear genes: the two house-keeping genes Gadph and Tbp, and the Ndufa9 and Ndufs2 genes encoding mitochondrial complex I subunits, as well as mt-Rnr1 encoded by mitochondrial DNA (mtDNA). The distribution of steady-state levels of damage varied in a site-specific manner. Oxidative stress induced damage in nDNA to a similar extent in promoter and coding regions, and more so in mtDNA. The subsequent removal of damage from nDNA was efficient and comparable with recovery times depending on the initial damage load, while repair of mtDNA was delayed with subsequently slower repair rate. The repair was furthermore found to be independent of transcription or the transcription-coupled repair factor CSB, but dependent on cellular ATP. Our results demonstrate that the capacity to repair DNA is sufficient to remove exogenously induced damage. Thus, we conclude that the heterogeneous steady-state level of DNA damage in promoters and coding regions is caused by site-specific DNA damage/modifications that take place under normal metabolism. Copyright © 2014 Elsevier B.V. All rights reserved.
Gene and genon concept: coding versus regulation
2007-01-01
We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon. PMID:18087760
Tramontano, A; Macchiato, M F
1986-01-01
An algorithm to determine the probability that a reading frame codifies for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of a translated reading frame. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or by the presence of translated reading frames on the complementary DNA strand. PMID:3753761
E.W. Fobes; R.W. Rowe
1968-01-01
A system for classifying wood-using industries and recording pertinent statistics for automatic data processing is described. Forms and coding instructions for recording data of primary processing plants are included.
Empirical Evaluation of Hunk Metrics as Bug Predictors
NASA Astrophysics Data System (ADS)
Ferzund, Javed; Ahsan, Syed Nadeem; Wotawa, Franz
Reducing the number of bugs is a crucial issue during software development and maintenance. Software process and product metrics are good indicators of software complexity. These metrics have been used to build bug predictor models to help developers maintain the quality of software. In this paper we empirically evaluate the use of hunk metrics as predictor of bugs. We present a technique for bug prediction that works at smallest units of code change called hunks. We build bug prediction models using random forests, which is an efficient machine learning classifier. Hunk metrics are used to train the classifier and each hunk metric is evaluated for its bug prediction capabilities. Our classifier can classify individual hunks as buggy or bug-free with 86 % accuracy, 83 % buggy hunk precision and 77% buggy hunk recall. We find that history based and change level hunk metrics are better predictors of bugs than code level hunk metrics.
BASiNET-BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification.
Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins
2018-06-05
With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data in particular de novo sequencing was rapidly produced at relatively low costs. In this context, computational tools are increasingly important to assist in the identification of relevant information to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on the feature extraction from complex network measurements. The method initially transform the sequences and represents them as complex networks. Then it extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated in the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET have classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and non-biased by the organism. The proposed methodology is implemented in open source in R language and freely available for download at https://cran.r-project.org/package=BASiNET.
Bisarro Dos Reis, Mariana; Barros-Filho, Mateus Camargo; Marchi, Fábio Albuquerque; Beltrami, Caroline Moraes; Kuasne, Hellen; Pinto, Clóvis Antônio Lopes; Ambatipudi, Srikant; Herceg, Zdenko; Kowalski, Luiz Paulo; Rogatto, Silvia Regina
2017-11-01
Even though the majority of well-differentiated thyroid carcinoma (WDTC) is indolent, a number of cases display an aggressive behavior. Cumulative evidence suggests that the deregulation of DNA methylation has the potential to point out molecular markers associated with worse prognosis. To identify a prognostic epigenetic signature in thyroid cancer. Genome-wide DNA methylation assays (450k platform, Illumina) were performed in a cohort of 50 nonneoplastic thyroid tissues (NTs), 17 benign thyroid lesions (BTLs), and 74 thyroid carcinomas (60 papillary, 8 follicular, 2 Hürthle cell, 1 poorly differentiated, and 3 anaplastic). A prognostic classifier for WDTC was developed via diagonal linear discriminant analysis. The results were compared with The Cancer Genome Atlas (TCGA) database. A specific epigenetic profile was detected according to each histological subtype. BTLs and follicular carcinomas showed a greater number of methylated CpG in comparison with NTs, whereas hypomethylation was predominant in papillary and undifferentiated carcinomas. A prognostic classifier based on 21 DNA methylation probes was able to predict poor outcome in patients with WDTC (sensitivity 63%, specificity 92% for internal data; sensitivity 64%, specificity 88% for TCGA data). High-risk score based on the classifier was considered an independent factor of poor outcome (Cox regression, P < 0.001). The methylation profile of thyroid lesions exhibited a specific signature according to the histological subtype. A meaningful algorithm composed of 21 probes was capable of predicting the recurrence in WDTC. Copyright © 2017 Endocrine Society
NASA Astrophysics Data System (ADS)
Jafari, Mehdi; Kasaei, Shohreh
2012-01-01
Automatic brain tissue segmentation is a crucial task in diagnosis and treatment of medical images. This paper presents a new algorithm to segment different brain tissues, such as white matter (WM), gray matter (GM), cerebral spinal fluid (CSF), background (BKG), and tumor tissues. The proposed technique uses the modified intraframe coding yielded from H.264/(AVC), for feature extraction. Extracted features are then imposed to an artificial back propagation neural network (BPN) classifier to assign each block to its appropriate class. Since the newest coding standard, H.264/AVC, has the highest compression ratio, it decreases the dimension of extracted features and thus yields to a more accurate classifier with low computational complexity. The performance of the BPN classifier is evaluated using the classification accuracy and computational complexity terms. The results show that the proposed technique is more robust and effective with low computational complexity compared to other recent works.
NASA Astrophysics Data System (ADS)
Jafari, Mehdi; Kasaei, Shohreh
2011-12-01
Automatic brain tissue segmentation is a crucial task in diagnosis and treatment of medical images. This paper presents a new algorithm to segment different brain tissues, such as white matter (WM), gray matter (GM), cerebral spinal fluid (CSF), background (BKG), and tumor tissues. The proposed technique uses the modified intraframe coding yielded from H.264/(AVC), for feature extraction. Extracted features are then imposed to an artificial back propagation neural network (BPN) classifier to assign each block to its appropriate class. Since the newest coding standard, H.264/AVC, has the highest compression ratio, it decreases the dimension of extracted features and thus yields to a more accurate classifier with low computational complexity. The performance of the BPN classifier is evaluated using the classification accuracy and computational complexity terms. The results show that the proposed technique is more robust and effective with low computational complexity compared to other recent works.
2017-01-01
Background Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor’s activity for the purposes of quality assurance, safety, and continuing professional development. Objective The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors’ professional performance in the United Kingdom. Methods We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians’ colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Results Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to “popular” (recall=.97), “innovator” (recall=.98), and “respected” (recall=.87) codes and was lower for the “interpersonal” (recall=.80) and “professional” (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as “respected,” “professional,” and “interpersonal” related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Scores did not vary between doctors who were rated as popular or innovative and those who were not rated at all (P>.05). Conclusions Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high performance. Colleague open-text comments that signal respect, professionalism, and being interpersonal may be key indicators of doctor’s performance. PMID:28298265
Gibbons, Chris; Richards, Suzanne; Valderas, Jose Maria; Campbell, John
2017-03-15
Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor's activity for the purposes of quality assurance, safety, and continuing professional development. The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom. We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes and was lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as "respected," "professional," and "interpersonal" related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Scores did not vary between doctors who were rated as popular or innovative and those who were not rated at all (P>.05). Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high performance. Colleague open-text comments that signal respect, professionalism, and being interpersonal may be key indicators of doctor's performance. ©Chris Gibbons, Suzanne Richards, Jose Maria Valderas, John Campbell. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.03.2017.
Exploring the read-write genome: mobile DNA and mammalian adaptation.
Shapiro, James A
2017-02-01
The read-write genome idea predicts that mobile DNA elements will act in evolution to generate adaptive changes in organismal DNA. This prediction was examined in the context of mammalian adaptations involving regulatory non-coding RNAs, viviparous reproduction, early embryonic and stem cell development, the nervous system, and innate immunity. The evidence shows that mobile elements have played specific and sometimes major roles in mammalian adaptive evolution by generating regulatory sites in the DNA and providing interaction motifs in non-coding RNA. Endogenous retroviruses and retrotransposons have been the predominant mobile elements in mammalian adaptive evolution, with the notable exception of bats, where DNA transposons are the major agents of RW genome inscriptions. A few examples of independent but convergent exaptation of mobile DNA elements for similar regulatory rewiring functions are noted.
Kazakoff, Stephen H.; Imelfort, Michael; Edwards, David; Koehorst, Jasper; Biswas, Bandana; Batley, Jacqueline; Scott, Paul T.; Gresshoff, Peter M.
2012-01-01
Pongamia pinnata (syn. Millettia pinnata) is a novel, fast-growing arboreal legume that bears prolific quantities of oil-rich seeds suitable for the production of biodiesel and aviation biofuel. Here, we have used Illumina® ‘Second Generation DNA Sequencing (2GS)’ and a new short-read de novo assembler, SaSSY, to assemble and annotate the Pongamia chloroplast (152,968 bp; cpDNA) and mitochondrial (425,718 bp; mtDNA) genomes. We also show that SaSSY can be used to accurately assemble 2GS data, by re-assembling the Lotus japonicus cpDNA and in the process assemble its mtDNA (380,861 bp). The Pongamia cpDNA contains 77 unique protein-coding genes and is almost 60% gene-dense. It contains a 50 kb inversion common to other legumes, as well as a novel 6.5 kb inversion that is responsible for the non-disruptive, re-orientation of five protein-coding genes. Additionally, two copies of an inverted repeat firmly place the species outside the subclade of the Fabaceae lacking the inverted repeat. The Pongamia and L. japonicus mtDNA contain just 33 and 31 unique protein-coding genes, respectively, and like other angiosperm mtDNA, have expanded intergenic and multiple repeat regions. Through comparative analysis with Vigna radiata we measured the average synonymous and non-synonymous divergence of all three legume mitochondrial (1.59% and 2.40%, respectively) and chloroplast (8.37% and 8.99%, respectively) protein-coding genes. Finally, we explored the relatedness of Pongamia within the Fabaceae and showed the utility of the organellar genome sequences by mapping transcriptomic data to identify up- and down-regulated stress-responsive gene candidates and confirm in silico predicted RNA editing sites. PMID:23272141
Kazakoff, Stephen H; Imelfort, Michael; Edwards, David; Koehorst, Jasper; Biswas, Bandana; Batley, Jacqueline; Scott, Paul T; Gresshoff, Peter M
2012-01-01
Pongamia pinnata (syn. Millettia pinnata) is a novel, fast-growing arboreal legume that bears prolific quantities of oil-rich seeds suitable for the production of biodiesel and aviation biofuel. Here, we have used Illumina® 'Second Generation DNA Sequencing (2GS)' and a new short-read de novo assembler, SaSSY, to assemble and annotate the Pongamia chloroplast (152,968 bp; cpDNA) and mitochondrial (425,718 bp; mtDNA) genomes. We also show that SaSSY can be used to accurately assemble 2GS data, by re-assembling the Lotus japonicus cpDNA and in the process assemble its mtDNA (380,861 bp). The Pongamia cpDNA contains 77 unique protein-coding genes and is almost 60% gene-dense. It contains a 50 kb inversion common to other legumes, as well as a novel 6.5 kb inversion that is responsible for the non-disruptive, re-orientation of five protein-coding genes. Additionally, two copies of an inverted repeat firmly place the species outside the subclade of the Fabaceae lacking the inverted repeat. The Pongamia and L. japonicus mtDNA contain just 33 and 31 unique protein-coding genes, respectively, and like other angiosperm mtDNA, have expanded intergenic and multiple repeat regions. Through comparative analysis with Vigna radiata we measured the average synonymous and non-synonymous divergence of all three legume mitochondrial (1.59% and 2.40%, respectively) and chloroplast (8.37% and 8.99%, respectively) protein-coding genes. Finally, we explored the relatedness of Pongamia within the Fabaceae and showed the utility of the organellar genome sequences by mapping transcriptomic data to identify up- and down-regulated stress-responsive gene candidates and confirm in silico predicted RNA editing sites.
Schwalbe, E C; Hicks, D; Rafiee, G; Bashton, M; Gohlke, H; Enshaei, A; Potluri, S; Matthiesen, J; Mather, M; Taleongpong, P; Chaston, R; Silmon, A; Curtis, A; Lindsey, J C; Crosier, S; Smith, A J; Goschzik, T; Doz, F; Rutkowski, S; Lannering, B; Pietsch, T; Bailey, S; Williamson, D; Clifford, S C
2017-10-18
Rapid and reliable detection of disease-associated DNA methylation patterns has major potential to advance molecular diagnostics and underpin research investigations. We describe the development and validation of minimal methylation classifier (MIMIC), combining CpG signature design from genome-wide datasets, multiplex-PCR and detection by single-base extension and MALDI-TOF mass spectrometry, in a novel method to assess multi-locus DNA methylation profiles within routine clinically-applicable assays. We illustrate the application of MIMIC to successfully identify the methylation-dependent diagnostic molecular subgroups of medulloblastoma (the most common malignant childhood brain tumour), using scant/low-quality samples remaining from the most recently completed pan-European medulloblastoma clinical trial, refractory to analysis by conventional genome-wide DNA methylation analysis. Using this approach, we identify critical DNA methylation patterns from previously inaccessible cohorts, and reveal novel survival differences between the medulloblastoma disease subgroups with significant potential for clinical exploitation.
An artificial neural network system to identify alleles in reference electropherograms.
Taylor, Duncan; Harrison, Ash; Powers, David
2017-09-01
Electropherograms are produced in great numbers in forensic DNA laboratories as part of everyday criminal casework. Before the results of these electropherograms can be used they must be scrutinised by analysts to determine what the identified data tells them about the underlying DNA sequences and what is purely an artefact of the DNA profiling process. This process of interpreting the electropherograms can be time consuming and is prone to subjective differences between analysts. Recently it was demonstrated that artificial neural networks could be used to classify information within an electropherogram as allelic (i.e. representative of a DNA fragment present in the DNA extract) or as one of several different categories of artefactual fluorescence that arise as a result of generating an electropherogram. We extend that work here to demonstrate a series of algorithms and artificial neural networks that can be used to identify peaks on an electropherogram and classify them. We demonstrate the functioning of the system on several profiles and compare the results to a leading commercial DNA profile reading system. Copyright © 2017 Elsevier B.V. All rights reserved.
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.
Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro
2010-05-07
Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.
Cellulases and coding sequences
Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong
2001-02-20
The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.
Cellulases and coding sequences
Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong
2001-01-01
The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.
Structural features based genome-wide characterization and prediction of nucleosome organization
2012-01-01
Background Nucleosome distribution along chromatin dictates genomic DNA accessibility and thus profoundly influences gene expression. However, the underlying mechanism of nucleosome formation remains elusive. Here, taking a structural perspective, we systematically explored nucleosome formation potential of genomic sequences and the effect on chromatin organization and gene expression in S. cerevisiae. Results We analyzed twelve structural features related to flexibility, curvature and energy of DNA sequences. The results showed that some structural features such as DNA denaturation, DNA-bending stiffness, Stacking energy, Z-DNA, Propeller twist and free energy, were highly correlated with in vitro and in vivo nucleosome occupancy. Specifically, they can be classified into two classes, one positively and the other negatively correlated with nucleosome occupancy. These two kinds of structural features facilitated nucleosome binding in centromere regions and repressed nucleosome formation in the promoter regions of protein-coding genes to mediate transcriptional regulation. Based on these analyses, we integrated all twelve structural features in a model to predict more accurately nucleosome occupancy in vivo than the existing methods that mainly depend on sequence compositional features. Furthermore, we developed a novel approach, named DLaNe, that located nucleosomes by detecting peaks of structural profiles, and built a meta predictor to integrate information from different structural features. As a comparison, we also constructed a hidden Markov model (HMM) to locate nucleosomes based on the profiles of these structural features. The result showed that the meta DLaNe and HMM-based method performed better than the existing methods, demonstrating the power of these structural features in predicting nucleosome positions. Conclusions Our analysis revealed that DNA structures significantly contribute to nucleosome organization and influence chromatin structure and gene expression regulation. The results indicated that our proposed methods are effective in predicting nucleosome occupancy and positions and that these structural features are highly predictive of nucleosome organization. The implementation of our DLaNe method based on structural features is available online. PMID:22449207
USDA-ARS?s Scientific Manuscript database
Single-nucleotide Polymorphism (SNP) markers are by far the most common form of DNA polymorphism in a genome. The objectives of this study were to discover SNPs in common bean comparing sequences from coding and non-coding regions obtained from Genbank and genomic DNA and to compare sequencing resu...
Szabóová, Dana; Bielik, Peter; Poláková, Silvia; Šoltys, Katarína; Jatzová, Katarína; Szemes, Tomáš
2017-01-01
Abstract The yeast Saccharomyces are widely used to test ecological and evolutionary hypotheses. A large number of nuclear genomic DNA sequences are available, but mitochondrial genomic data are insufficient. We completed mitochondrial DNA (mtDNA) sequencing from Illumina MiSeq reads for all Saccharomyces species. All are circularly mapped molecules decreasing in size with phylogenetic distance from Saccharomyces cerevisiae but with similar gene content including regulatory and selfish elements like origins of replication, introns, free-standing open reading frames or GC clusters. Their most profound feature is species-specific alteration in gene order. The genetic code slightly differs from well-established yeast mitochondrial code as GUG is used rarely as the translation start and CGA and CGC code for arginine. The multilocus phylogeny, inferred from mtDNA, does not correlate with the trees derived from nuclear genes. mtDNA data demonstrate that Saccharomyces cariocanus should be assigned as a separate species and Saccharomyces bayanus CBS 380T should not be considered as a distinct species due to mtDNA nearly identical to Saccharomyces uvarum mtDNA. Apparently, comparison of mtDNAs should not be neglected in genomic studies as it is an important tool to understand the origin and evolutionary history of some yeast species. PMID:28992063
The Purine Bias of Coding Sequences is Determined by Physicochemical Constraints on Proteins.
Ponce de Leon, Miguel; de Miranda, Antonio Basilio; Alvarez-Valin, Fernando; Carels, Nicolas
2014-01-01
For this report, we analyzed protein secondary structures in relation to the statistics of three nucleotide codon positions. The purpose of this investigation was to find which properties of the ribosome, tRNA or protein level, could explain the purine bias (Rrr) as it is observed in coding DNA. We found that the Rrr pattern is the consequence of a regularity (the codon structure) resulting from physicochemical constraints on proteins and thermodynamic constraints on ribosomal machinery. The physicochemical constraints on proteins mainly come from the hydropathy and molecular weight (MW) of secondary structures as well as the energy cost of amino acid synthesis. These constraints appear through a network of statistical correlations, such as (i) the cost of amino acid synthesis, which is in favor of a higher level of guanine in the first codon position, (ii) the constructive contribution of hydropathy alternation in proteins, (iii) the spatial organization of secondary structure in proteins according to solvent accessibility, (iv) the spatial organization of secondary structure according to amino acid hydropathy, (v) the statistical correlation of MW with protein secondary structures and their overall hydropathy, (vi) the statistical correlation of thymine in the second codon position with hydropathy and the energy cost of amino acid synthesis, and (vii) the statistical correlation of adenine in the second codon position with amino acid complexity and the MW of secondary protein structures. Amino acid physicochemical properties and functional constraints on proteins constitute a code that is translated into a purine bias within the coding DNA via tRNAs. In that sense, the Rrr pattern within coding DNA is the effect of information transfer on nucleotide composition from protein to DNA by selection according to the codon positions. Thus, coding DNA structure and ribosomal machinery co-evolved to minimize the energy cost of protein coding given the functional constraints on proteins.
Caruccio, Nicholas
2011-01-01
DNA library preparation is a common entry point and bottleneck for next-generation sequencing. Current methods generally consist of distinct steps that often involve significant sample loss and hands-on time: DNA fragmentation, end-polishing, and adaptor-ligation. In vitro transposition with Nextera™ Transposomes simultaneously fragments and covalently tags the target DNA, thereby combining these three distinct steps into a single reaction. Platform-specific sequencing adaptors can be added, and the sample can be enriched and bar-coded using limited-cycle PCR to prepare di-tagged DNA fragment libraries. Nextera technology offers a streamlined, efficient, and high-throughput method for generating bar-coded libraries compatible with multiple next-generation sequencing platforms.
Gene Identification Algorithms Using Exploratory Statistical Analysis of Periodicity
NASA Astrophysics Data System (ADS)
Mukherjee, Shashi Bajaj; Sen, Pradip Kumar
2010-10-01
Studying periodic pattern is expected as a standard line of attack for recognizing DNA sequence in identification of gene and similar problems. But peculiarly very little significant work is done in this direction. This paper studies statistical properties of DNA sequences of complete genome using a new technique. A DNA sequence is converted to a numeric sequence using various types of mappings and standard Fourier technique is applied to study the periodicity. Distinct statistical behaviour of periodicity parameters is found in coding and non-coding sequences, which can be used to distinguish between these parts. Here DNA sequences of Drosophila melanogaster were analyzed with significant accuracy.
Physics behind the mechanical nucleosome positioning code
NASA Astrophysics Data System (ADS)
Zuiddam, Martijn; Everaers, Ralf; Schiessel, Helmut
2017-11-01
The positions along DNA molecules of nucleosomes, the most abundant DNA-protein complexes in cells, are influenced by the sequence-dependent DNA mechanics and geometry. This leads to the "nucleosome positioning code", a preference of nucleosomes for certain sequence motives. Here we introduce a simplified model of the nucleosome where a coarse-grained DNA molecule is frozen into an idealized superhelical shape. We calculate the exact sequence preferences of our nucleosome model and find it to reproduce qualitatively all the main features known to influence nucleosome positions. Moreover, using well-controlled approximations to this model allows us to come to a detailed understanding of the physics behind the sequence preferences of nucleosomes.
Müller, J-M V; Wissemann, J; Meli, M L; Dasen, G; Lutz, H; Heinzerling, L; Feige, K
2011-11-01
Whole blood pharmacokinetics of intratumourally injected naked plasmid DNA coding for equine Interleukin 12 (IL-12) was assessed as a means of in vivo gene transfer in the treatment of melanoma in grey horses. The expression of induced interferon gamma (IFN-g) was evaluated in order to determine the pharmacodynamic properties of in vivo gene transduction. Seven grey horses bearing melanoma were injected intratumourally with 250 µg naked plasmid DNA coding for IL-12. Peripheral blood and biopsies from the injection site were taken at 13 time points until day 14 post injection (p.i.). Samples were analysed using quantitative real-time PCR. Plasmid DNA was quantified in blood samples and mRNA expression for IFN-g in tissue samples. Plasmid DNA showed fast elimination kinetics with more than 99 % of the plasmid disappearing within 36 hours. IFN-g expression increased quickly after IL-12 plasmid injection, but varied between individual horses. Intratumoural injection of plasmid DNA is a feasible method for inducing transgene expression in vivo. Biological activity of the transgene IL-12 was confirmed by measuring expression of IFN-g.
Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain
2011-01-01
cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.
LaPolla, R J; Mayne, K M; Davidson, N
1984-01-01
A mouse cDNA clone has been isolated that contains the complete coding region of a protein highly homologous to the delta subunit of the Torpedo acetylcholine receptor (AcChoR). The cDNA library was constructed in the vector lambda 10 from membrane-associated poly(A)+ RNA from BC3H-1 mouse cells. Surprisingly, the delta clone was selected by hybridization with cDNA encoding the gamma subunit of the Torpedo AcChoR. The nucleotide sequence of the mouse cDNA clone contains an open reading frame of 520 amino acids. This amino acid sequence exhibits 59% and 50% sequence homology to the Torpedo AcChoR delta and gamma subunits, respectively. However, the mouse nucleotide sequence has several stretches of high homology with the Torpedo gamma subunit cDNA, but not with delta. The mouse protein has the same general structural features as do the Torpedo subunits. It is encoded by a 3.3-kilobase mRNA. There is probably only one, but at most two, chromosomal genes coding for this or closely related sequences. Images PMID:6096870
Lee, Hwan Young; Yoo, Ji-Eun; Park, Myung Jin; Chung, Ukhee; Kim, Chong-Youl; Shin, Kyoung-Jin
2006-11-01
The present study analyzed 21 coding region SNP markers and one deletion motif for the determination of East Asian mitochondrial DNA (mtDNA) haplogroups by designing three multiplex systems which apply single base extension methods. Using two multiplex systems, all 593 Korean mtDNAs were allocated into 15 haplogroups: M, D, D4, D5, G, M7, M8, M9, M10, M11, R, R9, B, A, and N9. As the D4 haplotypes occurred most frequently in Koreans, the third multiplex system was used to further define D4 subhaplogroups: D4a, D4b, D4e, D4g, D4h, and D4j. This method allowed the complementation of coding region information with control region mutation motifs and the resultant findings also suggest reliable control region mutation motifs for the assignment of East Asian mtDNA haplogroups. These three multiplex systems produce good results in degraded samples as they contain small PCR products (101-154 bp) for single base extension reactions. SNP scoring was performed in 101 old skeletal remains using these three systems to prove their utility in degraded samples. The sequence analysis of mtDNA control region with high incidence of haplogroup-specific mutations and the selective scoring of highly informative coding region SNPs using the three multiplex systems are useful tools for most applications involving East Asian mtDNA haplogroup determination and haplogroup-directed stringent quality control.
Fukuda, Tomoyuki; Ohta, Kunihiro; Ohya, Yoshikazu
2006-06-01
VMA1-derived endonuclease (VDE), a homing endonuclease in Saccharomyces cerevisiae, is encoded by the mobile intein-coding sequence within the nuclear VMA1 gene. VDE recognizes and cleaves DNA at the 31-bp VDE recognition sequence (VRS) in the VMA1 gene lacking the intein-coding sequence during meiosis to insert a copy of the intein-coding sequence at the cleaved site. The mechanism underlying the meiosis specificity of VMA1 intein-coding sequence homing remains unclear. We studied various factors that might influence the cleavage activity in vivo and found that VDE binding to the VRS can be detected only when DNA cleavage by VDE takes place, implying that meiosis-specific DNA cleavage is regulated by the accessibility of VDE to its target site. As a possible candidate for the determinant of this accessibility, we analyzed chromatin structure around the VRS and revealed that local chromatin structure near the VRS is altered during meiosis. Although the meiotic chromatin alteration exhibits correlations with DNA binding and cleavage by VDE at the VMA1 locus, such a chromatin alteration is not necessarily observed when the VRS is embedded in ectopic gene loci. This suggests that nucleosome positioning or occupancy around the VRS by itself is not the sole mechanism for the regulation of meiosis-specific DNA cleavage by VDE and that other mechanisms are involved in the regulation.
Fukuda, Tomoyuki; Ohta, Kunihiro; Ohya, Yoshikazu
2006-01-01
VMA1-derived endonuclease (VDE), a homing endonuclease in Saccharomyces cerevisiae, is encoded by the mobile intein-coding sequence within the nuclear VMA1 gene. VDE recognizes and cleaves DNA at the 31-bp VDE recognition sequence (VRS) in the VMA1 gene lacking the intein-coding sequence during meiosis to insert a copy of the intein-coding sequence at the cleaved site. The mechanism underlying the meiosis specificity of VMA1 intein-coding sequence homing remains unclear. We studied various factors that might influence the cleavage activity in vivo and found that VDE binding to the VRS can be detected only when DNA cleavage by VDE takes place, implying that meiosis-specific DNA cleavage is regulated by the accessibility of VDE to its target site. As a possible candidate for the determinant of this accessibility, we analyzed chromatin structure around the VRS and revealed that local chromatin structure near the VRS is altered during meiosis. Although the meiotic chromatin alteration exhibits correlations with DNA binding and cleavage by VDE at the VMA1 locus, such a chromatin alteration is not necessarily observed when the VRS is embedded in ectopic gene loci. This suggests that nucleosome positioning or occupancy around the VRS by itself is not the sole mechanism for the regulation of meiosis-specific DNA cleavage by VDE and that other mechanisms are involved in the regulation. PMID:16757746
Supervised DNA Barcodes species classification: analysis, comparisons and results
2014-01-01
Background Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcode and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcode: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem assigns an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. Methods In this work the efficacy of supervised machine learning methods to classify species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with respect to ad-hoc and well-established DNA Barcode classification methods. Results A software that converts the DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow the execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes outperform on average the other considered classifiers, although they do not provide a human interpretable classification model. Rule-based methods have slightly inferior classification performances, but deliver the species specific positions and nucleotide assignments. On synthetic data the supervised machine learning methods obtain superior classification performances with respect to the traditional DNA Barcode classification methods. On empirical data their classification performances are at a comparable level to the other methods. Conclusions The classification analysis shows that supervised machine learning methods are promising candidates for handling with success the DNA Barcoding species classification problem, obtaining excellent performances. To conclude, a powerful tool to perform species identification is now available to the DNA Barcoding community. PMID:24721333
Bertke, S J; Meyers, A R; Wurzelbacher, S J; Bell, J; Lampl, M L; Robins, D
2012-12-01
Tracking and trending rates of injuries and illnesses classified as musculoskeletal disorders caused by ergonomic risk factors such as overexertion and repetitive motion (MSDs) and slips, trips, or falls (STFs) in different industry sectors is of high interest to many researchers. Unfortunately, identifying the cause of injuries and illnesses in large datasets such as workers' compensation systems often requires reading and coding the free form accident text narrative for potentially millions of records. To alleviate the need for manual coding, this paper describes and evaluates a computer auto-coding algorithm that demonstrated the ability to code millions of claims quickly and accurately by learning from a set of previously manually coded claims. The auto-coding program was able to code claims as a musculoskeletal disorders, STF or other with approximately 90% accuracy. The program developed and discussed in this paper provides an accurate and efficient method for identifying the causation of workers' compensation claims as a STF or MSD in a large database based on the unstructured text narrative and resulting injury diagnoses. The program coded thousands of claims in minutes. The method described in this paper can be used by researchers and practitioners to relieve the manual burden of reading and identifying the causation of claims as a STF or MSD. Furthermore, the method can be easily generalized to code/classify other unstructured text narratives. Published by Elsevier Ltd.
Du, Ping; Li, Hongxia; Cao, Wei
2009-07-15
A novel and sensitive sandwich electrochemical biosensor based on the amplification of magnetic microbeads and Au nanoparticles (NPs) modified with bio bar code and PbS nanoparticles was constructed in the present work. In this method, the magnetic microspheres were coated with 4 layers polyelectrolytes in order to increase carboxyl groups on the surface of the magnetic microbeads, which enhanced the amount of the capture DNA. The amino-functionalized capture DNA on the surface of magnetic microbeads hybridized with one end of target DNA, the other end of which was hybridized with signal DNA probe labelled with Au NPs on the terminus. The Au NPs were modified with bio bar code and the PbS NPs were used as a marker for identifying the target oligoncleotide. The modification of magnetic microbeads could immobilize more amino-group terminal capture DNA, and the bio bar code could increase the amount of Au NPs that combined with the target DNA. The detection of lead ions performed by anodic stripping voltammetry (ASV) technology further improved the sensitivity of the biosensor. As a result, the present DNA biosensor showed good selectivity and sensitivity by the combined amplification. Under the optimum conditions, the linear relationship with the concentration of the target DNA was ranging from 2.0 x 10(-14) M to 1.0 x 10(-12)M and a detection limit as low as 5.0 x 10(-15)M was obtained.
Cloning and expression of a cDNA coding for catalase from zebrafish (Danio rerio).
Ken, C F; Lin, C T; Wu, J L; Shaw, J F
2000-06-01
A full-length complementary DNA (cDNA) clone encoding a catalase was amplified by the rapid amplication of cDNA ends-polymerase chain reaction (RACE-PCR) technique from zebrafish (Danio rerio) mRNA. Nucleotide sequence analysis of this cDNA clone revealed that it comprised a complete open reading frame coding for 526 amino acid residues and that it had a molecular mass of 59 654 Da. The deduced amino acid sequence showed high similarity with the sequences of catalase from swine (86.9%), mouse (85.8%), rat (85%), human (83.7%), fruit fly (75.6%), nematode (71.1%), and yeast (58.6%). The amino acid residues for secondary structures are apparently conserved as they are present in other mammal species. Furthermore, the coding region of zebrafish catalase was introduced into an expression vector, pET-20b(+), and transformed into Escherichia coli expression host BL21(DE3)pLysS. A 60-kDa active catalase protein was expressed and detected by Coomassie blue staining as well as activity staining on polyacrylamide gel followed electrophoresis.
Noncoding RNAs in DNA Repair and Genome Integrity
Wan, Guohui; Liu, Yunhua; Han, Cecil; Zhang, Xinna
2014-01-01
Abstract Significance: The well-studied sequences in the human genome are those of protein-coding genes, which account for only 1%–2% of the total genome. However, with the advent of high-throughput transcriptome sequencing technology, we now know that about 90% of our genome is extensively transcribed and that the vast majority of them are transcribed into noncoding RNAs (ncRNAs). It is of great interest and importance to decipher the functions of these ncRNAs in humans. Recent Advances: In the last decade, it has become apparent that ncRNAs play a crucial role in regulating gene expression in normal development, in stress responses to internal and environmental stimuli, and in human diseases. Critical Issues: In addition to those constitutively expressed structural RNA, such as ribosomal and transfer RNAs, regulatory ncRNAs can be classified as microRNAs (miRNAs), Piwi-interacting RNAs (piRNAs), small interfering RNAs (siRNAs), small nucleolar RNAs (snoRNAs), and long noncoding RNAs (lncRNAs). However, little is known about the biological features and functional roles of these ncRNAs in DNA repair and genome instability, although a number of miRNAs and lncRNAs are regulated in the DNA damage response. Future Directions: A major goal of modern biology is to identify and characterize the full profile of ncRNAs with regard to normal physiological functions and roles in human disorders. Clinically relevant ncRNAs will also be evaluated and targeted in therapeutic applications. Antioxid. Redox Signal. 20, 655–677. PMID:23879367
DNA methylation of miRNA coding sequences putatively associated with childhood obesity.
Mansego, M L; Garcia-Lacarte, M; Milagro, F I; Marti, A; Martinez, J A
2017-02-01
Epigenetic mechanisms may be involved in obesity onset and its consequences. The aim of the present study was to evaluate whether DNA methylation status in microRNA (miRNA) coding regions is associated with childhood obesity. DNA isolated from white blood cells of 24 children (identification sample: 12 obese and 12 non-obese) from the Grupo Navarro de Obesidad Infantil study was hybridized in a 450 K methylation microarray. Several CpGs whose DNA methylation levels were statistically different between obese and non-obese were validated by MassArray® in 95 children (validation sample) from the same study. Microarray analysis identified 16 differentially methylated CpGs between both groups (6 hypermethylated and 10 hypomethylated). DNA methylation levels in miR-1203, miR-412 and miR-216A coding regions significantly correlated with body mass index standard deviation score (BMI-SDS) and explained up to 40% of the variation of BMI-SDS. The network analysis identified 19 well-defined obesity-relevant biological pathways from the KEGG database. MassArray® validation identified three regions located in or near miR-1203, miR-412 and miR-216A coding regions differentially methylated between obese and non-obese children. The current work identified three CpG sites located in coding regions of three miRNAs (miR-1203, miR-412 and miR-216A) that were differentially methylated between obese and non-obese children, suggesting a role of miRNA epigenetic regulation in childhood obesity. © 2016 World Obesity Federation.
Ultrasonic Inspection and Fatigue Evaluation of Critical Pore Size in Welds.
1981-09-01
Boiler and Pressure Vessel Code ) 20...Five porosity levels were produced that parallelled ASME boiler and pressure vessel code specification (Section VIII). Appendix IV of the pressure...Figure 2 shows porosity charts (ASME Boiler and Pressure Vessel Code ) which classify and designate the number and size of pores in any six inch length
Bhute, Vijesh J.; Palecek, Sean P.
2015-01-01
Genomic instability is one of the hallmarks of cancer. Several chemotherapeutic drugs and radiotherapy induce DNA damage to prevent cancer cell replication. Cells in turn activate different DNA damage response (DDR) pathways to either repair the damage or induce cell death. These DDR pathways also elicit metabolic alterations which can play a significant role in the proper functioning of the cells. The understanding of these metabolic effects resulting from different types of DNA damage and repair mechanisms is currently lacking. In this study, we used NMR metabolomics to identify metabolic pathways which are altered in response to different DNA damaging agents. By comparing the metabolic responses in MCF-7 cells, we identified the activation of poly (ADP-ribose) polymerase (PARP) in methyl methanesulfonate (MMS)-induced DNA damage. PARP activation led to a significant depletion of NAD+. PARP inhibition using veliparib (ABT-888) was able to successfully restore the NAD+ levels in MMS-treated cells. In addition, double strand break induction by MMS and veliparib exhibited similar metabolic responses as zeocin, suggesting an application of metabolomics to classify the types of DNA damage responses. This prediction was validated by studying the metabolic responses elicited by radiation. Our findings indicate that cancer cell metabolic responses depend on the type of DNA damage responses and can also be used to classify the type of DNA damage. PMID:26478723
Quantitative profiling of CpG island methylation in human stool for colorectal cancer detection.
Elliott, Giles O; Johnson, Ian T; Scarll, Jane; Dainty, Jack; Williams, Elizabeth A; Garg, D; Coupe, Amanda; Bradburn, David M; Mathers, John C; Belshaw, Nigel J
2013-01-01
The aims of this study were to investigate the use of quantitative CGI methylation data from stool DNA to classify colon cancer patients and to relate stool CGI methylation levels to those found in corresponding tissue samples. We applied a quantitative methylation-specific PCR assay to determine CGI methylation levels of six genes, previously shown to be aberrantly methylated during colorectal carcinogenesis. Assays were performed on DNA from biopsies of "normal" mucosa and stool samples from 57 patients classified as disease-free, adenoma, or cancer by endoscopy, and in tumour tissue from cancer patients. Additionally, CGI methylation was analysed in stool DNA from an asymptomatic population of individuals covering a broad age range (mean = 47 ± 24 years) CGI methylation levels in stool DNA were significantly higher than in DNA from macroscopically normal mucosa, and a significant correlation between stool and mucosa was observed for ESR1 only. Multivariate statistical analyses using the methylation levels of each CGI in stool DNA as a continuous variable revealed a highly significant (p = 0.003) classification of cancer vs. non-cancer (adenoma + disease-free) patients (sensitivity = 65 %, specificity = 81 %). CGI methylation profiling of stool DNA successfully identified patients with cancer despite the methylation status of CGIs in stool DNA not generally reflecting those in DNA from the colonic mucosa.
Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly
Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka
2010-01-01
Background Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. Methodology We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. Conclusions The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877
Novel numerical and graphical representation of DNA sequences and proteins.
Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D
2006-12-01
We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.
Matsui, H; Nakamura, G; Ishiga, Y; Toshima, H; Inagaki, Y; Toyoda, K; Shiraishi, T; Ichinose, Y
2004-02-01
Recently, we observed that expression of a pea gene (S64) encoding an oxophytodienoic acid reductase (OPR) was induced by a suppressor of pea defense responses, secreted by the pea pathogen Mycosphaerella pinodes. Because it is known that OPRs are usually encoded by families of homologous genes, we screened for genomic and cDNA clones encoding members of this putative OPR family in pea. We isolated five members of the OPR gene family from a pea genomic DNA library, and amplified six cDNA clones, including S64, by RT-PCR (reverse transcriptase-PCR). Sequencing analysis revealed that S64 corresponds to PsOPR2, and the amino acid sequences of the predicted products of the six OPR-like genes shared more than 80% identity with each other. Based on their sequence similarity, all these OPR-like genes code for OPRs of subgroup I, i.e., enzymes which are not required for jasmonic acid biosynthesis. However, the genes varied in their exon/intron organization and in their promoter sequences. To investigate the expression of each individual OPR-like gene, RT-PCR was performed using gene-specific primers. The results indicated that the OPR-like gene most strongly induced by the inoculation of pea plants with a compatible pathogen and by treatment with the suppressor from M. pinodes was PsOPR2. Furthermore, the ability of the six recombinant OPR-like proteins to reduce a model substrate, 2-cyclohexen-1-one (2-CyHE), was investigated. The results indicated that PsOPR1, 4 and 6 display robust activity, and PsOPR2 has a most remarkable ability to reduce 2-CyHE, whereas PsOPR3 has little and PsOPR5 does not reduce this compound. Thus, the six OPR-like proteins can be classified into four types. Interestingly, the gene structures, expression profiles, and enzymatic activities used to classify each member of the pea OPR-like gene family are clearly correlated, indicating that each member of this OPR-like family has a distinct function.
Analysis of the mitochondrial genome of cheetahs (Acinonyx jubatus) with neurodegenerative disease.
Burger, Pamela A; Steinborn, Ralf; Walzer, Christian; Petit, Thierry; Mueller, Mathias; Schwarzenberger, Franz
2004-08-18
The complete mitochondrial genome of Acinonyx jubatus was sequenced and mitochondrial DNA (mtDNA) regions were screened for polymorphisms as candidates for the cause of a neurodegenerative demyelinating disease affecting captive cheetahs. The mtDNA reference sequences were established on the basis of the complete sequences of two diseased and two nondiseased animals as well as partial sequences of 26 further individuals. The A. jubatus mitochondrial genome is 17,047-bp long and shows a high sequence similarity (91%) to the domestic cat. Based on single nucleotide polymorphisms (SNPs) in the control region (CR) and pedigree information, the 18 myelopathic and 12 non-myelopathic cheetahs included in this study were classified into haplotypes I, II and III. In view of the phenotypic comparability of the neurodegenerative disease observed in cheetahs and human mtDNA-associated diseases, specific coding regions including the tRNAs leucine UUR, lysine, serine UCN, and partial complex I and V sequences were screened. We identified a heteroplasmic and a homoplasmic SNP at codon 507 in the subunit 5 (MTND5) of complex I. The heteroplasmic haplotype I-specific valine to methionine substitution represents a nonconservative amino acid change and was found in 11 myelopathic and eight non-myelopathic cheetahs with levels ranging from 29% to 79%. The homoplasmic conservative amino acid substitution valine to alanine was identified in two myelopathic animals of haplotype II. In addition, a synonymous SNP in the codon 76 of the MTND4L gene was found in the single haplotype III animal. The amino acid exchanges in the MTND5 gene were not associated with the occurrence of neurodegenerative disease in captive cheetahs.
Martín-Navarro, Antonio; Gaudioso-Simón, Andrés; Álvarez-Jarreta, Jorge; Montoya, Julio; Mayordomo, Elvira; Ruiz-Pesini, Eduardo
2017-03-07
Several methods have been developed to predict the pathogenicity of missense mutations but none has been specifically designed for classification of variants in mtDNA-encoded polypeptides. Moreover, there is not available curated dataset of neutral and damaging mtDNA missense variants to test the accuracy of predictors. Because mtDNA sequencing of patients suffering mitochondrial diseases is revealing many missense mutations, it is needed to prioritize candidate substitutions for further confirmation. Predictors can be useful as screening tools but their performance must be improved. We have developed a SVM classifier (Mitoclass.1) specific for mtDNA missense variants. Training and validation of the model was executed with 2,835 mtDNA damaging and neutral amino acid substitutions, previously curated by a set of rigorous pathogenicity criteria with high specificity. Each instance is described by a set of three attributes based on evolutionary conservation in Eukaryota of wildtype and mutant amino acids as well as coevolution and a novel evolutionary analysis of specific substitutions belonging to the same domain of mitochondrial polypeptides. Our classifier has performed better than other web-available tested predictors. We checked performance of three broadly used predictors with the total mutations of our curated dataset. PolyPhen-2 showed the best results for a screening proposal with a good sensitivity. Nevertheless, the number of false positive predictions was too high. Our method has an improved sensitivity and better specificity in relation to PolyPhen-2. We also publish predictions for the complete set of 24,201 possible missense variants in the 13 human mtDNA-encoded polypeptides. Mitoclass.1 allows a better selection of candidate damaging missense variants from mtDNA. A careful search of discriminatory attributes and a training step based on a curated dataset of amino acid substitutions belonging exclusively to human mtDNA genes allows an improved performance. Mitoclass.1 accuracy could be improved in the future when more mtDNA missense substitutions will be available for updating the attributes and retraining the model.
Liu, Huitao; Cui, Peng; Zhan, Kehui; Lin, Qiang; Zhuo, Guoyin; Guo, Xiaoli; Ding, Feng; Yang, Wenlong; Liu, Dongcheng; Hu, Songnian; Yu, Jun; Zhang, Aimin
2011-03-29
Plant mitochondria, semiautonomous organelles that function as manufacturers of cellular ATP, have their own genome that has a slow rate of evolution and rapid rearrangement. Cytoplasmic male sterility (CMS), a common phenotype in higher plants, is closely associated with rearrangements in mitochondrial DNA (mtDNA), and is widely used to produce F1 hybrid seeds in a variety of valuable crop species. Novel chimeric genes deduced from mtDNA rearrangements causing CMS have been identified in several plants, such as rice, sunflower, pepper, and rapeseed, but there are very few reports about mtDNA rearrangements in wheat. In the present work, we describe the mitochondrial genome of a wheat K-type CMS line and compare it with its maintainer line. The complete mtDNA sequence of a wheat K-type (with cytoplasm of Aegilops kotschyi) CMS line, Ks3, was assembled into a master circle (MC) molecule of 647,559 bp and found to harbor 34 known protein-coding genes, three rRNAs (18 S, 26 S, and 5 S rRNAs), and 16 different tRNAs. Compared to our previously published sequence of a K-type maintainer line, Km3, we detected Ks3-specific mtDNA (> 100 bp, 11.38%) and repeats (> 100 bp, 29 units) as well as genes that are unique to each line: rpl5 was missing in Ks3 and trnH was absent from Km3. We also defined 32 single nucleotide polymorphisms (SNPs) in 13 protein-coding, albeit functionally irrelevant, genes, and predicted 22 unique ORFs in Ks3, representing potential candidates for K-type CMS. All these sequence variations are candidates for involvement in CMS. A comparative analysis of the mtDNA of several angiosperms, including those from Ks3, Km3, rice, maize, Arabidopsis thaliana, and rapeseed, showed that non-coding sequences of higher plants had mostly divergent multiple reorganizations during the mtDNA evolution of higher plants. The complete mitochondrial genome of the wheat K-type CMS line Ks3 is very different from that of its maintainer line Km3, especially in non-coding sequences. Sequence rearrangement has produced novel chimeric ORFs, which may be candidate genes for CMS. Comparative analysis of several angiosperm mtDNAs indicated that non-coding sequences are the most frequently reorganized during mtDNA evolution in higher plants.
Novel mutations in the CHST6 gene associated with macular corneal dystrophy in southern India.
Warren, John F; Aldave, Anthony J; Srinivasan, M; Thonar, Eugene J; Kumar, Abha B; Cevallos, Vicky; Whitcher, John P; Margolis, Todd P
2003-11-01
To further characterize the role of the carbohydrate sulfotransferase (CHST6) gene in macular corneal dystrophy (MCD) through identification of causative mutations in a cohort of affected patients from southern India. Genomic DNA was extracted from buccal epithelium of 75 patients (51 families) with MCD, 33 unaffected relatives, and 48 healthy volunteers. The coding region of the CHST6 gene was evaluated by means of polymerase chain reaction amplification and direct sequencing. Subtyping of MCD into types I and II was performed by measuring serum levels of antigenic keratan sulfate. Seventy patients were classified as having type I MCD, and 5 patients as having type II MCD. Analysis of the CHST6 coding region in patients with type I MCD identified 11 homozygous missense mutations (Leu22Arg, His42Tyr, Arg50Cys, Arg50Leu, Ser53Leu, Arg97Pro, Cys102Tyr, Arg127Cys, Arg205Gln, His249Pro, and Glu274Lys), 2 compound heterozygous missense mutations (Arg93His and Ala206Thr), 5 homozygous deletion mutations (delCG707-708, delC890, delA1237, del1748-1770, and delORF), and 2 homozygous replacement mutations (ACCTAC 1273 GGT, and GCG 1304 AT). One patient with type II MCD was heterozygous for the C890 deletion mutation, whereas 4 possessed no CHST6 coding region mutations. A variety of previously unreported mutations in the coding region of the CHST6 gene are associated with type I MCD in a cohort of patients in southern India. An improved understanding of the genetic basis of MCD allows for earlier, more accurate diagnosis of affected individuals, and may provide the foundation for the development of novel disease treatments.
Samorì, Bruno; Zuccheri, Giampaolo
2005-02-11
The nanometer scale is a special place where all sciences meet and develop a particularly strong interdisciplinarity. While biology is a source of inspiration for nanoscientists, chemistry has a central role in turning inspirations and methods from biological systems to nanotechnological use. DNA is the biological molecule by which nanoscience and nanotechnology is mostly fascinated. Nature uses DNA not only as a repository of the genetic information, but also as a controller of the expression of the genes it contains. Thus, there are codes embedded in the DNA sequence that serve to control recognition processes on the atomic scale, such as the base pairing, and others that control processes taking place on the nanoscale. From the chemical point of view, DNA is the supramolecular building block with the highest informational content. Nanoscience has therefore the opportunity of using DNA molecules to increase the level of complexity and efficiency in self-assembling and self-directing processes.
nRC: non-coding RNA Classifier based on structural features.
Fiannaca, Antonino; La Rosa, Massimo; La Paglia, Laura; Rizzo, Riccardo; Urso, Alfonso
2017-01-01
Non-coding RNA (ncRNA) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to develop methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms in regulative processes, together with the development of high-throughput technologies, has required the help of bioinformatics tools in addressing biologists and clinicians with a deeper comprehension of the functional roles of ncRNAs. In this work, we introduce a new ncRNA classification tool, nRC (non-coding RNA Classifier). Our approach is based on features extraction from the ncRNA secondary structure together with a supervised classification algorithm implementing a deep learning architecture based on convolutional neural networks. We tested our approach for the classification of 13 different ncRNA classes. We obtained classification scores, using the most common statistical measures. In particular, we reach an accuracy and sensitivity score of about 74%. The proposed method outperforms other similar classification methods based on secondary structure features and machine learning algorithms, including the RNAcon tool that, to date, is the reference classifier. nRC tool is freely available as a docker image at https://hub.docker.com/r/tblab/nrc/. The source code of nRC tool is also available at https://github.com/IcarPA-TBlab/nrc.
Cioffi, Anna Valentina; Ferrara, Diana; Cubellis, Maria Vittoria; Aniello, Francesco; Corrado, Marcella; Liguori, Francesca; Amoroso, Alessandro; Fucci, Laura; Branno, Margherita
2002-08-01
Analysis of the genome structure of the Paracentrotus lividus (sea urchin) DNA methyltransferase (DNA MTase) gene showed the presence of an open reading frame, named METEX, in intron 7 of the gene. METEX expression is developmentally regulated, showing no correlation with DNA MTase expression. In fact, DNA MTase transcripts are present at high concentrations in the early developmental stages, while METEX is expressed at late stages of development. Two METEX cDNA clones (Met1 and Met2) that are different in the 3' end have been isolated in a cDNA library screening. The putative translated protein from Met2 cDNA clone showed similarity with Escherichia coli endonuclease III on the basis of sequence and predictive three-dimensional structure. The protein, overexpressed in E. coli and purified, had functional properties similar to the endonuclease specific for apurinic/apyrimidinic (AP) sites on the basis of the lyase activity. Therefore the open reading frame, present in intron 7 of the P. lividus DNA MTase gene, codes for a functional AP endonuclease designated SuAP1.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prody, C.A.; Zevin-Sonkin, D.; Gnatt, A.
1987-06-01
To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase and Torpedo electric organ true acetylcholinesterase. Using these probes, the authors isolated several cDNA clones from lambdagt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. Inmore » RNA blots of poly(A)/sup +/ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments of the total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These finding demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species.« less
Zhao, A; Guo, A; Liu, Z; Pape, L
1997-01-01
The coding sequences for a Schizosaccharomyces pombe sequence-specific DNA binding protein, Reb1p, have been cloned. The predicted S. pombe Reb1p is 24-29% identical to mouse TTF-1 (transcription termination factor-1) and Saccharomyces cerevisiae REB1 protein, both of which direct termination of RNA polymerase I catalyzed transcripts. The S.pombe Reb1 cDNA encodes a predicted polypeptide of 504 amino acids with a predicted molecular weight of 58.4 kDa. The S. pombe Reb1p is unusual in that the bipartite DNA binding motif identified originally in S.cerevisiae and Klyveromyces lactis REB1 proteins is uninterrupted and thus S.pombe Reb1p may contain the smallest natural REB1 homologous DNA binding domain. Its genomic coding sequences were shown to be interrupted by two introns. A recombinant histidine-tagged Reb1 protein bearing the rDNA binding domain has two homologous, sequence-specific binding sites in the S. pomber DNA intergenic spacer, located between 289 and 480 nt downstream of the end of the approximately 25S rRNA coding sequences. Each binding site is 13-14 bp downstream of two of the three proposed in vivo termination sites. The core of this 17 bp site, AGGTAAGGGTAATGCAC, is specifically protected by Reb1p in footprinting analysis. PMID:9016645
Chromosome specific repetitive DNA sequences
Moyzis, Robert K.; Meyne, Julianne
1991-01-01
A method is provided for determining specific nucleotide sequences useful in forming a probe which can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family me This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).
Komaki, Hisayuki; Sakurai, Kenta; Hosoyama, Akira; Kimura, Akane; Igarashi, Yasuhiro; Tamura, Tomohiko
2018-05-02
To identify the species of butyrolactol-producing Streptomyces strain TP-A0882, whole genome-sequencing of three type strains in a close taxonomic relationship was performed. In silico DNA-DNA hybridization using the genome sequences suggested that Streptomyces sp. TP-A0882 is classified as Streptomyces diastaticus subsp. ardesiacus. Strain TP-A0882, S. diastaticus subsp. ardesiacus NBRC 15402 T , Streptomyces coelicoflavus NBRC 15399 T , and Streptomyces rubrogriseus NBRC 15455 T harbor at least 14, 14, 10, and 12 biosynthetic gene clusters (BGCs), respectively, coding for nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs). All 14 gene clusters were shared by S. diastaticus subsp. ardesiacus strains TP-A0882 and NBRC 15402 T , while only four gene clusters were shared by the three distinct species. Although BGCs for bacteriocin, ectoine, indole, melanine, siderophores such as deferrioxamine, terpenes such as albaflavenone, hopene, carotenoid and geosmin are shared by the three species, many BGCs for secondary metabolites such as butyrolactone, lantipeptides, oligosaccharide, some terpenes are species-specific. These results indicate the possibility that strains belonging to the same species possess the same set of secondary metabolite-biosynthetic pathways, whereas strains belonging to distinct species have species-specific pathways, in addition to some common pathways, even if the strains are taxonomically close.
Resurrection of DNA Function In Vivo from an Extinct Genome
Pask, Andrew J.; Behringer, Richard R.; Renfree, Marilyn B.
2008-01-01
There is a burgeoning repository of information available from ancient DNA that can be used to understand how genomes have evolved and to determine the genetic features that defined a particular species. To assess the functional consequences of changes to a genome, a variety of methods are needed to examine extinct DNA function. We isolated a transcriptional enhancer element from the genome of an extinct marsupial, the Tasmanian tiger (Thylacinus cynocephalus or thylacine), obtained from 100 year-old ethanol-fixed tissues from museum collections. We then examined the function of the enhancer in vivo. Using a transgenic approach, it was possible to resurrect DNA function in transgenic mice. The results demonstrate that the thylacine Col2A1 enhancer directed chondrocyte-specific expression in this extinct mammalian species in the same way as its orthologue does in mice. While other studies have examined extinct coding DNA function in vitro, this is the first example of the restoration of extinct non-coding DNA and examination of its function in vivo. Our method using transgenesis can be used to explore the function of regulatory and protein-coding sequences obtained from any extinct species in an in vivo model system, providing important insights into gene evolution and diversity. PMID:18493600
[DNA prints instead of plantar prints in neonatal identification].
Rodríguez-Alarcón Gómez, J; Martińez de Pancorbo Gómez, M; Santillana Ferrer, L; Castro Espido, A; Melchor Maros, J C; Linares Uribe, M A; Fernández-Llebrez del Rey, L; Aranguren Dúo, G
1996-06-22
To check the possible usefulness in studying DNA in dried blood spots taken on filter paper blotters for newborn identification. It set out to establish: 1. The validity of the method for analysis; 2. The validity of all stored samples (such as those kept in clinical records); 3. Guarantee of non-intrusion in the genetic code; 4. Acceptable price and execution time. Forty (40) anonymous 13-year-old samples of 20 subjects (2 per subject) were studied. DNA was extracted using Chelex resin and the STR ("small tandem repeat") of microsatellite DNA was studies using the "polimerase chain reaction method" (PCR). Three non coding DNA loci (CSF1PO, TPOX and THO1) were analyzed by Multiplex amplification. It was possible to type 39 samples, making it possible to match the 20 cases (one by exclusion). The complete procedure yielded the results within 24 hours in all cases. The estimated final cost was found to be a fifth of that conventional maternity/paternity tests. The study carried out made matching possible in all 20 cases (directly in 19 cases). It was not necessary to study DNA coding areas. The validity of the method for analyzing samples stored for 13 years without any special care was also demonstrated. The technic was fast, producing the results within 24 hours, and at reasonable cost.
Adachi, Noboru; Umetsu, Kazuo; Shojo, Hideki
2014-01-01
Mitochondrial DNA (mtDNA) is widely used for DNA analysis of highly degraded samples because of its polymorphic nature and high number of copies in a cell. However, as endogenous mtDNA in deteriorated samples is scarce and highly fragmented, it is not easy to obtain reliable data. In the current study, we report the risks of direct sequencing mtDNA in highly degraded material, and suggest a strategy to ensure the quality of sequencing data. It was observed that direct sequencing data of the hypervariable segment (HVS) 1 by using primer sets that generate an amplicon of 407 bp (long-primer sets) was different from results obtained by using newly designed primer sets that produce an amplicon of 120-139 bp (mini-primer sets). The data aligned with the results of mini-primer sets analysis in an amplicon length-dependent manner; the shorter the amplicon, the more evident the endogenous sequence became. Coding region analysis using multiplex amplified product-length polymorphisms revealed the incongruence of single nucleotide polymorphisms between the coding region and HVS 1 caused by contamination with exogenous mtDNA. Although the sequencing data obtained using long-primer sets turned out to be erroneous, it was unambiguous and reproducible. These findings suggest that PCR primers that produce amplicons shorter than those currently recognized should be used for mtDNA analysis in highly degraded samples. Haplogroup motif analysis of the coding region and HVS should also be performed to improve the reliability of forensic mtDNA data. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco
2016-03-01
Ultrafast-metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classify 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning and then Sanger sequenced. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Deeper classification performance was achieved by Kraken than by the RDP: 73% of the contigs were classified up to the species or variety levels, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or inclusively deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken cumulates the information of both sequence senses, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for use in the taxonomic assignment of sequences obtained by Sanger sequencing or based on third generation sequencing, of which the main goal is to generate larger sequences. Copyright © 2016 Elsevier B.V. All rights reserved.
40 CFR 372.23 - SIC and NAICS codes to which this Part applies.
Code of Federal Regulations, 2014 CFR
2014-07-01
... engaged in manufacturing orthopedic devices to prescription in a retail environment (previously classified... for facilities primarily engaged in web search portals; 541712—Research and Development in the... engaged in Guided missile and space vehicle engine research and development (previously classified under...
Avatar DNA Nanohybrid System in Chip-on-a-Phone
NASA Astrophysics Data System (ADS)
Park, Dae-Hwan; Han, Chang Jo; Shul, Yong-Gun; Choy, Jin-Ho
2014-05-01
Long admired for informational role and recognition function in multidisciplinary science, DNA nanohybrids have been emerging as ideal materials for molecular nanotechnology and genetic information code. Here, we designed an optical machine-readable DNA icon on microarray, Avatar DNA, for automatic identification and data capture such as Quick Response and ColorZip codes. Avatar icon is made of telepathic DNA-DNA hybrids inscribed on chips, which can be identified by camera of smartphone with application software. Information encoded in base-sequences can be accessed by connecting an off-line icon to an on-line web-server network to provide message, index, or URL from database library. Avatar DNA is then converged with nano-bio-info-cogno science: each building block stands for inorganic nanosheets, nucleotides, digits, and pixels. This convergence could address item-level identification that strengthens supply-chain security for drug counterfeits. It can, therefore, provide molecular-level vision through mobile network to coordinate and integrate data management channels for visual detection and recording.
Avatar DNA Nanohybrid System in Chip-on-a-Phone
Park, Dae-Hwan; Han, Chang Jo; Shul, Yong-Gun; Choy, Jin-Ho
2014-01-01
Long admired for informational role and recognition function in multidisciplinary science, DNA nanohybrids have been emerging as ideal materials for molecular nanotechnology and genetic information code. Here, we designed an optical machine-readable DNA icon on microarray, Avatar DNA, for automatic identification and data capture such as Quick Response and ColorZip codes. Avatar icon is made of telepathic DNA-DNA hybrids inscribed on chips, which can be identified by camera of smartphone with application software. Information encoded in base-sequences can be accessed by connecting an off-line icon to an on-line web-server network to provide message, index, or URL from database library. Avatar DNA is then converged with nano-bio-info-cogno science: each building block stands for inorganic nanosheets, nucleotides, digits, and pixels. This convergence could address item-level identification that strengthens supply-chain security for drug counterfeits. It can, therefore, provide molecular-level vision through mobile network to coordinate and integrate data management channels for visual detection and recording. PMID:24824876
Lectin cDNA and transgenic plants derived therefrom
Raikhel, Natasha V.
2000-10-03
Transgenic plants containing cDNA encoding Gramineae lectin are described. The plants preferably contain cDNA coding for barley lectin and store the lectin in the leaves. The transgenic plants, particularly the leaves exhibit insecticidal and fungicidal properties.
Bertke, S. J.; Meyers, A. R.; Wurzelbacher, S. J.; Bell, J.; Lampl, M. L.; Robins, D.
2015-01-01
Introduction Tracking and trending rates of injuries and illnesses classified as musculoskeletal disorders caused by ergonomic risk factors such as overexertion and repetitive motion (MSDs) and slips, trips, or falls (STFs) in different industry sectors is of high interest to many researchers. Unfortunately, identifying the cause of injuries and illnesses in large datasets such as workers’ compensation systems often requires reading and coding the free form accident text narrative for potentially millions of records. Method To alleviate the need for manual coding, this paper describes and evaluates a computer auto-coding algorithm that demonstrated the ability to code millions of claims quickly and accurately by learning from a set of previously manually coded claims. Conclusions The auto-coding program was able to code claims as a musculoskeletal disorders, STF or other with approximately 90% accuracy. Impact on industry The program developed and discussed in this paper provides an accurate and efficient method for identifying the causation of workers’ compensation claims as a STF or MSD in a large database based on the unstructured text narrative and resulting injury diagnoses. The program coded thousands of claims in minutes. The method described in this paper can be used by researchers and practitioners to relieve the manual burden of reading and identifying the causation of claims as a STF or MSD. Furthermore, the method can be easily generalized to code/classify other unstructured text narratives. PMID:23206504
Sharma, Mukul; Vedithi, Sundeep Chaitanya; Das, Madhusmita; Roy, Anindya; Ebenezer, Mannam
2017-01-01
Survival of Mycobacterium leprae, the causative bacteria for leprosy, in the human host is dependent to an extent on the ways in which its genome integrity is retained. DNA repair mechanisms protect bacterial DNA from damage induced by various stress factors. The current study is aimed at understanding the sequence and functional annotation of DNA repair genes in M. leprae. T he genome of M. leprae was annotated using sequence alignment tools to identify DNA repair genes that have homologs in Mycobacterium tuberculosis and Escherichia coli. A set of 96 genes known to be involved in DNA repair mechanisms in E. coli and Mycobacteriaceae were chosen as a reference. Among these, 61 were identified in M. leprae based on sequence similarity and domain architecture. The 61 were classified into 36 characterized gene products (59%), 11 hypothetical proteins (18%), and 14 pseudogenes (23%). All these genes have homologs in M. tuberculosis and 49 (80.32%) in E. coli. A set of 12 genes which are absent in E. coli were present in M. leprae and in Mycobacteriaceae. These 61 genes were further investigated for their expression profiles in the whole transcriptome microarray data of M. leprae which was obtained from the signal intensities of 60bp probes, tiling the entire genome with 10bp overlaps. It was noted that transcripts corresponding to all the 61 genes were identified in the transcriptome data with varying expression levels ranging from 0.18 to 2.47 fold (normalized with 16SrRNA). The mRNA expression levels of a representative set of seven genes ( four annotated and three hypothetical protein coding genes) were analyzed using quantitative Polymerase Chain Reaction (qPCR) assays with RNA extracted from skin biopsies of 10 newly diagnosed, untreated leprosy cases. It was noted that RNA expression levels were higher for genes involved in homologous recombination whereas the genes with a low level of expression are involved in the direct repair pathway. This study provided preliminary information on the potential DNA repair pathways that are extant in M. leprae and the associated genes.
The PARTRAC code: Status and recent developments
NASA Astrophysics Data System (ADS)
Friedland, Werner; Kundrat, Pavel
Biophysical modeling is of particular value for predictions of radiation effects due to manned space missions. PARTRAC is an established tool for Monte Carlo-based simulations of radiation track structures, damage induction in cellular DNA and its repair [1]. Dedicated modules describe interactions of ionizing particles with the traversed medium, the production and reactions of reactive species, and score DNA damage determined by overlapping track structures with multi-scale chromatin models. The DNA repair module describes the repair of DNA double-strand breaks (DSB) via the non-homologous end-joining pathway; the code explicitly simulates the spatial mobility of individual DNA ends in parallel with their processing by major repair enzymes [2]. To simulate the yields and kinetics of radiation-induced chromosome aberrations, the repair module has been extended by tracking the information on the chromosome origin of ligated fragments as well as the presence of centromeres [3]. PARTRAC calculations have been benchmarked against experimental data on various biological endpoints induced by photon and ion irradiation. The calculated DNA fragment distributions after photon and ion irradiation reproduce corresponding experimental data and their dose- and LET-dependence. However, in particular for high-LET radiation many short DNA fragments are predicted below the detection limits of the measurements, so that the experiments significantly underestimate DSB yields by high-LET radiation [4]. The DNA repair module correctly describes the LET-dependent repair kinetics after (60) Co gamma-rays and different N-ion radiation qualities [2]. First calculations on the induction of chromosome aberrations have overestimated the absolute yields of dicentrics, but correctly reproduced their relative dose-dependence and the difference between gamma- and alpha particle irradiation [3]. Recent developments of the PARTRAC code include a model of hetero- vs euchromatin structures to enable accounting for variations in DNA damage yields, complexity and repair between these regions. Second, the applicability of the code to low-energy ions has been extended to full stopping by using a modified Barkas scaling of proton cross sections for ions heavier than helium. Third, ongoing studies aim at hitherto unprecedented benchmarking of the code against experiments with sub-muµm focused bunches of low-LET ions mimicking single high-LET ion tracks [5] which separate effects of damage clustering on a sub-mum scale from DNA damage complexity on a nanometer scale. Fourth, motivated by implications for the involvement of mitochondria in intercellular signaling and radiation-induced bystander effects, ongoing work extends the range of PARTRAC DNA models to radiation effects on mitochondrial DNA. The contribution will discuss the PARTRAC modules, benchmarks to experimental data, recent and ongoing developments of the code, with special attention to its implications and potential applications in radiation protection and space research. Acknowledgement. This work was partially funded by the EU (Contract FP7-249689 ‘DoReMi’). References 1. Friedland et al., Mutat. Res. 711, 28 (2011) 2. Friedland et al., Int. J. Radiat. Biol. 88, 129 (2012) 3. Friedland et al., Mutat. Res. 756, 213 (2013) 4. Alloni et al., Radiat. Res. 179, 690 (2013) 5. Schmid et al., Phys. Med. Biol. 57, 5889 (2012)
NASA Technical Reports Server (NTRS)
Chang, Dong Kyung; Metzgar, David; Wills, Christopher; Boland, C. Richard
2003-01-01
All "minor" components of the human DNA mismatch repair (MMR) system-MSH3, MSH6, PMS2, and the recently discovered MLH3-contain mononucleotide microsatellites in their coding sequences. This intriguing finding contrasts with the situation found in the major components of the DNA MMR system-MSH2 and MLH1-and, in fact, most human genes. Although eukaryotic genomes are rich in microsatellites, non-triplet microsatellites are rare in coding regions. The recurring presence of exonal mononucleotide repeat sequences within a single family of human genes would therefore be considered exceptional.
Delimitation of essential genes of cassava latent virus DNA 2.
Etessami, P; Callis, R; Ellwood, S; Stanley, J
1988-01-01
Insertion and deletion mutagenesis of both extended open reading frames (ORFs) of cassava latent virus DNA 2 destroys infectivity. Infectivity is restored by coinoculating constructs that contain single mutations within different ORFs. Although frequent intermolecular recombination produces dominant parental-type virus, mutants can be retained within the virus population indicating that they are competent for replication and suggesting that rescue can occur by complementation of trans acting gene products. By cloning specific fragments into DNA 1 coat protein deletion vectors we have delimited the DNA 2 coding regions and provide substantive evidence that both are essential for virus infection. Although a DNA 2 component is unique to whitefly-transmitted geminiviruses, the results demonstrate that neither coding region is involved solely in insect transmission. The requirement for a bipartite genome for whitefly-transmitted geminiviruses is discussed. Images PMID:3387209
22 CFR 125.3 - Exports of classified technical data and classified defense articles.
Code of Federal Regulations, 2010 CFR
2010-04-01
... in the Department of Defense National Industrial Security Program Operating Manual (unless such.... It should also list the facility security clearance code of all U.S. parties on the license and include the Defense Security Service cognizant security office of the party responsible for packaging the...
40 CFR 372.23 - SIC and NAICS codes to which this Part applies.
Code of Federal Regulations, 2013 CFR
2013-07-01
...); 211112Natural Gas Liquid Extraction Limited to facilities that recover sulfur from natural gas (previously classified under SIC 2819, Industrial Inorganic chemicals, NEC (recovering sulfur from natural gas... engaged in providing combinations of electric, gas, and other services, not elsewhere classified (N.E.C...
40 CFR 372.23 - SIC and NAICS codes to which this Part applies.
Code of Federal Regulations, 2011 CFR
2011-07-01
...); 211112Natural Gas Liquid Extraction Limited to facilities that recover sulfur from natural gas (previously classified under SIC 2819, Industrial Inorganic chemicals, NEC (recovering sulfur from natural gas... engaged in providing combinations of electric, gas, and other services, not elsewhere classified (N.E.C...
40 CFR 372.23 - SIC and NAICS codes to which this Part applies.
Code of Federal Regulations, 2012 CFR
2012-07-01
...); 211112Natural Gas Liquid Extraction Limited to facilities that recover sulfur from natural gas (previously classified under SIC 2819, Industrial Inorganic chemicals, NEC (recovering sulfur from natural gas... engaged in providing combinations of electric, gas, and other services, not elsewhere classified (N.E.C...
Lee, Ju Seok; Chen, Junghuei; Deaton, Russell; Kim, Jin-Woo
2014-01-01
Genetic material extracted from in situ microbial communities has high promise as an indicator of biological system status. However, the challenge is to access genomic information from all organisms at the population or community scale to monitor the biosystem's state. Hence, there is a need for a better diagnostic tool that provides a holistic view of a biosystem's genomic status. Here, we introduce an in vitro methodology for genomic pattern classification of biological samples that taps large amounts of genetic information from all genes present and uses that information to detect changes in genomic patterns and classify them. We developed a biosensing protocol, termed Biological Memory, that has in vitro computational capabilities to "learn" and "store" genomic sequence information directly from genomic samples without knowledge of their explicit sequences, and that discovers differences in vitro between previously unknown inputs and learned memory molecules. The Memory protocol was designed and optimized based upon (1) common in vitro recombinant DNA operations using 20-base random probes, including polymerization, nuclease digestion, and magnetic bead separation, to capture a snapshot of the genomic state of a biological sample as a DNA memory and (2) the thermal stability of DNA duplexes between new input and the memory to detect similarities and differences. For efficient read out, a microarray was used as an output method. When the microarray-based Memory protocol was implemented to test its capability and sensitivity using genomic DNA from two model bacterial strains, i.e., Escherichia coli K12 and Bacillus subtilis, results indicate that the Memory protocol can "learn" input DNA, "recall" similar DNA, differentiate between dissimilar DNA, and detect relatively small concentration differences in samples. This study demonstrated not only the in vitro information processing capabilities of DNA, but also its promise as a genomic pattern classifier that could access information from all organisms in a biological system without explicit genomic information. The Memory protocol has high potential for many applications, including in situ biomonitoring of ecosystems, screening for diseases, biosensing of pathological features in water and food supplies, and non-biological information processing of memory devices, among many.
Lichenase and coding sequences
Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong
2000-08-15
The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-.beta.-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.
The cDNA-derived amino acid sequence of hemoglobin II from Lucina pectinata.
Torres-Mercado, Elineth; Renta, Jessicca Y; Rodríguez, Yolanda; López-Garriga, Juan; Cadilla, Carmen L
2003-11-01
Hemoglobin II from the clam Lucina pectinata is an oxygen-reactive protein with a unique structural organization in the heme pocket involving residues Gln65 (E7), Tyr30 (B10), Phe44 (CD1), and Phe69 (E11). We employed the reverse transcriptase-polymerase chain reaction (RT-PCR) and methods to synthesize various cDNA(HbII). An initial 300-bp cDNA clone was amplified from total RNA by RT-PCR using degenerate oligonucleotides. Gene-specific primers derived from the HbII-partial cDNA sequence were used to obtain the 5' and 3' ends of the cDNA by RACE. The length of the HbII cDNA, estimated from overlapping clones, was approximately 2114 bases. Northern blot analysis revealed that the mRNA size of HbII agrees with the estimated size using cDNA data. The coding region of the full-length HbII cDNA codes for 151 amino acids. The calculated molecular weight of HbII, including the heme group and acetylated N-terminal residue, is 17,654.07 Da.
Yin, Changchuan
2015-04-01
To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be specially mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC) in which the numerical values are optimized by simulation annealing to maximize the 3-periodicity signal to noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction by Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
Multiple pathogen biomarker detection using an encoded bead array in droplet PCR.
Periyannan Rajeswari, Prem Kumar; Soderberg, Lovisa M; Yacoub, Alia; Leijon, Mikael; Andersson Svahn, Helene; Joensson, Haakan N
2017-08-01
We present a droplet PCR workflow for detection of multiple pathogen DNA biomarkers using fluorescent color-coded Luminex® beads. This strategy enables encoding of multiple singleplex droplet PCRs using a commercially available bead set of several hundred distinguishable fluorescence codes. This workflow provides scalability beyond the limited number offered by fluorescent detection probes such as TaqMan probes, commonly used in current multiplex droplet PCRs. The workflow was validated for three different Luminex bead sets coupled to target specific capture oligos to detect hybridization of three microorganisms infecting poultry: avian influenza, infectious laryngotracheitis virus and Campylobacter jejuni. In this assay, the target DNA was amplified with fluorescently labeled primers by PCR in parallel in monodisperse picoliter droplets, to avoid amplification bias. The color codes of the Luminex detection beads allowed concurrent and accurate classification of the different bead sets used in this assay. The hybridization assay detected target DNA of all three microorganisms with high specificity, from samples with average target concentration of a single DNA template molecule per droplet. This workflow demonstrates the possibility of increasing the droplet PCR assay detection panel to detect large numbers of targets in parallel, utilizing the scalability offered by the color-coded Luminex detection beads. Copyright © 2017. Published by Elsevier B.V.
The histone codes for meiosis.
Wang, Lina; Xu, Zhiliang; Khawar, Muhammad Babar; Liu, Chao; Li, Wei
2017-09-01
Meiosis is a specialized process that produces haploid gametes from diploid cells by a single round of DNA replication followed by two successive cell divisions. It contains many special events, such as programmed DNA double-strand break (DSB) formation, homologous recombination, crossover formation and resolution. These events are associated with dynamically regulated chromosomal structures, the dynamic transcriptional regulation and chromatin remodeling are mainly modulated by histone modifications, termed 'histone codes'. The purpose of this review is to summarize the histone codes that are required for meiosis during spermatogenesis and oogenesis, involving meiosis resumption, meiotic asymmetric division and other cellular processes. We not only systematically review the functional roles of histone codes in meiosis but also discuss future trends and perspectives in this field. © 2017 Society for Reproduction and Fertility.
Permutation coding technique for image recognition systems.
Kussul, Ernst M; Baidyk, Tatiana N; Wunsch, Donald C; Makeyev, Oleksandr; Martín, Anabel
2006-11-01
A feature extractor and neural classifier for image recognition systems are proposed. The proposed feature extractor is based on the concept of random local descriptors (RLDs). It is followed by the encoder that is based on the permutation coding technique that allows to take into account not only detected features but also the position of each feature on the image and to make the recognition process invariant to small displacements. The combination of RLDs and permutation coding permits us to obtain a sufficiently general description of the image to be recognized. The code generated by the encoder is used as an input data for the neural classifier. Different types of images were used to test the proposed image recognition system. It was tested in the handwritten digit recognition problem, the face recognition problem, and the microobject shape recognition problem. The results of testing are very promising. The error rate for the Modified National Institute of Standards and Technology (MNIST) database is 0.44% and for the Olivetti Research Laboratory (ORL) database it is 0.1%.
Antimicrobial peptide evolution in the Asiatic honey bee Apis cerana.
Xu, Peng; Shi, Min; Chen, Xue-Xin
2009-01-01
The Asiatic honeybee, Apis cerana Fabricius, is an important honeybee species in Asian countries. It is still found in the wild, but is also one of the few bee species that can be domesticated. It has acquired some genetic advantages and significantly different biological characteristics compared with other Apis species. However, it has been less studied, and over the past two decades, has become a threatened species in China. We designed primers for the sequences of the four antimicrobial peptide cDNA gene families (abaecin, defensin, apidaecin, and hymenoptaecin) of the Western honeybee, Apis mellifera L. and identified all the antimicrobial peptide cDNA genes in the Asiatic honeybee for the first time. All the sequences were amplified by reverse transcriptase-polymerase chain reaction (RT-PCR). In all, 29 different defensin cDNA genes coding 7 different defensin peptides, 11 different abaecin cDNA genes coding 2 different abaecin peptides, 13 different apidaecin cDNA genes coding 4 apidaecin peptides and 34 different hymenoptaecin cDNA genes coding 13 different hymenoptaecin peptides were cloned and identified from the Asiatic honeybee adult workers. Detailed comparison of these four antimicrobial peptide gene families with those of the Western honeybee revealed that there are many similarities in the quantity and amino acid components of peptides in the abaecin, defensin and apidaecin families, while many more hymenoptaecin peptides are found in the Asiatic honeybee than those in the Western honeybee (13 versus 1). The results indicated that the Asiatic honeybee adult generated more variable antimicrobial peptides, especially hymenoptaecin peptides than the Western honeybee when stimulated by pathogens or injury. This suggests that, compared to the Western honeybee that has a longer history of domestication, selection on the Asiatic honeybee has favored the generation of more variable antimicrobial peptides as protection against pathogens.
Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P
1988-02-01
Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators.
Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P
1988-01-01
Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators. Images PMID:3257578
A Binary-Encounter-Bethe Approach to Simulate DNA Damage by the Direct Effect
NASA Technical Reports Server (NTRS)
Plante, Ianik; Cucinotta, Francis A.
2013-01-01
The DNA damage is of crucial importance in the understanding of the effects of ionizing radiation. The main mechanisms of DNA damage are by the direct effect of radiation (e.g. direct ionization) and by indirect effect (e.g. damage by.OH radicals created by the radiolysis of water). Despite years of research in this area, many questions on the formation of DNA damage remains. To refine existing DNA damage models, an approach based on the Binary-Encounter-Bethe (BEB) model was developed[1]. This model calculates differential cross sections for ionization of the molecular orbitals of the DNA bases, sugars and phosphates using the electron binding energy, the mean kinetic energy and the occupancy number of the orbital. This cross section has an analytic form which is quite convenient to use and allows the sampling of the energy loss occurring during an ionization event. To simulate the radiation track structure, the code RITRACKS developed at the NASA Johnson Space Center is used[2]. This code calculates all the energy deposition events and the formation of the radiolytic species by the ion and the secondary electrons as well. We have also developed a technique to use the integrated BEB cross section for the bases, sugar and phosphates in the radiation transport code RITRACKS. These techniques should allow the simulation of DNA damage by ionizing radiation, and understanding of the formation of double-strand breaks caused by clustered damage in different conditions.
Prody, C A; Zevin-Sonkin, D; Gnatt, A; Goldberg, O; Soreq, H
1987-01-01
To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase (BtChoEase; EC 3.1.1.8) and Torpedo electric organ "true" acetylcholinesterase (AcChoEase; EC 3.1.1.7). Using these probes, we isolated several cDNA clones from lambda gt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. In RNA blots of poly(A)+ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments of the total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These findings demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species. Images PMID:3035536
Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao
2018-01-01
Abstract Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNAs-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying the differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic values for each hit transcript calculated from the patients clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify the differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to complete the differentially methylated lncRNAs identification and visualization with local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. PMID:29069510
Identifying aggressive prostate cancer foci using a DNA methylation classifier.
Mundbjerg, Kamilla; Chopra, Sameer; Alemozaffar, Mehrdad; Duymich, Christopher; Lakshminarasimhan, Ranjani; Nichols, Peter W; Aron, Manju; Siegmund, Kimberly D; Ukimura, Osamu; Aron, Monish; Stern, Mariana; Gill, Parkash; Carpten, John D; Ørntoft, Torben F; Sørensen, Karina D; Weisenberger, Daniel J; Jones, Peter A; Duddalwar, Vinay; Gill, Inderbir; Liang, Gangning
2017-01-12
Slow-growing prostate cancer (PC) can be aggressive in a subset of cases. Therefore, prognostic tools to guide clinical decision-making and avoid overtreatment of indolent PC and undertreatment of aggressive disease are urgently needed. PC has a propensity to be multifocal with several different cancerous foci per gland. Here, we have taken advantage of the multifocal propensity of PC and categorized aggressiveness of individual PC foci based on DNA methylation patterns in primary PC foci and matched lymph node metastases. In a set of 14 patients, we demonstrate that over half of the cases have multiple epigenetically distinct subclones and determine the primary subclone from which the metastatic lesion(s) originated. Furthermore, we develop an aggressiveness classifier consisting of 25 DNA methylation probes to determine aggressive and non-aggressive subclones. Upon validation of the classifier in an independent cohort, the predicted aggressive tumors are significantly associated with the presence of lymph node metastases and invasive tumor stages. Overall, this study provides molecular-based support for determining PC aggressiveness with the potential to impact clinical decision-making, such as targeted biopsy approaches for early diagnosis and active surveillance, in addition to focal therapy.
Mas, Sergi; Crescenti, Anna; Gassó, Patricia; Vidal-Taboada, Jose M; Lafuente, Amalia
2007-08-01
As pharmacogenetic studies frequently require establishment of DNA banks containing large cohorts with multi-centric designs, inexpensive methods for collecting and storing high-quality DNA are needed. The aims of this study were two-fold: to compare the amount and quality of DNA obtained from two different DNA cards (IsoCode Cards or FTA Classic Cards, Whatman plc, Brentford, Middlesex, UK); and to evaluate the effects of time and storage temperature, as well as the influence of anticoagulant ethylenediaminetetraacetic acid on the DNA elution procedure. The samples were genotyped by several methods typically used in pharmacogenetic studies: multiplex PCR, PCR-restriction fragment length polymorphism, single nucleotide primer extension, and allelic discrimination assay. In addition, they were amplified by whole genome amplification to increase genomic DNA mass. Time, storage temperature and ethylenediaminetetraacetic acid had no significant effects on either DNA card. This study reveals the importance of drying blood spots prior to isolation to avoid haemoglobin interference. Moreover, our results demonstrate that re-isolation protocols could be applied to increase the amount of DNA recovered. The samples analysed were accurately genotyped with all the methods examined herein. In conclusion, our study shows that both DNA cards, IsoCode Cards and FTA Classic Cards, facilitate genetic and pharmacogenetic testing for routine clinical practice.
2004-09-01
to provide access to and protect are the NG Game Code, Employee Files, E -MAIL, Marketing Plans and Legacy Code. The NG Game Code is the MOST...major hit. E -MAIL is classified NON- SENSITIVE. The Marketing Plans are for the NG Game Code. They contain information concerning what the new...simulation games currently on the market except that, rather than allowing players to choose rides, refreshments and facilities, CyberCIEGE will
Forster, Alan J; Bernard, Burnand; Drösler, Saskia E; Gurevich, Yana; Harrison, James; Januel, Jean-Marie; Romano, Patrick S; Southern, Danielle A; Sundararajan, Vijaya; Quan, Hude; Vanderloo, Saskia E; Pincus, Harold A; Ghali, William A
2017-08-01
To assess the utility of the proposed World Health Organization (WHO)'s International Classification of Disease (ICD) framework for classifying patient safety events. Independent classification of 45 clinical vignettes using a web-based platform. The WHO's multi-disciplinary Quality and Safety Topic Advisory Group. The framework consists of three concepts: harm, cause and mode. We defined a concept as 'classifiable' if more than half of the raters could assign an ICD-11 code for the case. We evaluated reasons why cases were nonclassifiable using a qualitative approach. Harm was classifiable in 31 of 45 cases (69%). Of these, only 20 could be classified according to cause and mode. Classifiable cases were those in which a clear cause and effect relationship existed (e.g. medication administration error). Nonclassifiable cases were those without clear causal attribution (e.g. pressure ulcer). Of the 14 cases in which harm was not evident (31%), only 5 could be classified according to cause and mode and represented potential adverse events. Overall, nine cases (20%) were nonclassifiable using the three-part patient safety framework and contained significant ambiguity in the relationship between healthcare outcome and putative cause. The proposed framework enabled classification of the majority of patient safety events. Cases in which potentially harmful events did not cause harm were not classifiable; additional code categories within the ICD-11 are one proposal to address this concern. Cases with ambiguity in cause and effect relationship between healthcare processes and outcomes remain difficult to classify. © The Author 2017. Published by Oxford University Press in association with the International Society for Quality in Health Care. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Longkumer, Toshisangba; Kamireddy, Swetha; Muthyala, Venkateswar Reddy; Akbarpasha, Shaikh; Pitchika, Gopi Krishna; Kodetham, Gopinath; Ayaluru, Murali; Siddavattam, Dayananda
2013-01-01
While analyzing plasmids of Acinetobacter sp. DS002 we have detected a circular DNA molecule pTS236, which upon further investigation is identified as the genome of a phage. The phage genome has shown sequence similarity to the recently discovered Sphinx 2.36 DNA sequence co-purified with the Transmissible Spongiform Encephalopathy (TSE) particles isolated from infected brain samples collected from diverse geographical regions. As in Sphinx 2.36, the phage genome also codes for three proteins. One of them codes for RepA and is shown to be involved in replication of pTS236 through rolling circle (RC) mode. The other two translationally coupled ORFs, orf106 and orf96, code for coat proteins of the phage. Although an orf96 homologue was not previously reported in Sphinx 2.36, a closer examination of DNA sequence of Sphinx 2.36 revealed its presence downstream of orf106 homologue. TEM images and infection assays revealed existence of phage AbDs1 in Acinetobacter sp. DS002.
Longkumer, Toshisangba; Kamireddy, Swetha; Muthyala, Venkateswar Reddy; Akbarpasha, Shaikh; Pitchika, Gopi Krishna; Kodetham, Gopinath; Ayaluru, Murali; Siddavattam, Dayananda
2013-01-01
While analyzing plasmids of Acinetobacter sp. DS002 we have detected a circular DNA molecule pTS236, which upon further investigation is identified as the genome of a phage. The phage genome has shown sequence similarity to the recently discovered Sphinx 2.36 DNA sequence co-purified with the Transmissible Spongiform Encephalopathy (TSE) particles isolated from infected brain samples collected from diverse geographical regions. As in Sphinx 2.36, the phage genome also codes for three proteins. One of them codes for RepA and is shown to be involved in replication of pTS236 through rolling circle (RC) mode. The other two translationally coupled ORFs, orf106 and orf96, code for coat proteins of the phage. Although an orf96 homologue was not previously reported in Sphinx 2.36, a closer examination of DNA sequence of Sphinx 2.36 revealed its presence downstream of orf106 homologue. TEM images and infection assays revealed existence of phage AbDs1 in Acinetobacter sp. DS002. PMID:23867905
The large-scale environment from cosmological simulations - I. The baryonic cosmic web
NASA Astrophysics Data System (ADS)
Cui, Weiguang; Knebe, Alexander; Yepes, Gustavo; Yang, Xiaohu; Borgani, Stefano; Kang, Xi; Power, Chris; Staveley-Smith, Lister
2018-01-01
Using a series of cosmological simulations that includes one dark-matter-only (DM-only) run, one gas cooling-star formation-supernova feedback (CSF) run and one that additionally includes feedback from active galactic nuclei (AGNs), we classify the large-scale structures with both a velocity-shear-tensor code (VWEB) and a tidal-tensor code (PWEB). We find that the baryonic processes have almost no impact on large-scale structures - at least not when classified using aforementioned techniques. More importantly, our results confirm that the gas component alone can be used to infer the filamentary structure of the universe practically un-biased, which could be applied to cosmology constraints. In addition, the gas filaments are classified with its velocity (VWEB) and density (PWEB) fields, which can theoretically connect to the radio observations, such as H I surveys. This will help us to bias-freely link the radio observations with dark matter distributions at large scale.
Ishiguro, Naotaka; Inoshima, Yasuo; Yanai, Tokuma; Sasaki, Motoki; Matsui, Akira; Kikuchi, Hiroki; Maruyama, Masashi; Hongo, Hitomi; Vostretsov, Yuri E; Gasilin, Viatcheslav; Kosintsev, Pavel A; Quanjia, Chen; Chunxue, Wang
2016-02-01
The mitochondrial DNA (mtDNA) control region (198- to 598-bp) of four ancient Canis specimens (two Canis mandibles, a cranium, and a first phalanx) was examined, and each specimen was genetically identified as Japanese wolf. Two unique nucleotide substitutions, the 78-C insertion and the 482-G deletion, both of which are specific for Japanese wolf, were observed in each sample. Based on the mtDNA sequences analyzed, these four specimens and 10 additional Japanese wolf samples could be classified into two groups- Group A (10 samples) and Group B (4 samples)-which contain or lack an 8-bp insertion/deletion (indel), respectively. Interestingly, three dogs (Akita-b, Kishu 25, and S-husky 102) that each contained Japanese wolf-specific features were also classified into Group A or B based on the 8-bp indel. To determine the origin or ancestor of the Japanese wolf, mtDNA control regions of ancient continental Canis specimens were examined; 84 specimens were from Russia, and 29 were from China. However, none of these 113 specimens contained Japanese wolf-specific sequences. Moreover, none of 426 Japanese modern hunting dogs examined contained these Japanese wolf-specific mtDNA sequences. The mtDNA control region sequences of Groups A and B appeared to be unique to grey wolf and dog populations.
Comparative genomics of 9 novel Paenibacillus larvae bacteriophages
Stamereilers, Casey; LeBlanc, Lucy; Yost, Diane; Amy, Penny S.; Tsourkas, Philippos K.
2016-01-01
ABSTRACT American Foulbrood Disease, caused by the bacterium Paenibacillus larvae, is one of the most destructive diseases of the honeybee, Apis mellifera. Our group recently published the sequences of 9 new phages with the ability to infect and lyse P. larvae. Here, we characterize the genomes of these P. larvae phages, compare them to each other and to other sequenced P. larvae phages, and putatively identify protein function. The phage genomes are 38–45 kb in size and contain 68–86 genes, most of which appear to be unique to P. larvae phages. We classify P. larvae phages into 2 main clusters and one singleton based on nucleotide sequence identity. Three of the new phages show sequence similarity to other sequenced P. larvae phages, while the remaining 6 do not. We identified functions for roughly half of the P. larvae phage proteins, including structural, assembly, host lysis, DNA replication/metabolism, regulatory, and host-related functions. Structural and assembly proteins are highly conserved among our phages and are located at the start of the genome. DNA replication/metabolism, regulatory, and host-related proteins are located in the middle and end of the genome, and are not conserved, with many of these genes found in some of our phages but not others. All nine phages code for a conserved N-acetylmuramoyl-L-alanine amidase. Comparative analysis showed the phages use the “cohesive ends with 3′ overhang” DNA packaging strategy. This work is the first in-depth study of P. larvae phage genomics, and serves as a marker for future work in this area. PMID:27738559
Paarup, Maiken; Friedrich, Michael W; Tindall, Brian J; Finster, Kai
2006-01-01
A psychrotolerant, obligate anaerobic, acetogenic bacterium designated strain SyrA5 was isolated from black anoxic sediment of a brackish fjord. Cells were Gram-positive, non-sporeforming rods. The isolate utilized H(2)/CO(2), CO, fructose, glucose, ethanol, ethylene glycol, glycerol, pyruvate, lactate, betaine and the methyl-groups of several methoxylated benzoic derivatives such as syringate, trimethoxybenzoate and vallinate. The optimum temperature for growth was 29 degrees C, whilst slow growth occurred at 2 degrees C. The strain grew optimally with NaCl concentrations below 2.7% (w/v), but growth occurred up to 4.3% (w/v) NaCl. Growth was observed in the range from pH 5.9 to 8.5, optimum at pH 8. The G+C content was 44.1 mol%. Based upon 16S rRNA gene sequence analysis and DNA-DNA reassociation studies, the organism was classified in the genus Acetobacterium. Strain SyrA5 shared a 16S rRNA sequence similarity with A. carbinolicum of 100%, a fthfs gene (which codes for the N5,N10 tetrahydrofolate synthetase) sequence identity of 98.5-98.7% (amino acid sequence similarities were 99.4-100%) and a RNA-DNA hybridization homology of 64-68%. Despite a number of phenotypic differences between strain SyrA5 and A. carbinolicum we propose including strain SyrA5 as a subspecies of A. carbinolicum for which we propose the name Acetobacterium carbinolicum subspecies kysingense. The type strain is SyrA5 (=DSM 16427(T), ATCC BAA-990).
A Radiation Chemistry Code Based on the Green's Function of the Diffusion Equation
NASA Technical Reports Server (NTRS)
Plante, Ianik; Wu, Honglu
2014-01-01
Stochastic radiation track structure codes are of great interest for space radiation studies and hadron therapy in medicine. These codes are used for a many purposes, notably for microdosimetry and DNA damage studies. In the last two decades, they were also used with the Independent Reaction Times (IRT) method in the simulation of chemical reactions, to calculate the yield of various radiolytic species produced during the radiolysis of water and in chemical dosimeters. Recently, we have developed a Green's function based code to simulate reversible chemical reactions with an intermediate state, which yielded results in excellent agreement with those obtained by using the IRT method. This code was also used to simulate and the interaction of particles with membrane receptors. We are in the process of including this program for use with the Monte-Carlo track structure code Relativistic Ion Tracks (RITRACKS). This recent addition should greatly expand the capabilities of RITRACKS, notably to simulate DNA damage by both the direct and indirect effect.
Embedded feature ranking for ensemble MLP classifiers.
Windeatt, Terry; Duangsoithong, Rakkrit; Smith, Raymond
2011-06-01
A feature ranking scheme for multilayer perceptron (MLP) ensembles is proposed, along with a stopping criterion based upon the out-of-bootstrap estimate. To solve multi-class problems feature ranking is combined with modified error-correcting output coding. Experimental results on benchmark data demonstrate the versatility of the MLP base classifier in removing irrelevant features.
40 CFR 63.2435 - Am I subject to the requirements in this subpart?
Code of Federal Regulations, 2013 CFR
2013-07-01
... MCPU includes equipment necessary to operate a miscellaneous organic chemical manufacturing process, as...)(1)(i), (ii), (iii), (iv), or (v) of this section. (i) An organic chemical(s) classified using the...)(5) of this section. (ii) An organic chemical(s) classified using the 1997 version of NAICS code 325...
40 CFR 63.2435 - Am I subject to the requirements in this subpart?
Code of Federal Regulations, 2012 CFR
2012-07-01
... MCPU includes equipment necessary to operate a miscellaneous organic chemical manufacturing process, as...)(1)(i), (ii), (iii), (iv), or (v) of this section. (i) An organic chemical(s) classified using the...)(5) of this section. (ii) An organic chemical(s) classified using the 1997 version of NAICS code 325...
40 CFR 63.2435 - Am I subject to the requirements in this subpart?
Code of Federal Regulations, 2014 CFR
2014-07-01
... MCPU includes equipment necessary to operate a miscellaneous organic chemical manufacturing process, as...)(1)(i), (ii), (iii), (iv), or (v) of this section. (i) An organic chemical(s) classified using the...)(5) of this section. (ii) An organic chemical(s) classified using the 1997 version of NAICS code 325...
Comparisons of Young Children's Private Speech Profiles: Analogical Versus Nonanalogical Reasoners.
ERIC Educational Resources Information Center
Manning, Brenda H.; White, C. Stephen
The primary intention of this study was to compare private speech profiles of young children classified as analogical reasoners (AR) with young children classified as nonanalogical reasoners (NAR). The secondary purpose was to investigate Berk's (1986) research methodology and categorical scheme for the collection and coding of private speech…
Use of the Coding Causes of Death in HIV in the classification of deaths in Northeastern Brazil.
Alves, Diana Neves; Bresani-Salvi, Cristiane Campello; Batista, Joanna d'Arc Lyra; Ximenes, Ricardo Arraes de Alencar; Miranda-Filho, Demócrito de Barros; Melo, Heloísa Ramos Lacerda de; Albuquerque, Maria de Fátima Pessoa Militão de
2017-01-01
Describe the coding process of death causes for people living with HIV/AIDS, and classify deaths as related or unrelated to immunodeficiency by applying the Coding Causes of Death in HIV (CoDe) system. A cross-sectional study that codifies and classifies the causes of deaths occurring in a cohort of 2,372 people living with HIV/AIDS, monitored between 2007 and 2012, in two specialized HIV care services in Pernambuco. The causes of death already codified according to the International Classification of Diseases were recoded and classified as deaths related and unrelated to immunodeficiency by the CoDe system. We calculated the frequencies of the CoDe codes for the causes of death in each classification category. There were 315 (13%) deaths during the study period; 93 (30%) were caused by an AIDS-defining illness on the Centers for Disease Control and Prevention list. A total of 232 deaths (74%) were related to immunodeficiency after application of the CoDe. Infections were the most common cause, both related (76%) and unrelated (47%) to immunodeficiency, followed by malignancies (5%) in the first group and external causes (16%), malignancies (12 %) and cardiovascular diseases (11%) in the second group. Tuberculosis comprised 70% of the immunodeficiency-defining infections. Opportunistic infections and aging diseases were the most frequent causes of death, adding multiple disease burdens on health services. The CoDe system increases the probability of classifying deaths more accurately in people living with HIV/AIDS. Descrever o processo de codificação das causas de morte em pessoas vivendo com HIV/Aids, e classificar os óbitos como relacionados ou não relacionados à imunodeficiência aplicando o sistema Coding Causes of Death in HIV (CoDe). Estudo transversal, que codifica e classifica as causas dos óbitos ocorridos em uma coorte de 2.372 pessoas vivendo com HIV/Aids acompanhadas entre 2007 e 2012 em dois serviços de atendimento especializado em HIV em Pernambuco. As causas de óbito já codificadas a partir da Classificação Internacional de Doenças foram recodificadas e classificadas como óbitos relacionados e não relacionados à imunodeficiência pelo sistema CoDe. Foram calculadas as frequências dos códigos CoDe das causas do óbito em cada categoria de classificação. Ocorreram 315 (13%) óbitos no período do estudo; 93 (30%) tinham como causa uma doença definidora de Aids da lista do Centers for Disease Control and Prevention. No total 232 óbitos (74%) foram relacionados à imunodeficiência após aplicar o CoDe. As infecções foram as causas mais comuns, tanto nos óbitos relacionados (76%) como não relacionados (47%) à imunodeficiência, seguindo-se de malignidades (5%) no primeiro grupo e de causas externas (16%), malignidades (12%) e doenças cardiovasculares (11%) no segundo. A tuberculose compreendeu 70% das infecções definidoras de imunodeficiência. Infecções oportunistas e doenças do envelhecimento foram as causas mais frequentes de óbito, imprimindo carga múltipla de doenças aos serviços de saúde. O sistema CoDe aumenta a probabilidade de classificar os óbitos com maior precisão em pessoas vivendo com HIV/Aids.
Reicher, S; Seroussi, E; Weller, J I; Rosov, A; Gootwine, E
2012-07-01
Polymorphisms in mitochondrial DNA (mtDNA) protein- and tRNA-coding genes were shown to be associated with various diseases in humans as well as with production and reproduction traits in livestock. Alignment of full length mitochondria sequences from the 5 known ovine haplogroups: HA (n = 3), HB (n = 5), HC (n = 3), HD (n = 2), and HE (n = 2; GenBank accession nos. HE577847-50 and 11 published complete ovine mitochondria sequences) revealed sequence variation in 10 out of the 13 protein coding mtDNA sequences. Twenty-six of the 245 variable sites found in the protein coding sequences represent non-synonymous mutations. Sequence variation was observed also in 8 out of the 22 tRNA mtDNA sequences. On the basis of the mtDNA control region and cytochrome b partial sequences along with information on maternal lineages within an Afec-Assaf flock, 1,126 Afec-Assaf ewes were assigned to mitochondrial haplogroups HA, HB, and HC, with frequencies of 0.43, 0.43, and 0.14, respectively. Analysis of birth weight and growth rate records of lamb (n = 1286) and productivity from 4,993 lambing records revealed no association between mitochondrial haplogroup affiliation and female longevity, lambs perinatal survival rate, birth weight, and daily growth rate of lambs up to 150 d that averaged 1,664 d, 88.3%, 4.5 kg, and 320 g/d, respectively. However, significant (P < 0.0001) differences among the haplogroups were found for prolificacy of ewes, with prolificacies (mean ± SE) of 2.14 ± 0.04, 2.25 ± 0.04, and 2.30 ± 0.06 lamb born/ewe lambing for the HA, HB, and the HC haplogroups, respectively. Our results highlight the ovine mitogenome genetic variation in protein- and tRNA coding genes and suggest that sequence variation in ovine mtDNA is associated with variation in ewe prolificacy.
Urine cell-based DNA methylation classifier for monitoring bladder cancer.
van der Heijden, Antoine G; Mengual, Lourdes; Ingelmo-Torres, Mercedes; Lozano, Juan J; van Rijt-van de Westerlo, Cindy C M; Baixauli, Montserrat; Geavlete, Bogdan; Moldoveanud, Cristian; Ene, Cosmin; Dinney, Colin P; Czerniak, Bogdan; Schalken, Jack A; Kiemeney, Lambertus A L M; Ribal, Maria J; Witjes, J Alfred; Alcaraz, Antonio
2018-01-01
Current standard methods used to detect and monitor bladder cancer (BC) are invasive or have low sensitivity. This study aimed to develop a urine methylation biomarker classifier for BC monitoring and validate this classifier in patients in follow-up for bladder cancer (PFBC). Voided urine samples ( N = 725) from BC patients, controls, and PFBC were prospectively collected in four centers. Finally, 626 urine samples were available for analysis. DNA was extracted from the urinary cells and bisulfite modificated, and methylation status was analyzed using pyrosequencing. Cytology was available from a subset of patients ( N = 399). In the discovery phase, seven selected genes from the literature ( CDH13 , CFTR , NID2 , SALL3 , TMEFF2 , TWIST1 , and VIM2 ) were studied in 111 BC and 57 control samples. This training set was used to develop a gene classifier by logistic regression and was validated in 458 PFBC samples (173 with recurrence). A three-gene methylation classifier containing CFTR , SALL3 , and TWIST1 was developed in the training set (AUC 0.874). The classifier achieved an AUC of 0.741 in the validation series. Cytology results were available for 308 samples from the validation set. Cytology achieved AUC 0.696 whereas the classifier in this subset of patients reached an AUC 0.768. Combining the methylation classifier with cytology results achieved an AUC 0.86 in the validation set, with a sensitivity of 96%, a specificity of 40%, and a positive and negative predictive value of 56 and 92%, respectively. The combination of the three-gene methylation classifier and cytology results has high sensitivity and high negative predictive value in a real clinical scenario (PFBC). The proposed classifier is a useful test for predicting BC recurrence and decrease the number of cystoscopies in the follow-up of BC patients. If only patients with a positive combined classifier result would be cystoscopied, 36% of all cystoscopies can be prevented.
Nodeomics: Pathogen Detection in Vertebrate Lymph Nodes Using Meta-Transcriptomics
Wittekindt, Nicola E.; Padhi, Abinash; Schuster, Stephan C.; Qi, Ji; Zhao, Fangqing; Tomsho, Lynn P.; Kasson, Lindsay R.; Packard, Michael; Cross, Paul C.; Poss, Mary
2010-01-01
The ongoing emergence of human infections originating from wildlife highlights the need for better knowledge of the microbial community in wildlife species where traditional diagnostic approaches are limited. Here we evaluate the microbial biota in healthy mule deer (Odocoileus hemionus) by analyses of lymph node meta-transcriptomes. cDNA libraries from five individuals and two pools of samples were prepared from retropharyngeal lymph node RNA enriched for polyadenylated RNA and sequenced using Roche-454 Life Sciences technology. Protein-coding and 16S ribosomal RNA (rRNA) sequences were taxonomically profiled using protein and rRNA specific databases. Representatives of all bacterial phyla were detected in the seven libraries based on protein-coding transcripts indicating that viable microbiota were present in lymph nodes. Residents of skin and rumen, and those ubiquitous in mule deer habitat dominated classifiable bacterial species. Based on detection of both rRNA and protein-coding transcripts, we identified two new proteobacterial species; a Helicobacter closely related to Helicobacter cetorum in the Helicobacter pylori/Helicobacter acinonychis complex and an Acinetobacter related to Acinetobacter schindleri. Among viruses, a novel gamma retrovirus and other members of the Poxviridae and Retroviridae were identified. We additionally evaluated bacterial diversity by amplicon sequencing the hypervariable V6 region of 16S rRNA and demonstrate that overall taxonomic diversity is higher with the meta-transcriptomic approach. These data provide the most complete picture to date of the microbial diversity within a wildlife host. Our research advances the use of meta-transcriptomics to study microbiota in wildlife tissues, which will facilitate detection of novel organisms with pathogenic potential to human and animals.
Fan, SiGang; Hu, ChaoQun; Wen, Jing; Zhang, LvPing
2011-05-01
The complete mitochondrial DNA sequence contains useful information for phylogenetic analyses of metazoa. In this study, the complete mitochondrial DNA sequence of sea cucumber Stichopus horrens (Holothuroidea: Stichopodidae: Stichopus) is presented. The complete sequence was determined using normal and long PCRs. The mitochondrial genome of Stichopus horrens is a circular molecule 16257 bps long, composed of 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes. Most of these genes are coded on the heavy strand except for one protein-coding gene (nad6) and five tRNA genes (tRNA ( Ser(UCN) ), tRNA ( Gln ), tRNA ( Ala ), tRNA ( Val ), tRNA ( Asp )) which are coded on the light strand. The composition of the heavy strand is 30.8% A, 23.7% C, 16.2% G, and 29.3% T bases (AT skew=0.025; GC skew=-0.188). A non-coding region of 675 bp was identified as a putative control region because of its location and AT richness. The intergenic spacers range from 1 to 50 bp in size, totaling 227 bp. A total of 25 overlapping nucleotides, ranging from 1 to 10 bp in size, exist among 11 genes. All 13 protein-coding genes are initiated with an ATG. The TAA codon is used as the stop codon in all the protein coding genes except nad3 and nad4 that use TAG as their termination codon. The most frequently used amino acids are Leu (16.29%), Ser (10.34%) and Phe (8.37%). All of the tRNA genes have the potential to fold into typical cloverleaf secondary structures. We also compared the order of the genes in the mitochondrial DNA from the five holothurians that are now available and found a novel gene arrangement in the mitochondrial DNA of Stichopus horrens.
Noncoding sequence classification based on wavelet transform analysis: part I
NASA Astrophysics Data System (ADS)
Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.
2017-09-01
DNA sequences in human genome can be divided into the coding and noncoding ones. Coding sequences are those that are read during the transcription. The identification of coding sequences has been widely reported in literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA elements) and Epigenomic Roadmap Project projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.
Analyzing Prosocial Content on T.V.
ERIC Educational Resources Information Center
Davidson, Emily S.; Neale, John M.
To enhance knowledge of television content, a prosocial code was developed by watching a large number of potentially prosocial television programs and making notes on all the positive acts. The behaviors were classified into a workable number of categories. The prosocial code is largely verbal and contains seven categories which fall into two…
Porschen, R; Remy, U; Bevers, G; Schauseil, S; Hengels, K J; Borchard, F
1993-06-15
The prognostic significance of tumor DNA ploidy in patients with cancer of the pancreas has not been defined because conflicting results have been reported. DNA content was measured in 56 ductal adenocarcinomas of the pancreas. DNA ploidy status was evaluated by flow cytometry in nuclei isolated from paraffin-embedded tumor tissues. An abnormal DNA stemline was observed in 27 (48%) patients. The percentage of aneuploid tumors was significantly increased in tumors classified as Stage III/IV (53%) compared with those classified as Stage I (22%). A borderline significant association existed between DNA ploidy and radicality of surgery (P = 0.08). The median survival of patients with diploid carcinomas was 6.9 months (standard error, +/- 0.9) in comparison to 4.5 +/- 1.2 months for patients with aneuploid tumors (P = 0.013 by generalized Wilcoxon test; P = 0.023 by generalized Savage test). Although a selection bias cannot be excluded, survival of patients with a radical resection was longer than that of patients with a nonradical resection (P = 0.0008 and P = 0.0085, respectively). In addition, presence of distant metastasis (P = 0.0006 [Wilcoxon test] and P = 0.033 [Savage test]) could be identified as a prognostic factor. In a Cox regression model, results of surgery and DNA ploidy were independent prognostic variables. Because DNA ploidy has a significant impact on prognosis in pancreatic cancer, it should be used as a variable for stratified randomization of patients in therapeutic trials.
Artificial Intelligence, DNA Mimicry, and Human Health.
Stefano, George B; Kream, Richard M
2017-08-14
The molecular evolution of genomic DNA across diverse plant and animal phyla involved dynamic registrations of sequence modifications to maintain existential homeostasis to increasingly complex patterns of environmental stressors. As an essential corollary, driver effects of positive evolutionary pressure are hypothesized to effect concerted modifications of genomic DNA sequences to meet expanded platforms of regulatory controls for successful implementation of advanced physiological requirements. It is also clearly apparent that preservation of updated registries of advantageous modifications of genomic DNA sequences requires coordinate expansion of convergent cellular proofreading/error correction mechanisms that are encoded by reciprocally modified genomic DNA. Computational expansion of operationally defined DNA memory extends to coordinate modification of coding and previously under-emphasized noncoding regions that now appear to represent essential reservoirs of untapped genetic information amenable to evolutionary driven recruitment into the realm of biologically active domains. Additionally, expansion of DNA memory potential via chemical modification and activation of noncoding sequences is targeted to vertical augmentation and integration of an expanded cadre of transcriptional and epigenetic regulatory factors affecting linear coding of protein amino acid sequences within open reading frames.
Extraction of High Quality DNA from Seized Moroccan Cannabis Resin (Hashish)
El Alaoui, Moulay Abdelaziz; Melloul, Marouane; Alaoui Amine, Sanaâ; Stambouli, Hamid; El Bouri, Aziz; Soulaymani, Abdelmajid; El Fahime, Elmostafa
2013-01-01
The extraction and purification of nucleic acids is the first step in most molecular biology analysis techniques. The objective of this work is to obtain highly purified nucleic acids derived from Cannabis sativa resin seizure in order to conduct a DNA typing method for the individualization of cannabis resin samples. To obtain highly purified nucleic acids from cannabis resin (Hashish) free from contaminants that cause inhibition of PCR reaction, we have tested two protocols: the CTAB protocol of Wagner and a CTAB protocol described by Somma (2004) adapted for difficult matrix. We obtained high quality genomic DNA from 8 cannabis resin seizures using the adapted protocol. DNA extracted by the Wagner CTAB protocol failed to give polymerase chain reaction (PCR) amplification of tetrahydrocannabinolic acid (THCA) synthase coding gene. However, the extracted DNA by the second protocol permits amplification of THCA synthase coding gene using different sets of primers as assessed by PCR. We describe here for the first time the possibility of DNA extraction from (Hashish) resin derived from Cannabis sativa. This allows the use of DNA molecular tests under special forensic circumstances. PMID:24124454
Conflict Containment in the Balkans: Testing Extended Deterrence.
1995-03-01
STATEMENT 12b. DISTRIBUTION CODE Approved for public release; distribution is unlimited. 13. ABSTRACT This thesis critically analyzes a prominent theoretical...Containment 15. NUMBER OF in the Balkans; Deterrence; Coercive Diplomacy; Balance of Forces. PAGES: 161 16. PRICE CODE 17. SECURITY CLASSIFI- 18. SECURITY...Department of National Security Affai sAccesion For NTIS CRA&I DTtC TAB Unannounced Justifca ........... By- Distribution Availability Codes Avail and/or Dist
The C terminus of Ku80 activates the DNA-dependent protein kinase catalytic subunit.
Singleton, B K; Torres-Arzayus, M I; Rottinghaus, S T; Taccioli, G E; Jeggo, P A
1999-05-01
Ku is a heterodimeric protein with double-stranded DNA end-binding activity that operates in the process of nonhomologous end joining. Ku is thought to target the DNA-dependent protein kinase (DNA-PK) complex to the DNA and, when DNA bound, can interact and activate the DNA-PK catalytic subunit (DNA-PKcs). We have carried out a 3' deletion analysis of Ku80, the larger subunit of Ku, and shown that the C-terminal 178 amino acid residues are dispensable for DNA end-binding activity but are required for efficient interaction of Ku with DNA-PKcs. Cells expressing Ku80 proteins that lack the terminal 178 residues have low DNA-PK activity, are radiation sensitive, and can recombine the signal junctions but not the coding junctions during V(D)J recombination. These cells have therefore acquired the phenotype of mouse SCID cells despite expressing DNA-PKcs protein, suggesting that an interaction between DNA-PKcs and Ku, involving the C-terminal region of Ku80, is required for DNA double-strand break rejoining and coding but not signal joint formation. To gain further insight into important domains in Ku80, we report a point mutational change in Ku80 in the defective xrs-2 cell line. This residue is conserved among species and lies outside of the previously reported Ku70-Ku80 interaction domain. The mutational change nonetheless abrogates the Ku70-Ku80 interaction and DNA end-binding activity.
NASA Astrophysics Data System (ADS)
Bustamam, A.; Aldila, D.; Fatimah, Arimbi, M. D.
2017-07-01
One of the most widely used clustering method, since it has advantage on its robustness, is Self-Organizing Maps (SOM) method. This paper discusses the application of SOM method on Human Papillomavirus (HPV) DNA which is the main cause of cervical cancer disease, the most dangerous cancer in developing countries. We use 18 types of HPV DNA-based on the newest complete genome. By using open-source-based program R, clustering process can separate 18 types of HPV into two different clusters. There are two types of HPV in the first cluster while 16 others in the second cluster. The analyzing result of 18 types HPV based on the malignancy of the virus (the difficultness to cure). Two of HPV types the first cluster can be classified as tame HPV, while 16 others in the second cluster are classified as vicious HPV.
Promoter Sequences Prediction Using Relational Association Rule Mining
Czibula, Gabriela; Bocicor, Maria-Iuliana; Czibula, Istvan Gergely
2012-01-01
In this paper we are approaching, from a computational perspective, the problem of promoter sequences prediction, an important problem within the field of bioinformatics. As the conditions for a DNA sequence to function as a promoter are not known, machine learning based classification models are still developed to approach the problem of promoter identification in the DNA. We are proposing a classification model based on relational association rules mining. Relational association rules are a particular type of association rules and describe numerical orderings between attributes that commonly occur over a data set. Our classifier is based on the discovery of relational association rules for predicting if a DNA sequence contains or not a promoter region. An experimental evaluation of the proposed model and comparison with similar existing approaches is provided. The obtained results show that our classifier overperforms the existing techniques for identifying promoter sequences, confirming the potential of our proposal. PMID:22563233
Probe classification of on-off type DNA microarray images with a nonlinear matching measure
NASA Astrophysics Data System (ADS)
Ryu, Munho; Kim, Jong Dae; Min, Byoung Goo; Kim, Jongwon; Kim, Y. Y.
2006-01-01
We propose a nonlinear matching measure, called counting measure, as a signal detection measure that is defined as the number of on pixels in the spot area. It is applied to classify probes for an on-off type DNA microarray, where each probe spot is classified as hybridized or not. The counting measure also incorporates the maximum response search method, where the expected signal is obtained by taking the maximum among the measured responses of the various positions and sizes of the spot template. The counting measure was compared to existing signal detection measures such as the normalized covariance and the median for 2390 patient samples tested on the human papillomavirus (HPV) DNA chip. The counting measure performed the best regardless of whether or not the maximum response search method was used. The experimental results showed that the counting measure combined with the positional search was the most preferable.
ERIC Educational Resources Information Center
King, Angela G.
2004-01-01
Nanotechnology are employed by researchers at Northwestern University to develop a method of labeling disease markers present in blood with unique DNA tags they have dubbed "bio-bar-codes". The preparation of nanoparticle and magnetic microparticle probes and a nanoparticle-based PSR-less DNA amplification scheme are involved by the DNA-BCA assay.
Liu, Baodong; Liu, Xiaoling; Lai, Weiyi; Wang, Hailin
2017-06-06
DNA N 6 -methyl-2'-deoxyadenosine (6mdA) is an epigenetic modification in both eukaryotes and bacteria. Here we exploited stable isotope-labeled deoxynucleoside [ 15 N 5 ]-2'-deoxyadenosine ([ 15 N 5 ]-dA) as an initiation tracer and for the first time developed a metabolically differential tracing code for monitoring DNA 6mdA in human cells. We demonstrate that the initiation tracer [ 15 N 5 ]-dA undergoes a specific and efficient adenine deamination reaction leading to the loss the exocyclic amine 15 N, and further utilizes the purine salvage pathway to generate mainly both [ 15 N 4 ]-dA and [ 15 N 4 ]-2'-deoxyguanosine ([ 15 N 4 ]-dG) in mammalian genomes. However, [ 15 N 5 ]-dA is largely retained in the genomes of mycoplasmas, which are often found in cultured cells and experimental animals. Consequently, the methylation of dA generates 6mdA with a consistent coding pattern, with a predominance of [ 15 N 4 ]-6mdA. Therefore, mammalian DNA 6mdA can be potentially discriminated from that generated by infecting mycoplasmas. Collectively, we show a promising approach for identification of authentic DNA 6mdA in human cells and determine if the human cells are contaminated with mycoplasmas.
Living Organisms Author Their Read-Write Genomes in Evolution.
Shapiro, James A
2017-12-06
Evolutionary variations generating phenotypic adaptations and novel taxa resulted from complex cellular activities altering genome content and expression: (i) Symbiogenetic cell mergers producing the mitochondrion-bearing ancestor of eukaryotes and chloroplast-bearing ancestors of photosynthetic eukaryotes; (ii) interspecific hybridizations and genome doublings generating new species and adaptive radiations of higher plants and animals; and, (iii) interspecific horizontal DNA transfer encoding virtually all of the cellular functions between organisms and their viruses in all domains of life. Consequently, assuming that evolutionary processes occur in isolated genomes of individual species has become an unrealistic abstraction. Adaptive variations also involved natural genetic engineering of mobile DNA elements to rewire regulatory networks. In the most highly evolved organisms, biological complexity scales with "non-coding" DNA content more closely than with protein-coding capacity. Coincidentally, we have learned how so-called "non-coding" RNAs that are rich in repetitive mobile DNA sequences are key regulators of complex phenotypes. Both biotic and abiotic ecological challenges serve as triggers for episodes of elevated genome change. The intersections of cell activities, biosphere interactions, horizontal DNA transfers, and non-random Read-Write genome modifications by natural genetic engineering provide a rich molecular and biological foundation for understanding how ecological disruptions can stimulate productive, often abrupt, evolutionary transformations.
Comparison of Danish dichotomous and BI-RADS classifications of mammographic density.
Hodge, Rebecca; Hellmann, Sophie Sell; von Euler-Chelpin, My; Vejborg, Ilse; Andersen, Zorana Jovanovic
2014-06-01
In the Copenhagen mammography screening program from 1991 to 2001, mammographic density was classified either as fatty or mixed/dense. This dichotomous mammographic density classification system is unique internationally, and has not been validated before. To compare the Danish dichotomous mammographic density classification system from 1991 to 2001 with the density BI-RADS classifications, in an attempt to validate the Danish classification system. The study sample consisted of 120 mammograms taken in Copenhagen in 1991-2001, which tested false positive, and which were in 2012 re-assessed and classified according to the BI-RADS classification system. We calculated inter-rater agreement between the Danish dichotomous mammographic classification as fatty or mixed/dense and the four-level BI-RADS classification by the linear weighted Kappa statistic. Of the 120 women, 32 (26.7%) were classified as having fatty and 88 (73.3%) as mixed/dense mammographic density, according to Danish dichotomous classification. According to BI-RADS density classification, 12 (10.0%) women were classified as having predominantly fatty (BI-RADS code 1), 46 (38.3%) as having scattered fibroglandular (BI-RADS code 2), 57 (47.5%) as having heterogeneously dense (BI-RADS 3), and five (4.2%) as having extremely dense (BI-RADS code 4) mammographic density. The inter-rater variability assessed by weighted kappa statistic showed a substantial agreement (0.75). The dichotomous mammographic density classification system utilized in early years of Copenhagen's mammographic screening program (1991-2001) agreed well with the BI-RADS density classification system.
Free Energy Gap and Statistical Thermodynamic Fidelity of DNA Codes
2007-10-01
reverse-complement unless otherwise stated. For strand x, let Nx denote its complement. A (perfect) Watson - Crick duplex is the joining of complement...is possible for complementary sequences to form a non-perfectly aligned duplex, we will call any x W Nx duplex a Watson - Crick (WC) duplex. Two...DATES COVERED (From - To) 4. TITLE AND SUBTITLE FREE ENERGY GAP AND STATISTICAL THERMODYNAMIC FIDELITY OF DNA CODES 5a. CONTRACT NUMBER FA8750-07
Free Energy Gap and Statistical Thermodynamic Fidelity of DNA Codes (Postprint)
2007-01-01
reverse-complement unless otherwise stated. For strand x, let Nx denote its complement. A (perfect) Watson - Crick duplex is the joining of complement...is possible for complementary sequences to form a non-perfectly aligned duplex, we will call any x W Nx duplex a Watson - Crick (WC) duplex. Two...DATES COVERED (From - To) 4. TITLE AND SUBTITLE FREE ENERGY GAP AND STATISTICAL THERMODYNAMIC FIDELITY OF DNA CODES 5a. CONTRACT NUMBER FA8750-07
Superimposed Code Theorectic Analysis of DNA Codes and DNA Computing
2010-03-01
because only certain collections (partitioned by font type) of sequences are allowed to be in each position (e.g., Arial = position 0, Comic ...rigidity of short oligos and the shape of the polar charge. Oligo movement was modeled by a Brownian motion 3 dimensional random walk. The one...temperature, kB is Boltz he viscosity of the medium. The random walk motion is modeled by assuming the oligo is on a three dimensional lattice and may
Researchers led by Ashish Lal, Ph.D., Investigator in the Genetics Branch, have shown that when the DNA in human colon cancer cells is damaged, a long non-coding RNA (lncRNA) regulates the expression of genes that halt growth, which allows the cells to repair the damage and promote survival. Their findings suggest an important pro-survival function of a lncRNA in cancer
Junk DNA and the long non-coding RNA twist in cancer genetics
Ling, Hui; Vincent, Kimberly; Pichler, Martin; Fodde, Riccardo; Berindan-Neagoe, Ioana; Slack, Frank J.; Calin, George A
2015-01-01
The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions, and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNA in human diseases in general and cancer in particular remain largely unknown. Data from the literature suggest that lncRNA, often via interaction with proteins, functions in specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function, and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single nucleotide polymorphism (SNP) located within lncRNAs may be functionally associated with the individual’s susceptibility to cancer. PMID:25619839
AP1 Keeps Chromatin Poised for Action | Center for Cancer Research
The human genome harbors gene-encoding DNA, the blueprint for building proteins that regulate cellular function. Embedded across the genome, in non-coding regions, are DNA elements to which regulatory factors bind. The interaction of regulatory factors with DNA at these sites modifies gene expression to modulate cell activity. In cells, DNA exists in a complex with proteins
Public Perceptions and Expectations of the Forensic Use of DNA: Results of a Preliminary Study
ERIC Educational Resources Information Center
Curtis, Cate
2009-01-01
The forensic use of Deoxyribonucleic Acid (DNA) is demonstrating significant success as a crime-solving tool. However, numerous concerns have been raised regarding the potential for DNA use to contravene cultural, ethical, and legal codes. In this article the expectations and level of knowledge of the New Zealand public of the DNA data-bank and…
He, Shunping; Mayden, Richard L; Wang, Xuzheng; Wang, Wei; Tang, Kevin L; Chen, Wei-Jen; Chen, Yiyu
2008-03-01
The family Cyprinidae is the largest freshwater fish group in the world, including over 200 genera and 2100 species. The phylogenetic relationships of major clades within this family are simply poorly understood, largely because of the overwhelming diversity of the group; however, several investigators have advanced different hypotheses of relationships that pre- and post-date the use of shared-derived characters as advocated through phylogenetic systematics. As expected, most previous investigations used morphological characters. Recently, mitochondrial DNA (mtDNA) sequences and combined morphological and mtDNA investigations have been used to explore and advance our understanding of species relationships and test monophyletic groupings. Limitations of these studies include limited taxon sampling and a strict reliance upon maternally inherited mtDNA variation. The present study is the first endeavor to recover the phylogenetic relationships of the 12 previously recognized monophyletic subfamilies within the Cyprinidae using newly sequenced nuclear DNA (nDNA) for over 50 species representing members of the different previously hypothesized subfamily and family groupings within the Cyprinidae and from other cypriniform families as outgroup taxa. Hypothesized phylogenetic relationships are constructed using maximum parsimony and Basyesian analyses of 1042 sites, of which 971 sites were variable and 790 were phylogenetically informative. Using other appropriate cypriniform taxa of the families Catostomidae (Myxocyprinus asiaticus), Gyrinocheilidae (Gyrinocheilus aymonieri), and Balitoridae (Nemacheilus sp. and Beaufortia kweichowensis) as outgroups, the Cyprinidae is resolved as a monophyletic group. Within the family the genera Raiamas, Barilius, Danio, and Rasbora, representing many of the tropical cyprinids, represent basal members of the family. All other species can be classified into variably supported and resolved monophyletic lineages, depending upon analysis, that are consistent with or correspond to Barbini and Leuciscini. The Barbini includes taxa traditionally aligned with the subfamily Cyprininae sensu previous morphological revisionary studies by Howes (Barbinae, Labeoninae, Cyprininae and Schizothoracinae). The Leuciscini includes six other subfamilies that are mainly divided into three separate lineages. The relationships among genera and subfamilies are discussed as well as the possible origins of major lineages.
EnsembleGASVR: a novel ensemble method for classifying missense single nucleotide polymorphisms.
Rapakoulia, Trisevgeni; Theofilatos, Konstantinos; Kleftogiannis, Dimitrios; Likothanasis, Spiros; Tsakalidis, Athanasios; Mavroudi, Seferina
2014-08-15
Single nucleotide polymorphisms (SNPs) are considered the most frequently occurring DNA sequence variations. Several computational methods have been proposed for the classification of missense SNPs to neutral and disease associated. However, existing computational approaches fail to select relevant features by choosing them arbitrarily without sufficient documentation. Moreover, they are limited to the problem of missing values, imbalance between the learning datasets and most of them do not support their predictions with confidence scores. To overcome these limitations, a novel ensemble computational methodology is proposed. EnsembleGASVR facilitates a two-step algorithm, which in its first step applies a novel evolutionary embedded algorithm to locate close to optimal Support Vector Regression models. In its second step, these models are combined to extract a universal predictor, which is less prone to overfitting issues, systematizes the rebalancing of the learning sets and uses an internal approach for solving the missing values problem without loss of information. Confidence scores support all the predictions and the model becomes tunable by modifying the classification thresholds. An extensive study was performed for collecting the most relevant features for the problem of classifying SNPs, and a superset of 88 features was constructed. Experimental results show that the proposed framework outperforms well-known algorithms in terms of classification performance in the examined datasets. Finally, the proposed algorithmic framework was able to uncover the significant role of certain features such as the solvent accessibility feature, and the top-scored predictions were further validated by linking them with disease phenotypes. Datasets and codes are freely available on the Web at http://prlab.ceid.upatras.gr/EnsembleGASVR/dataset-codes.zip. All the required information about the article is available through http://prlab.ceid.upatras.gr/EnsembleGASVR/site.html. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
ERIC Educational Resources Information Center
Warren, Michael D.
1997-01-01
Explains a method to enable students to understand DNA and protein synthesis using model-building and role-playing. Acquaints students with the triplet code and transcription. Includes copies of the charts used in this technique. (DDR)
Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao; Ning, Shangwei; Jin, Lianhong; Li, Xia
2018-01-04
Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNAs-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying the differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic values for each hit transcript calculated from the patients clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify the differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to complete the differentially methylated lncRNAs identification and visualization with local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
A Perspective on DNA Microarrays in Pathology Research and Practice
Pollack, Jonathan R.
2007-01-01
DNA microarray technology matured in the mid-1990s, and the past decade has witnessed a tremendous growth in its application. DNA microarrays have provided powerful tools for pathology researchers seeking to describe, classify, and understand human disease. There has also been great expectation that the technology would advance the practice of pathology. This review highlights some of the key contributions of DNA microarrays to experimental pathology, focusing in the area of cancer research. Also discussed are some of the current challenges in translating utility to clinical practice. PMID:17600117
Sommariva, Michele; De Cecco, Loris; De Cesare, Michelandrea; Sfondrini, Lucia; Ménard, Sylvie; Melani, Cecilia; Delia, Domenico; Zaffaroni, Nadia; Pratesi, Graziella; Uva, Valentina; Tagliabue, Elda; Balsari, Andrea
2011-10-15
Synthetic oligodeoxynucleotides expressing CpG motifs (CpG-ODN) are a Toll-like receptor 9 (TLR9) agonist that can enhance the antitumor activity of DNA-damaging chemotherapy and radiation therapy in preclinical mouse models. We hypothesized that the success of these combinations is related to the ability of CpG-ODN to modulate genes involved in DNA repair. We conducted an in silico analysis of genes implicated in DNA repair in data sets obtained from murine colon carcinoma cells in mice injected intratumorally with CpG-ODN and from splenocytes in mice treated intraperitoneally with CpG-ODN. CpG-ODN treatment caused downregulation of DNA repair genes in tumors. Microarray analyses of human IGROV-1 ovarian carcinoma xenografts in mice treated intraperitoneally with CpG-ODN confirmed in silico findings. When combined with the DNA-damaging drug cisplatin, CpG-ODN significantly increased the life span of mice compared with individual treatments. In contrast, CpG-ODN led to an upregulation of genes involved in DNA repair in immune cells. Cisplatin-treated patients with ovarian carcinoma as well as anthracycline-treated patients with breast cancer who are classified as "CpG-like" for the level of expression of CpG-ODN modulated DNA repair genes have a better outcome than patients classified as "CpG-untreated-like," indicating the relevance of these genes in the tumor cell response to DNA-damaging drugs. Taken together, the findings provide evidence that the tumor microenvironment can sensitize cancer cells to DNA-damaging chemotherapy, thereby expanding the benefits of CpG-ODN therapy beyond induction of a strong immune response.
Phylogeographic Differentiation of Mitochondrial DNA in Han Chinese
Yao, Yong-Gang; Kong, Qing-Peng; Bandelt, Hans-Jürgen; Kivisild, Toomas; Zhang, Ya-Ping
2002-01-01
To characterize the mitochondrial DNA (mtDNA) variation in Han Chinese from several provinces of China, we have sequenced the two hypervariable segments of the control region and the segment spanning nucleotide positions 10171–10659 of the coding region, and we have identified a number of specific coding-region mutations by direct sequencing or restriction-fragment–length–polymorphism tests. This allows us to define new haplogroups (clades of the mtDNA phylogeny) and to dissect the Han mtDNA pool on a phylogenetic basis, which is a prerequisite for any fine-grained phylogeographic analysis, the interpretation of ancient mtDNA, or future complete mtDNA sequencing efforts. Some of the haplogroups under study differ considerably in frequencies across different provinces. The southernmost provinces show more pronounced contrasts in their regional Han mtDNA pools than the central and northern provinces. These and other features of the geographical distribution of the mtDNA haplogroups observed in the Han Chinese make an initial Paleolithic colonization from south to north plausible but would suggest subsequent migration events in China that mainly proceeded from north to south and east to west. Lumping together all regional Han mtDNA pools into one fictive general mtDNA pool or choosing one or two regional Han populations to represent all Han Chinese is inappropriate for prehistoric considerations as well as for forensic purposes or medical disease studies. PMID:11836649
Investigation of a Sybr-Green-Based Method to Validate DNA Sequences for DNA Computing
2005-05-01
OF A SYBR-GREEN-BASED METHOD TO VALIDATE DNA SEQUENCES FOR DNA COMPUTING 6. AUTHOR(S) Wendy Pogozelski, Salvatore Priore, Matthew Bernard ...simulated annealing. Biochemistry, 35, 14077-14089. 15 Pogozelski, W.K., Bernard , M.P. and Macula, A. (2004) DNA code validation using...and Clark, B.F.C. (eds) In RNA Biochemistry and Biotechnology, NATO ASI Series, Kluwer Academic Publishers. Zucker, M. and Stiegler , P. (1981
Omeire, Destiny; Abdin, Shaunte; Brooks, Daniel M; Miranda, Hector C
2015-04-01
The Germain's Peacock-Pheasant Polyplectron germaini (Aves, Galliformes, Phasianidae) is classified as Near Threatened on the IUCN Red List. The complete mitochondrial genome of P. germaini is 16,699 bp, consisting of 13 protein-coding genes, 2 rRNA, 22 tRNA genes and 1 control region. All of the 13 protein-coding genes have ATG as start codon. Eight of the 13 protein-coding genes have TAA as stop codon.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Helfenbein, Kevin G.; Brown, Wesley M.; Boore, Jeffrey L.
We have sequenced the complete mitochondrial DNA (mtDNA) of the articulate brachiopod Terebratalia transversa. The circular genome is 14,291 bp in size, relatively small compared to other published metazoan mtDNAs. The 37 genes commonly found in animal mtDNA are present; the size decrease is due to the truncation of several tRNA, rRNA, and protein genes, to some nucleotide overlaps, and to a paucity of non-coding nucleotides. Although the gene arrangement differs radically from those reported for other metazoans, some gene junctions are shared with two other articulate brachiopods, Laqueus rubellus and Terebratulina retusa. All genes in the T. transversa mtDNA,more » unlike those in most metazoan mtDNAs reported, are encoded by the same strand. The A+T content (59.1 percent) is low for a metazoan mtDNA, and there is a high propensity for homopolymer runs and a strong base-compositional strand bias. The coding strand is quite G+T-rich, a skew that is shared by the confamilial (laqueid) specie s L. rubellus, but opposite to that found in T. retusa, a cancellothyridid. These compositional skews are strongly reflected in the codon usage patterns and the amino acid compositions of the mitochondrial proteins, with markedly different usage observed between T. retusa and the two laqueids. This observation, plus the similarity of the laqueid non-coding regions to the reverse complement of the non-coding region of the cancellothyridid, suggest that an inversion that resulted in a reversal in the direction of first-strand replication has occurred in one of the two lineages. In addition to the presence of one non-coding region in T. transversa that is comparable to those in the other brachiopod mtDNAs, there are two others with the potential to form secondary structures; one or both of these may be involved in the process of transcript cleavage.« less
2015-01-01
Conformational polymorphism of DNA is a major causative factor behind several incurable trinucleotide repeat expansion disorders that arise from overexpansion of trinucleotide repeats located in coding/non-coding regions of specific genes. Hairpin DNA structures that are formed due to overexpansion of CAG repeat lead to Huntington’s disorder and spinocerebellar ataxias. Nonetheless, DNA hairpin stem structure that generally embraces B-form with canonical base pairs is poorly understood in the context of periodic noncanonical A…A mismatch as found in CAG repeat overexpansion. Molecular dynamics simulations on DNA hairpin stems containing A…A mismatches in a CAG repeat overexpansion show that A…A dictates local Z-form irrespective of starting glycosyl conformation, in sharp contrast to canonical DNA duplex. Transition from B-to-Z is due to the mechanistic effect that originates from its pronounced nonisostericity with flanking canonical base pairs facilitated by base extrusion, backbone and/or base flipping. Based on these structural insights we envisage that such an unusual DNA structure of the CAG hairpin stem may have a role in disease pathogenesis. As this is the first study that delineates the influence of a single A…A mismatch in reversing DNA helicity, it would further have an impact on understanding DNA mismatch repair. PMID:25876062
A Tandemly Arranged Pattern of Two 5S rDNA Arrays in Amolops mantzorum (Anura, Ranidae).
Liu, Ting; Song, Menghuan; Xia, Yun; Zeng, Xiaomao
2017-01-01
In an attempt to extend the knowledge of the 5S rDNA organization in anurans, the 5S rDNA sequences of Amolops mantzorum were isolated, characterized, and mapped by FISH. Two forms of 5S rDNA, type I (209 bp) and type II (about 870 bp), were found in specimens investigated from various populations. Both of them contained a 118-bp coding sequence, readily differentiated by their non-transcribed spacer (NTS) sizes and compositions. Four probes (the 5S rDNA coding sequences, the type I NTS, the type II NTS, and the entire type II 5S rDNA sequences) were respectively labeled with TAMRA or digoxigenin to hybridize with mitotic chromosomes for samples of all localities. It turned out that all probes showed the same signals that appeared in every centromeric region and in the telomeric regions of chromosome 5, without differences within or between populations. Obviously, both type I and type II of the 5S rDNA arrays arranged in tandem, which was contrasting with other frogs or fishes recorded to date. More interestingly, all the probes detected centromeric regions in all karyotypes, suggesting the presence of a satellite DNA family derived from 5S rDNA. © 2017 S. Karger AG, Basel.
Linear and Nonlinear Statistical Characterization of DNA
NASA Astrophysics Data System (ADS)
Norio Oiwa, Nestor; Goldman, Carla; Glazier, James
2002-03-01
We find spatial order in the distribution of protein-coding (including RNAs) and control segments of GenBank genomic sequences, irrespective of ATCG content. This is achieved by correlations, histograms, fractal dimensions and singularity spectra. Estimates of these quantities in complete nuclear genome indicate that coding sequences are long-range correlated and their disposition are self-similar (multifractal) for eukaryotes. These characteristics are absent in prokaryotes, where there are few noncoding sequences, suggesting the `junk' DNA play a relevant role to the genome structure and function. Concerning the genetic message of ATCG sequences, we build a random walk (Levy flight), using DNA symmetry arguments, where we associate A, T, C and G as left, right, down and up steps, respectively. Nonlinear analysis of mitochondrial DNA walks reveal multifractal pattern based on palindromic sequences, which fold in hairpins and loops.
Silencing of the pentose phosphate pathway genes influences DNA replication in human fibroblasts.
Fornalewicz, Karolina; Wieczorek, Aneta; Węgrzyn, Grzegorz; Łyżeń, Robert
2017-11-30
Previous reports and our recently published data indicated that some enzymes of glycolysis and the tricarboxylic acid cycle can affect the genome replication process by changing either the efficiency or timing of DNA synthesis in human normal cells. Both these pathways are connected with the pentose phosphate pathway (PPP pathway). The PPP pathway supports cell growth by generating energy and precursors for nucleotides and amino acids. Therefore, we asked if silencing of genes coding for enzymes involved in the pentose phosphate pathway may also affect the control of DNA replication in human fibroblasts. Particular genes coding for PPP pathway enzymes were partially silenced with specific siRNAs. Such cells remained viable. We found that silencing of the H6PD, PRPS1, RPE genes caused less efficient enterance to the S phase and decrease in efficiency of DNA synthesis. On the other hand, in cells treated with siRNA against G6PD, RBKS and TALDO genes, the fraction of cells entering the S phase was increased. However, only in the case of G6PD and TALDO, the ratio of BrdU incorporation to DNA was significantly changed. The presented results together with our previously published studies illustrate the complexity of the influence of genes coding for central carbon metabolism on the control of DNA replication in human fibroblasts, and indicate which of them are especially important in this process. Copyright © 2017 Elsevier B.V. All rights reserved.
PlantTribes: a gene and gene family resource for comparative genomics in plants
Wall, P. Kerr; Leebens-Mack, Jim; Müller, Kai F.; Field, Dawn; Altman, Naomi S.; dePamphilis, Claude W.
2008-01-01
The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575–1584)] to classify all of these species’ protein-coding genes into putative gene families, called tribes, using three clustering stringencies (low, medium and high). For all tribes, we have generated protein and DNA alignments and maximum-likelihood phylogenetic trees. A parallel database of microarray experimental results is linked to the genes, which lets researchers identify groups of related genes and their expression patterns. Unified nomenclatures were developed, and tribes can be related to traditional gene families and conserved domain identifiers. SuperTribes, constructed through a second iteration of MCL clustering, connect distant, but potentially related gene clusters. The global classification of nearly 200 000 plant proteins was used as a scaffold for sorting ∼4 million additional cDNA sequences from over 200 plant species. All data and analyses are accessible through a flexible interface allowing users to explore the classification, to place query sequences within the classification, and to download results for further study. PMID:18073194
Marzo, Mar; Puig, Marta; Ruiz, Alfredo
2008-02-26
Galileo is the only transposable element (TE) known to have generated natural chromosomal inversions in the genus Drosophila. It was discovered in Drosophila buzzatii and classified as a Foldback-like element because of its long, internally repetitive, terminal inverted repeats (TIRs) and lack of coding capacity. Here, we characterized a seemingly complete copy of Galileo from the D. buzzatii genome. It is 5,406 bp long, possesses 1,229-bp TIRs, and encodes a 912-aa transposase similar to those of the Drosophila melanogaster 1360 (Hoppel) and P elements. We also searched the recently available genome sequences of 12 Drosophila species for elements similar to Dbuz\\Galileo by using bioinformatic tools. Galileo was found in six species (ananassae, willistoni, peudoobscura, persimilis, virilis, and mojavensis) from the two main lineages within the Drosophila genus. Our observations place Galileo within the P superfamily of cut-and-paste transposons and extend considerably its phylogenetic distribution. The interspecific distribution of Galileo indicates an ancient presence in the genus, but the phylogenetic tree built with the transposase amino acid sequences contrasts significantly with that of the species, indicating lineage sorting and/or horizontal transfer events. Our results also suggest that Foldback-like elements such as Galileo may evolve from DNA-based transposon ancestors by loss of the transposase gene and disproportionate elongation of TIRs.
Marzo, Mar; Puig, Marta; Ruiz, Alfredo
2008-01-01
Galileo is the only transposable element (TE) known to have generated natural chromosomal inversions in the genus Drosophila. It was discovered in Drosophila buzzatii and classified as a Foldback-like element because of its long, internally repetitive, terminal inverted repeats (TIRs) and lack of coding capacity. Here, we characterized a seemingly complete copy of Galileo from the D. buzzatii genome. It is 5,406 bp long, possesses 1,229-bp TIRs, and encodes a 912-aa transposase similar to those of the Drosophila melanogaster 1360 (Hoppel) and P elements. We also searched the recently available genome sequences of 12 Drosophila species for elements similar to Dbuz\\Galileo by using bioinformatic tools. Galileo was found in six species (ananassae, willistoni, peudoobscura, persimilis, virilis, and mojavensis) from the two main lineages within the Drosophila genus. Our observations place Galileo within the P superfamily of cut-and-paste transposons and extend considerably its phylogenetic distribution. The interspecific distribution of Galileo indicates an ancient presence in the genus, but the phylogenetic tree built with the transposase amino acid sequences contrasts significantly with that of the species, indicating lineage sorting and/or horizontal transfer events. Our results also suggest that Foldback-like elements such as Galileo may evolve from DNA-based transposon ancestors by loss of the transposase gene and disproportionate elongation of TIRs. PMID:18287066
Superimposed Code Theoretic Analysis of Deoxyribonucleic Acid (DNA) Codes and DNA Computing
2010-01-01
partitioned by font type) of sequences are allowed to be in each position (e.g., Arial = position 0, Comic = position 1, etc. ) and within each collection...movement was modeled by a Brownian motion 3 dimensional random walk. The one dimensional diffusion coefficient D for the ellipsoid shape with 3...temperature, kB is Boltzmann’s constant, and η is the viscosity of the medium. The random walk motion is modeled by assuming the oligo is on a three
Researchers led by Ashish Lal, Ph.D., Investigator in the Genetics Branch, have shown that when the DNA in human colon cancer cells is damaged, a long non-coding RNA (lncRNA) regulates the expression of genes that halt growth, which allows the cells to repair the damage and promote survival. Their findings suggest an important pro-survival function of a lncRNA in cancer cells. Read more...
Rare ATAD5 missense variants in breast and ovarian cancer patients.
Maleva Kostovska, Ivana; Wang, Jing; Bogdanova, Natalia; Schürmann, Peter; Bhuju, Sabin; Geffers, Robert; Dürst, Matthias; Liebrich, Clemens; Klapdor, Rüdiger; Christiansen, Hans; Park-Simon, Tjoung-Won; Hillemanns, Peter; Plaseska-Karanfilska, Dijana; Dörk, Thilo
2016-06-28
ATAD5/ELG1 is a protein crucially involved in replication and maintenance of genome stability. ATAD5 has recently been identified as a genomic risk locus for both breast and ovarian cancer through genome-wide association studies. We aimed to investigate the spectrum of coding ATAD5 germ-line mutations in hospital-based series of patients with triple-negative breast cancer or serous ovarian cancer compared with healthy controls. The ATAD5 coding and adjacent splice site regions were analyzed by targeted next-generation sequencing of DNA samples from 273 cancer patients, including 114 patients with triple-negative breast cancer and 159 patients with serous epithelial ovarian cancer, and from 276 healthy females. Among 42 different variants identified, twenty-two were rare missense substitutions, of which 14 were classified as pathogenic by at least one in silico prediction tool. Three of four novel missense substitutions (p.S354I, p.H974R and p.K1466N) were predicted to be pathogenic and were all identified in ovarian cancer patients. Overall, rare missense variants with predicted pathogenicity tended to be enriched in ovarian cancer patients (14/159) versus controls (11/276) (p = 0.05, 2df). While truncating germ-line variants in ATAD5 were not detected, it remains possible that several rare missense variants contribute to genetic susceptibility toward epithelial ovarian carcinomas. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
A novel feature extraction scheme with ensemble coding for protein-protein interaction prediction.
Du, Xiuquan; Cheng, Jiaxing; Zheng, Tingting; Duan, Zheng; Qian, Fulan
2014-07-18
Protein-protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been increasing because of the development of high-throughput technologies and computational methods, many problems are still far from being solved. In this study, a novel predictor was designed by using the Random Forest (RF) algorithm with the ensemble coding (EC) method. To reduce computational time, a feature selection method (DX) was adopted to rank the features and search the optimal feature combination. The DXEC method integrates many features and physicochemical/biochemical properties to predict PPIs. On the Gold Yeast dataset, the DXEC method achieves 67.2% overall precision, 80.74% recall, and 70.67% accuracy. On the Silver Yeast dataset, the DXEC method achieves 76.93% precision, 77.98% recall, and 77.27% accuracy. On the human dataset, the prediction accuracy reaches 80% for the DXEC-RF method. We extended the experiment to a bigger and more realistic dataset that maintains 50% recall on the Yeast All dataset and 80% recall on the Human All dataset. These results show that the DXEC method is suitable for performing PPI prediction. The prediction service of the DXEC-RF classifier is available at http://ailab.ahu.edu.cn:8087/ DXECPPI/index.jsp.
Goremykin, Vadim V; Lockhart, Peter J; Viola, Roberto; Velasco, Riccardo
2012-08-01
Mitochondrial genomes of spermatophytes are the largest of all organellar genomes. Their large size has been attributed to various factors; however, the relative contribution of these factors to mitochondrial DNA (mtDNA) expansion remains undetermined. We estimated their relative contribution in Malus domestica (apple). The mitochondrial genome of apple has a size of 396 947 bp and a one to nine ratio of coding to non-coding DNA, close to the corresponding average values for angiosperms. We determined that 71.5% of the apple mtDNA sequence was highly similar to sequences of its nuclear DNA. Using nuclear gene exons, nuclear transposable elements and chloroplast DNA as markers of promiscuous DNA content in mtDNA, we estimated that approximately 20% of the apple mtDNA consisted of DNA sequences imported from other cell compartments, mostly from the nucleus. Similar marker-based estimates of promiscuous DNA content in the mitochondrial genomes of other species ranged between 21.2 and 25.3% of the total mtDNA length for grape, between 23.1 and 38.6% for rice, and between 47.1 and 78.4% for maize. All these estimates are conservative, because they underestimate the import of non-functional DNA. We propose that the import of promiscuous DNA is a core mechanism for mtDNA size expansion in seed plants. In apple, maize and grape this mechanism contributed far more to genome expansion than did homologous recombination. In rice the estimated contribution of both mechanisms was found to be similar. © 2012 The Authors. The Plant Journal © 2012 Blackwell Publishing Ltd.
Tamori, Akihiro; Yamanishi, Yoshihiro; Kawashima, Shuichi; Kanehisa, Minoru; Enomoto, Masaru; Tanaka, Hiromu; Kubo, Shoji; Shiomi, Susumu; Nishiguchi, Shuhei
2005-08-15
Integration of hepatitis B virus (HBV) DNA into the human genome is one of the most important steps in HBV-related carcinogenesis. This study attempted to find the link between HBV DNA, the adjoining cellular sequence, and altered gene expression in hepatocellular carcinoma (HCC) with integrated HBV DNA. We examined 15 cases of HCC infected with HBV by cassette ligation-mediated PCR. The human DNA adjacent to the integrated HBV DNA was sequenced. Protein coding sequences were searched for in the human sequence. In five cases with HBV DNA integration, from which good quality RNA was extracted, gene expression was examined by cDNA microarray analysis. The human DNA sequence successive to integrated HBV DNA was determined in the 15 HCCs. Eight protein-coding regions were involved: ras-responsive element binding protein 1, calmodulin 1, mixed lineage leukemia 2 (MLL2), FLJ333655, LOC220272, LOC255345, LOC220220, and LOC168991. The MLL2 gene was expressed in three cases with HBV DNA integrated into exon 3 of MLL2 and in one case with HBV DNA integrated into intron 3 of MLL2. Gene expression analysis suggested that two HCCs with HBV integrated into MLL2 had similar patterns of gene expression compared with three HCCs with HBV integrated into other loci of human chromosomes. HBV DNA was integrated at random sites of human DNA, and the MLL2 gene was one of the targets for integration. Our results suggest that HBV DNA might modulate human genes near integration sites, followed by integration site-specific expression of such genes during hepatocarcinogenesis.
1988-10-10
identify by block number) FIELD GROUP S OUP - Archaebacteria , Halobacteria, Proteins Nucleic Acids, 08 RNA Polymerase-DNA Interactionsi R soimal operons...objectives of our program are to isolate and characterize a fully active DNA dependent RNA polymerase from the extremely halophilic archaebacteria from...Woese and his colleagues to suggest that all living organisms can be classified into three phylogenetic kingdoms : the eukaryotes, the eubacterla and
Nakamura, Mikiko; Suzuki, Ayako; Akada, Junko; Tomiyoshi, Keisuke; Hoshida, Hisashi; Akada, Rinji
2015-12-01
Mammalian gene expression constructs are generally prepared in a plasmid vector, in which a promoter and terminator are located upstream and downstream of a protein-coding sequence, respectively. In this study, we found that front terminator constructs-DNA constructs containing a terminator upstream of a promoter rather than downstream of a coding region-could sufficiently express proteins as a result of end joining of the introduced DNA fragment. By taking advantage of front terminator constructs, FLAG substitutions, and deletions were generated using mutagenesis primers to identify amino acids specifically recognized by commercial FLAG antibodies. A minimal epitope sequence for polyclonal FLAG antibody recognition was also identified. In addition, we analyzed the sequence of a C-terminal Ser-Lys-Leu peroxisome localization signal, and identified the key residues necessary for peroxisome targeting. Moreover, front terminator constructs of hepatitis B surface antigen were used for deletion analysis, leading to the identification of regions required for the particle formation. Collectively, these results indicate that front terminator constructs allow for easy manipulations of C-terminal protein-coding sequences, and suggest that direct gene expression with PCR-amplified DNA is useful for high-throughput protein analysis in mammalian cells.
Multimodal biometric digital watermarking on immigrant visas for homeland security
NASA Astrophysics Data System (ADS)
Sasi, Sreela; Tamhane, Kirti C.; Rajappa, Mahesh B.
2004-08-01
Passengers with immigrant Visa's are a major concern to the International Airports due to the various fraud operations identified. To curb tampering of genuine Visa, the Visa's should contain human identification information. Biometric characteristic is a common and reliable way to authenticate the identity of an individual [1]. A Multimodal Biometric Human Identification System (MBHIS) that integrates iris code, DNA fingerprint, and the passport number on the Visa photograph using digital watermarking scheme is presented. Digital Watermarking technique is well suited for any system requiring high security [2]. Ophthalmologists [3], [4], [5] suggested that iris scan is an accurate and nonintrusive optical fingerprint. DNA sequence can be used as a genetic barcode [6], [7]. While issuing Visa at the US consulates, the DNA sequence isolated from saliva, the iris code and passport number shall be digitally watermarked in the Visa photograph. This information is also recorded in the 'immigrant database'. A 'forward watermarking phase' combines a 2-D DWT transformed digital photograph with the personal identification information. A 'detection phase' extracts the watermarked information from this VISA photograph at the port of entry, from which iris code can be used for identification and DNA biometric for authentication, if an anomaly arises.
Sequence Polishing Library (SPL) v10.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oberortner, Ernst
The Sequence Polishing Library (SPL) is a suite of software tools in order to automate "Design for Synthesis and Assembly" workflows. Specifically: The SPL "Converter" tool converts files among the following sequence data exchange formats: CSV, FASTA, GenBank, and Synthetic Biology Open Language (SBOL); The SPL "Juggler" tool optimizes the codon usages of DNA coding sequences according to an optimization strategy, a user-specific codon usage table and genetic code. In addition, the SPL "Juggler" can translate amino acid sequences into DNA sequences.:The SPL "Polisher" verifies NA sequences against DNA synthesis constraints, such as GC content, repeating k-mers, and restriction sites.more » In case of violations, the "Polisher" reports the violations in a comprehensive manner. The "Polisher" tool can also modify the violating regions according to an optimization strategy, a user-specific codon usage table and genetic code;The SPL "Partitioner" decomposes large DNA sequences into smaller building blocks with partial overlaps that enable an efficient assembly. The "Partitioner" enables the user to configure the characteristics of the overlaps, which are mostly determined by the utilized assembly protocol, such as length, GC content, or melting temperature.« less
Piao, Yongjun; Piao, Minghao; Ryu, Keun Ho
2017-01-01
Cancer classification has been a crucial topic of research in cancer treatment. In the last decade, messenger RNA (mRNA) expression profiles have been widely used to classify different types of cancers. With the discovery of a new class of small non-coding RNAs; known as microRNAs (miRNAs), various studies have shown that the expression patterns of miRNA can also accurately classify human cancers. Therefore, there is a great demand for the development of machine learning approaches to accurately classify various types of cancers using miRNA expression data. In this article, we propose a feature subset-based ensemble method in which each model is learned from a different projection of the original feature space to classify multiple cancers. In our method, the feature relevance and redundancy are considered to generate multiple feature subsets, the base classifiers are learned from each independent miRNA subset, and the average posterior probability is used to combine the base classifiers. To test the performance of our method, we used bead-based and sequence-based miRNA expression datasets and conducted 10-fold and leave-one-out cross validations. The experimental results show that the proposed method yields good results and has higher prediction accuracy than popular ensemble methods. The Java program and source code of the proposed method and the datasets in the experiments are freely available at https://sourceforge.net/projects/mirna-ensemble/. Copyright © 2016 Elsevier Ltd. All rights reserved.
Wills, Peter R
2016-03-13
This article reviews contributions to this theme issue covering the topic 'DNA as information' in relation to the structure of DNA, the measure of its information content, the role and meaning of information in biology and the origin of genetic coding as a transition from uninformed to meaningful computational processes in physical systems. © 2016 The Author(s).
28 CFR 812.3 - Coordination with the Federal Bureau of Prisons.
Code of Federal Regulations, 2010 CFR
2010-07-01
... THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION § 812.3 Coordination with the Federal... documentation regarding the collection of a DNA sample when the Federal Bureau of Prisons releases an inmate to... documentation regarding the collection of a DNA sample from a District of Columbia Code offender when CSOSA...
28 CFR 812.3 - Coordination with the Federal Bureau of Prisons.
Code of Federal Regulations, 2011 CFR
2011-07-01
... THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION § 812.3 Coordination with the Federal... documentation regarding the collection of a DNA sample when the Federal Bureau of Prisons releases an inmate to... documentation regarding the collection of a DNA sample from a District of Columbia Code offender when CSOSA...
28 CFR 812.3 - Coordination with the Federal Bureau of Prisons.
Code of Federal Regulations, 2012 CFR
2012-07-01
... THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION § 812.3 Coordination with the Federal... documentation regarding the collection of a DNA sample when the Federal Bureau of Prisons releases an inmate to... documentation regarding the collection of a DNA sample from a District of Columbia Code offender when CSOSA...
28 CFR 812.3 - Coordination with the Federal Bureau of Prisons.
Code of Federal Regulations, 2014 CFR
2014-07-01
... THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION § 812.3 Coordination with the Federal... documentation regarding the collection of a DNA sample when the Federal Bureau of Prisons releases an inmate to... documentation regarding the collection of a DNA sample from a District of Columbia Code offender when CSOSA...
28 CFR 812.3 - Coordination with the Federal Bureau of Prisons.
Code of Federal Regulations, 2013 CFR
2013-07-01
... THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION § 812.3 Coordination with the Federal... documentation regarding the collection of a DNA sample when the Federal Bureau of Prisons releases an inmate to... documentation regarding the collection of a DNA sample from a District of Columbia Code offender when CSOSA...
NASA Astrophysics Data System (ADS)
Mackiewicz, P.; Gierlik, A.; Kowalczuk, M.; Szczepanik, D.; Dudek, M. R.; Cebrat, S.
1999-12-01
We have analysed protein coding and intergenic sequences in the Borrelia burgdorferi (the Lyme disease bacterium) genome using different kinds of DNA walks. Genes occupying the leading strand of DNA have significantly different nucleotide composition from genes occupying the lagging strand. Nucleotide compositional bias of the two DNA strands reflects the aminoacid composition of proteins. 96% of genes coding for ribosomal proteins lie on the leading DNA strand, which suggests that the positions of these as well as other genes are non-random. In the B. burgdorferi genome, the asymmetry in intergenic DNA sequences is lower than the asymmetry in the third positions in codons. All these characters of the B. burgdorferi genome suggest that both replication-associated mutational pressure and recombination mechanisms have established the specific structure of the genome and now any recombination leading to inversion of a gene in respect to the direction of replication is forbidden. This property of the genome allows us to assume that it is in a steady state, which enables us to fix some parameters for simulations of DNA evolution.
DNA Multiple Sequence Alignment Guided by Protein Domains: The MSA-PAD 2.0 Method.
Balech, Bachir; Monaco, Alfonso; Perniola, Michele; Santamaria, Monica; Donvito, Giacinto; Vicario, Saverio; Maggi, Giorgio; Pesole, Graziano
2018-01-01
Multiple sequence alignment (MSA) is a fundamental component in many DNA sequence analyses including metagenomics studies and phylogeny inference. When guided by protein profiles, DNA multiple alignments assume a higher precision and robustness. Here we present details of the use of the upgraded version of MSA-PAD (2.0), which is a DNA multiple sequence alignment framework able to align DNA sequences coding for single/multiple protein domains guided by PFAM or user-defined annotations. MSA-PAD has two alignment strategies, called "Gene" and "Genome," accounting for coding domains order and genomic rearrangements, respectively. Novel options were added to the present version, where the MSA can be guided by protein profiles provided by the user. This allows MSA-PAD 2.0 to run faster and to add custom protein profiles sometimes not present in PFAM database according to the user's interest. MSA-PAD 2.0 is currently freely available as a Web application at https://recasgateway.cloud.ba.infn.it/ .
Cloning and sequence analysis of Hemonchus contortus HC58cDNA.
Muleke, Charles I; Ruofeng, Yan; Lixin, Xu; Xinwen, Bo; Xiangrui, Li
2007-06-01
The complete coding sequence of Hemonchus contortus HC58cDNA was generated by rapid amplification of cDNA ends and polymerase chain reaction using primers based on the 5' and 3' ends of the parasite mRNA, accession no. AF305964. The HC58cDNA gene was 851 bp long, with open reading frame of 717 bp, precursors to 239 amino acids coding for approximately 27 kDa protein. Analysis of amino acid sequence revealed conserved residues of cysteine, histidine, asparagine, occluding loop pattern, hemoglobinase motif and glutamine of the oxyanion hole characteristic of cathepsin B like proteases (CBL). Comparison of the predicted amino acid sequences showed the protein shared 33.5-58.7% identity to cathepsin B homologues in the papain clan CA family (family C1). Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the CBL, suggesting that HC58cDNA was a member of the papain family.
URF6, Last Unidentified Reading Frame of Human mtDNA, Codes for an NADH Dehydrogenase Subunit
NASA Astrophysics Data System (ADS)
Chomyn, Anne; Cleeter, Michael W. J.; Ragan, C. Ian; Riley, Marcia; Doolittle, Russell F.; Attardi, Giuseppe
1986-10-01
The polypeptide encoded in URF6, the last unassigned reading frame of human mitochondrial DNA, has been identified with antibodies to peptides predicted from the DNA sequence. Antibodies prepared against highly purified respiratory chain NADH dehydrogenase from beef heart or against the cytoplasmically synthesized 49-kilodalton iron-sulfur subunit isolated from this enzyme complex, when added to a deoxycholate or a Triton X-100 mitochondrial lysate of HeLa cells, specifically precipitated the URF6 product together with the six other URF products previously identified as subunits of NADH dehydrogenase. These results strongly point to the URF6 product as being another subunit of this enzyme complex. Thus, almost 60% of the protein coding capacity of mammalian mitochondrial DNA is utilized for the assembly of the first enzyme complex of the respiratory chain. The absence of such information in yeast mitochondrial DNA dramatizes the variability in gene content of different mitochondrial genomes.
NASA Astrophysics Data System (ADS)
Sun, Yankui; Li, Shan; Sun, Zhongyang
2017-01-01
We propose a framework for automated detection of dry age-related macular degeneration (AMD) and diabetic macular edema (DME) from retina optical coherence tomography (OCT) images, based on sparse coding and dictionary learning. The study aims to improve the classification performance of state-of-the-art methods. First, our method presents a general approach to automatically align and crop retina regions; then it obtains global representations of images by using sparse coding and a spatial pyramid; finally, a multiclass linear support vector machine classifier is employed for classification. We apply two datasets for validating our algorithm: Duke spectral domain OCT (SD-OCT) dataset, consisting of volumetric scans acquired from 45 subjects-15 normal subjects, 15 AMD patients, and 15 DME patients; and clinical SD-OCT dataset, consisting of 678 OCT retina scans acquired from clinics in Beijing-168, 297, and 213 OCT images for AMD, DME, and normal retinas, respectively. For the former dataset, our classifier correctly identifies 100%, 100%, and 93.33% of the volumes with DME, AMD, and normal subjects, respectively, and thus performs much better than the conventional method; for the latter dataset, our classifier leads to a correct classification rate of 99.67%, 99.67%, and 100.00% for DME, AMD, and normal images, respectively.
Sun, Yankui; Li, Shan; Sun, Zhongyang
2017-01-01
We propose a framework for automated detection of dry age-related macular degeneration (AMD) and diabetic macular edema (DME) from retina optical coherence tomography (OCT) images, based on sparse coding and dictionary learning. The study aims to improve the classification performance of state-of-the-art methods. First, our method presents a general approach to automatically align and crop retina regions; then it obtains global representations of images by using sparse coding and a spatial pyramid; finally, a multiclass linear support vector machine classifier is employed for classification. We apply two datasets for validating our algorithm: Duke spectral domain OCT (SD-OCT) dataset, consisting of volumetric scans acquired from 45 subjects—15 normal subjects, 15 AMD patients, and 15 DME patients; and clinical SD-OCT dataset, consisting of 678 OCT retina scans acquired from clinics in Beijing—168, 297, and 213 OCT images for AMD, DME, and normal retinas, respectively. For the former dataset, our classifier correctly identifies 100%, 100%, and 93.33% of the volumes with DME, AMD, and normal subjects, respectively, and thus performs much better than the conventional method; for the latter dataset, our classifier leads to a correct classification rate of 99.67%, 99.67%, and 100.00% for DME, AMD, and normal images, respectively.
Suárez, Martha Y.; Villagrán; Miller, John H.
2015-01-01
We report on a new technique, computational DNA hole spectroscopy, which creates spectra of electron hole probabilities vs. nucleotide position. A hole is a site of positive charge created when an electron is removed. Peaks in the hole spectrum depict sites where holes tend to localize and potentially trigger a base pair mismatch during replication. Our studies of mitochondrial DNA reveal a correlation between L-strand hole spectrum peaks and spikes in the human mutation spectrum. Importantly, we also find that hole peak positions that do not coincide with large variant frequencies often coincide with disease-implicated mutations and/or (for coding DNA) encoded conserved amino acids. This enables combining hole spectra with variant data to identify critical base pairs and potential disease ‘driver’ mutations. Such integration of DNA hole and variance spectra could ultimately prove invaluable for pinpointing critical regions of the vast non-protein-coding genome. An observed asymmetry in correlations, between the spectrum of human mtDNA variations and the L- and H-strand hole spectra, is attributed to asymmetric DNA replication processes that occur for the leading and lagging strands. PMID:26310834
Villagrán, Martha Y Suárez; Miller, John H
2015-08-27
We report on a new technique, computational DNA hole spectroscopy, which creates spectra of electron hole probabilities vs. nucleotide position. A hole is a site of positive charge created when an electron is removed. Peaks in the hole spectrum depict sites where holes tend to localize and potentially trigger a base pair mismatch during replication. Our studies of mitochondrial DNA reveal a correlation between L-strand hole spectrum peaks and spikes in the human mutation spectrum. Importantly, we also find that hole peak positions that do not coincide with large variant frequencies often coincide with disease-implicated mutations and/or (for coding DNA) encoded conserved amino acids. This enables combining hole spectra with variant data to identify critical base pairs and potential disease 'driver' mutations. Such integration of DNA hole and variance spectra could ultimately prove invaluable for pinpointing critical regions of the vast non-protein-coding genome. An observed asymmetry in correlations, between the spectrum of human mtDNA variations and the L- and H-strand hole spectra, is attributed to asymmetric DNA replication processes that occur for the leading and lagging strands.
Morchikh, Mehdi; Cribier, Alexandra; Raffel, Raoul; Amraoui, Sonia; Cau, Julien; Severac, Dany; Dubois, Emeric; Schwartz, Olivier; Bennasser, Yamina; Benkirane, Monsef
2017-08-03
The DNA-mediated innate immune response underpins anti-microbial defenses and certain autoimmune diseases. Here we used immunoprecipitation, mass spectrometry, and RNA sequencing to identify a ribonuclear complex built around HEXIM1 and the long non-coding RNA NEAT1 that we dubbed the HEXIM1-DNA-PK-paraspeckle components-ribonucleoprotein complex (HDP-RNP). The HDP-RNP contains DNA-PK subunits (DNAPKc, Ku70, and Ku80) and paraspeckle proteins (SFPQ, NONO, PSPC1, RBM14, and MATRIN3). We show that binding of HEXIM1 to NEAT1 is required for its assembly. We further demonstrate that the HDP-RNP is required for the innate immune response to foreign DNA, through the cGAS-STING-IRF3 pathway. The HDP-RNP interacts with cGAS and its partner PQBP1, and their interaction is remodeled by foreign DNA. Remodeling leads to the release of paraspeckle proteins, recruitment of STING, and activation of DNAPKc and IRF3. Our study establishes the HDP-RNP as a key nuclear regulator of DNA-mediated activation of innate immune response through the cGAS-STING pathway. Copyright © 2017 Elsevier Inc. All rights reserved.
Does CTCF mediate between nuclear organization and gene expression?
Ohlsson, Rolf; Lobanenkov, Victor; Klenova, Elena
2010-01-01
The multifunctional zinc-finger protein CCCTC-binding factor (CTCF) is a very strong candidate for the role of coordinating the expression level of coding sequences with their three-dimensional position in the nucleus, apparently responding to a "code" in the DNA itself. Dynamic interactions between chromatin fibers in the context of nuclear architecture have been implicated in various aspects of genome functions. However, the molecular basis of these interactions still remains elusive and is a subject of intense debate. Here we discuss the nature of CTCF-DNA interactions, the CTCF-binding specificity to its binding sites and the relationship between CTCF and chromatin, and we examine data linking CTCF with gene regulation in the three-dimensional nuclear space. We discuss why these features render CTCF a very strong candidate for the role and propose a unifying model, the "CTCF code," explaining the mechanistic basis of how the information encrypted in DNA may be interpreted by CTCF into diverse nuclear functions.
Olmo, Francisco; Escobedo-Orteg, Javier; Palma, Patricia; Sánchez-Moreno, Manuel; Mejía-Jaramillo, Ana; Triana, Omar; Marín, Clotilde
2014-11-01
To classify 21 new isolates of Trypanosoma cruzi (T. cruzi) according to the Discrete Typing Unit (DTU) which they belong to, as well as tune up a new pair of primers designed to detect the parasite in biological samples. Strains were isolated, DNA extracted, and classified by using three Polymerase Chain Reactions (PCR). Subsequently this DNA was used along with other isolates of various biological samples, for a new PCR using primers designed. Finally, the amplified fragments were sequenced. It was observed the predominance of DTU I in Colombia, as well as the specificity of our primers for detection of T. cruzi, while no band was obtained when other species were used. This work reveals the genetic variability of 21 new isolates of T. cruzi in Colombia.Our primers confirmed their specificity for detecting the presence of T. cruzi. Copyright © 2014 Hainan Medical College. Published by Elsevier B.V. All rights reserved.
Corbi, N; Libri, V; Fanciulli, M; Tinsley, J M; Davies, K E; Passananti, C
2000-06-01
Up-regulation of utrophin gene expression is recognized as a plausible therapeutic approach in the treatment of Duchenne muscular dystrophy (DMD). We have designed and engineered new zinc finger-based transcription factors capable of binding and activating transcription from the promoter of the dystrophin-related gene, utrophin. Using the recognition 'code' that proposes specific rules between zinc finger primary structure and potential DNA binding sites, we engineered a new gene named 'Jazz' that encodes for a three-zinc finger peptide. Jazz belongs to the Cys2-His2 zinc finger type and was engineered to target the nine base pair DNA sequence: 5'-GCT-GCT-GCG-3', present in the promoter region of both the human and mouse utrophin gene. The entire zinc finger alpha-helix region, containing the amino acid positions that are crucial for DNA binding, was specifically chosen on the basis of the contacts more frequently represented in the available list of the 'code'. Here we demonstrate that Jazz protein binds specifically to the double-stranded DNA target, with a dissociation constant of about 32 nM. Band shift and super-shift experiments confirmed the high affinity and specificity of Jazz protein for its DNA target. Moreover, we show that chimeric proteins, named Gal4-Jazz and Sp1-Jazz, are able to drive the transcription of a test gene from the human utrophin promoter.
Herrnstadt, Corinna; Elson, Joanna L; Fahy, Eoin; Preston, Gwen; Turnbull, Douglass M; Anderson, Christen; Ghosh, Soumitra S; Olefsky, Jerrold M; Beal, M Flint; Davis, Robert E; Howell, Neil
2002-05-01
The evolution of the human mitochondrial genome is characterized by the emergence of ethnically distinct lineages or haplogroups. Nine European, seven Asian (including Native American), and three African mitochondrial DNA (mtDNA) haplogroups have been identified previously on the basis of the presence or absence of a relatively small number of restriction-enzyme recognition sites or on the basis of nucleotide sequences of the D-loop region. We have used reduced-median-network approaches to analyze 560 complete European, Asian, and African mtDNA coding-region sequences from unrelated individuals to develop a more complete understanding of sequence diversity both within and between haplogroups. A total of 497 haplogroup-associated polymorphisms were identified, 323 (65%) of which were associated with one haplogroup and 174 (35%) of which were associated with two or more haplogroups. Approximately one-half of these polymorphisms are reported for the first time here. Our results confirm and substantially extend the phylogenetic relationships among mitochondrial genomes described elsewhere from the major human ethnic groups. Another important result is that there were numerous instances both of parallel mutations at the same site and of reversion (i.e., homoplasy). It is likely that homoplasy in the coding region will confound evolutionary analysis of small sequence sets. By a linkage-disequilibrium approach, additional evidence for the absence of human mtDNA recombination is presented here.
Recurrence time statistics: versatile tools for genomic DNA sequence analysis.
Cao, Yinhe; Tung, Wen-Wen; Gao, J B
2004-01-01
With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cervisivae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.
Implementation of new physics models for low energy electrons in liquid water in Geant4-DNA.
Bordage, M C; Bordes, J; Edel, S; Terrissol, M; Franceries, X; Bardiès, M; Lampe, N; Incerti, S
2016-12-01
A new alternative set of elastic and inelastic cross sections has been added to the very low energy extension of the Geant4 Monte Carlo simulation toolkit, Geant4-DNA, for the simulation of electron interactions in liquid water. These cross sections have been obtained from the CPA100 Monte Carlo track structure code, which has been a reference in the microdosimetry community for many years. They are compared to the default Geant4-DNA cross sections and show better agreement with published data. In order to verify the correct implementation of the CPA100 cross section models in Geant4-DNA, simulations of the number of interactions and ranges were performed using Geant4-DNA with this new set of models, and the results were compared with corresponding results from the original CPA100 code. Good agreement is observed between the implementations, with relative differences lower than 1% regardless of the incident electron energy. Useful quantities related to the deposited energy at the scale of the cell or the organ of interest for internal dosimetry, like dose point kernels, are also calculated using these new physics models. They are compared with results obtained using the well-known Penelope Monte Carlo code. Copyright © 2016 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
Nedelcu, Aurora M.; Lee, Robert W.; Lemieux, Claude; Gray, Michael W.; Burger, Gertraud
2000-01-01
Two distinct mitochondrial genome types have been described among the green algal lineages investigated to date: a reduced–derived, Chlamydomonas-like type and an ancestral, Prototheca-like type. To determine if this unexpected dichotomy is real or is due to insufficient or biased sampling and to define trends in the evolution of the green algal mitochondrial genome, we sequenced and analyzed the mitochondrial DNA (mtDNA) of Scenedesmus obliquus. This genome is 42,919 bp in size and encodes 42 conserved genes (i.e., large and small subunit rRNA genes, 27 tRNA and 13 respiratory protein-coding genes), four additional free-standing open reading frames with no known homologs, and an intronic reading frame with endonuclease/maturase similarity. No 5S rRNA or ribosomal protein-coding genes have been identified in Scenedesmus mtDNA. The standard protein-coding genes feature a deviant genetic code characterized by the use of UAG (normally a stop codon) to specify leucine, and the unprecedented use of UCA (normally a serine codon) as a signal for termination of translation. The mitochondrial genome of Scenedesmus combines features of both green algal mitochondrial genome types: the presence of a more complex set of protein-coding and tRNA genes is shared with the ancestral type, whereas the lack of 5S rRNA and ribosomal protein-coding genes as well as the presence of fragmented and scrambled rRNA genes are shared with the reduced–derived type of mitochondrial genome organization. Furthermore, the gene content and the fragmentation pattern of the rRNA genes suggest that this genome represents an intermediate stage in the evolutionary process of mitochondrial genome streamlining in green algae. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF204057.] PMID:10854413
DeWitt, D L; Smith, W L
1988-01-01
Prostaglandin G/H synthase (8,11,14-icosatrienoate, hydrogen-donor:oxygen oxidoreductase, EC 1.14.99.1) catalyzes the first step in the formation of prostaglandins and thromboxanes, the conversion of arachidonic acid to prostaglandin endoperoxides G and H. This enzyme is the site of action of nonsteroidal anti-inflammatory drugs. We have isolated a 2.7-kilobase complementary DNA (cDNA) encompassing the entire coding region of prostaglandin G/H synthase from sheep vesicular glands. This cDNA, cloned from a lambda gt 10 library prepared from poly(A)+ RNA of vesicular glands, hybridizes with a single 2.75-kilobase mRNA species. The cDNA clone was selected using oligonucleotide probes modeled from amino acid sequences of tryptic peptides prepared from the purified enzyme. The full-length cDNA encodes a protein of 600 amino acids, including a signal sequence of 24 amino acids. Identification of the cDNA as coding for prostaglandin G/H synthase is based on comparison of amino acid sequences of seven peptides comprising 103 amino acids with the amino acid sequence deduced from the nucleotide sequence of the cDNA. The molecular weight of the unglycosylated enzyme lacking the signal peptide is 65,621. The synthase is a glycoprotein, and there are three potential sites for N-glycosylation, two of them in the amino-terminal half of the molecule. The serine reported to be acetylated by aspirin is at position 530, near the carboxyl terminus. There is no significant similarity between the sequence of the synthase and that of any other protein in amino acid or nucleotide sequence libraries, and a heme binding site(s) is not apparent from the amino acid sequence. The availability of a full-length cDNA clone coding for prostaglandin G/H synthase should facilitate studies of the regulation of expression of this enzyme and the structural features important for catalysis and for interaction with anti-inflammatory drugs. Images PMID:3125548
15 CFR 740.17 - Encryption commodities, software and technology (ENC).
Code of Federal Regulations, 2011 CFR
2011-01-01
... classified under ECCN 5B002, and equivalent or related software and technology classified under ECCNs 5D002... service in any country listed in Country Group E:1 in supplement No. 1 to part 740 of the EAR, or release of source code or technology to any national of a country listed in Country Group E:1. Reexports and...
Automated Classification of Power Signals
2008-06-01
determine when a transient occurs. The identification of this signal can then be determined by an expert classifier and a series of these...the manual identification and classification of system events. Once events were located, the characteristics were examined to determine if system... identification code, which varies depending on the system classifier that is specified. Figure 3-7 provides an example of a Linux directory containing
Effect of separate sampling on classification accuracy.
Shahrokh Esfahani, Mohammad; Dougherty, Edward R
2014-01-15
Measurements are commonly taken from two phenotypes to build a classifier, where the number of data points from each class is predetermined, not random. In this 'separate sampling' scenario, the data cannot be used to estimate the class prior probabilities. Moreover, predetermined class sizes can severely degrade classifier performance, even for large samples. We employ simulations using both synthetic and real data to show the detrimental effect of separate sampling on a variety of classification rules. We establish propositions related to the effect on the expected classifier error owing to a sampling ratio different from the population class ratio. From these we derive a sample-based minimax sampling ratio and provide an algorithm for approximating it from the data. We also extend to arbitrary distributions the classical population-based Anderson linear discriminant analysis minimax sampling ratio derived from the discriminant form of the Bayes classifier. All the codes for synthetic data and real data examples are written in MATLAB. A function called mmratio, whose output is an approximation of the minimax sampling ratio of a given dataset, is also written in MATLAB. All the codes are available at: http://gsp.tamu.edu/Publications/supplementary/shahrokh13b.
NASA Technical Reports Server (NTRS)
Plante, I; Wu, H
2014-01-01
The code RITRACKS (Relativistic Ion Tracks) has been developed over the last few years at the NASA Johnson Space Center to simulate the effects of ionizing radiations at the microscopic scale, to understand the effects of space radiation at the biological level. The fundamental part of this code is the stochastic simulation of radiation track structure of heavy ions, an important component of space radiations. The code can calculate many relevant quantities such as the radial dose, voxel dose, and may also be used to calculate the dose in spherical and cylindrical targets of various sizes. Recently, we have incorporated DNA structure and damage simulations at the molecular scale in RITRACKS. The direct effect of radiations is simulated by introducing a slight modification of the existing particle transport algorithms, using the Binary-Encounter-Bethe model of ionization cross sections for each molecular orbitals of DNA. The simulation of radiation chemistry is done by a step-by-step diffusion-reaction program based on the Green's functions of the diffusion equation]. This approach is also used to simulate the indirect effect of ionizing radiation on DNA. The software can be installed independently on PC and tablets using the Windows operating system and does not require any coding from the user. It includes a Graphic User Interface (GUI) and a 3D OpenGL visualization interface. The calculations are executed simultaneously (in parallel) on multiple CPUs. The main features of the software will be presented.
Wang, Wei; Liu, Juan; Sun, Lin
2016-07-01
Protein-DNA bindings are critical to many biological processes. However, the structural mechanisms underlying these interactions are not fully understood. Here, we analyzed the residues shape (peak, flat, or valley) and the surrounding environment of double-stranded DNA-binding proteins (DSBs) and single-stranded DNA-binding proteins (SSBs) in protein-DNA interfaces. In the results, we found that the interface shapes, hydrogen bonds, and the surrounding environment present significant differences between the two kinds of proteins. Built on the investigation results, we constructed a random forest (RF) classifier to distinguish DSBs and SSBs with satisfying performance. In conclusion, we present a novel methodology to characterize protein interfaces, which will deepen our understanding of the specificity of proteins binding to ssDNA (single-stranded DNA) or dsDNA (double-stranded DNA). Proteins 2016; 84:979-989. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Smurf2 Regulates DNA Repair and Packaging to Prevent Tumors | Center for Cancer Research
The blueprint for all of a cell’s functions is written in the genetic code of DNA sequences as well as in the landscape of DNA and histone modifications. DNA is wrapped around histones to package it into chromatin, which is stored in the nucleus. It is important to maintain the integrity of the chromatin structure to ensure that the cell continues to behave appropriately.
New Modeling Approaches to Study DNA Damage by the Direct and Indirect Effects of Ionizing Radiation
NASA Technical Reports Server (NTRS)
Plante, Ianik; Cucinotta, Francis A.
2012-01-01
DNA is damaged both by the direct and indirect effects of radiation. In the direct effect, the DNA itself is ionized, whereas the indirect effect involves the radiolysis of the water molecules surrounding the DNA and the subsequent reaction of the DNA with radical products. While this problem has been studied for many years, many unknowns still exist. To study this problem, we have developed the computer code RITRACKS [1], which simulates the radiation track structure for heavy ions and electrons, calculating all energy deposition events and the coordinates of all species produced by the water radiolysis. In this work, we plan to simulate DNA damage by using the crystal structure of a nucleosome and calculations performed by RITRACKS. The energy deposition events are used to calculate the dose deposited in nanovolumes [2] and therefore can be used to simulate the direct effect of the radiation. Using the positions of the radiolytic species with a radiation chemistry code [3] it will be possible to simulate DNA damage by indirect effect. The simulation results can be compared with results from previous calculations such as the frequencies of simple and complex strand breaks [4] and with newer experimental data using surrogate markers of DNA double ]strand breaks such as . ]H2AX foci [5].
Shao, Wei; Liu, Mingxia; Zhang, Daoqiang
2016-01-01
The systematic study of subcellular location pattern is very important for fully characterizing the human proteome. Nowadays, with the great advances in automated microscopic imaging, accurate bioimage-based classification methods to predict protein subcellular locations are highly desired. All existing models were constructed on the independent parallel hypothesis, where the cellular component classes are positioned independently in a multi-class classification engine. The important structural information of cellular compartments is missed. To deal with this problem for developing more accurate models, we proposed a novel cell structure-driven classifier construction approach (SC-PSorter) by employing the prior biological structural information in the learning model. Specifically, the structural relationship among the cellular components is reflected by a new codeword matrix under the error correcting output coding framework. Then, we construct multiple SC-PSorter-based classifiers corresponding to the columns of the error correcting output coding codeword matrix using a multi-kernel support vector machine classification approach. Finally, we perform the classifier ensemble by combining those multiple SC-PSorter-based classifiers via majority voting. We evaluate our method on a collection of 1636 immunohistochemistry images from the Human Protein Atlas database. The experimental results show that our method achieves an overall accuracy of 89.0%, which is 6.4% higher than the state-of-the-art method. The dataset and code can be downloaded from https://github.com/shaoweinuaa/. dqzhang@nuaa.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A sampling bias in identifying children in foster care using Medicaid data.
Rubin, David M; Pati, Susmita; Luan, Xianqun; Alessandrini, Evaline A
2005-01-01
Prior research identified foster care children using Medicaid eligibility codes specific to foster care, but it is unknown whether these codes capture all foster care children. To describe the sampling bias in relying on Medicaid eligibility codes to identify foster care children. Using foster care administrative files linked to Medicaid data, we describe the proportion of children whose Medicaid eligibility was correctly encoded as foster child during a 1-year follow-up period following a new episode of foster care. Sampling bias is described by comparing claims in mental health, emergency department (ED), and other ambulatory settings among correctly and incorrectly classified foster care children. Twenty-eight percent of the 5683 sampled children were incorrectly classified in Medicaid eligibility files. In a multivariate logistic regression model, correct classification was associated with duration of foster care (>9 vs <2 months, odds ratio [OR] 7.67, 95% confidence interval [CI] 7.17-7.97), number of placements (>3 vs 1 placement, OR 4.20, 95% CI 3.14-5.64), and placement in a group home among adjudicated dependent children (OR 1.87, 95% CI 1.33-2.63). Compared with incorrectly classified children, correctly classified foster care children were 3 times more likely to use any services, 2 times more likely to visit the ED, 3 times more likely to make ambulatory visits, and 4 times more likely to use mental health care services (P < .001 for all comparisons). Identifying children in foster care using Medicaid eligibility files is prone to sampling bias that over-represents children in foster care who use more services.
Donato, Gianluca; Bartlett, Marian Stewart; Hager, Joseph C.; Ekman, Paul; Sejnowski, Terrence J.
2010-01-01
The Facial Action Coding System (FACS) [23] is an objective method for quantifying facial movement in terms of component actions. This system is widely used in behavioral investigations of emotion, cognitive processes, and social interaction. The coding is presently performed by highly trained human experts. This paper explores and compares techniques for automatically recognizing facial actions in sequences of images. These techniques include analysis of facial motion through estimation of optical flow; holistic spatial analysis, such as principal component analysis, independent component analysis, local feature analysis, and linear discriminant analysis; and methods based on the outputs of local filters, such as Gabor wavelet representations and local principal components. Performance of these systems is compared to naive and expert human subjects. Best performances were obtained using the Gabor wavelet representation and the independent component representation, both of which achieved 96 percent accuracy for classifying 12 facial actions of the upper and lower face. The results provide converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions. PMID:21188284
The audiovisual structure of onomatopoeias: An intrusion of real-world physics in lexical creation.
Taitz, Alan; Assaneo, M Florencia; Elisei, Natalia; Trípodi, Mónica; Cohen, Laurent; Sitt, Jacobo D; Trevisan, Marcos A
2018-01-01
Sound-symbolic word classes are found in different cultures and languages worldwide. These words are continuously produced to code complex information about events. Here we explore the capacity of creative language to transport complex multisensory information in a controlled experiment, where our participants improvised onomatopoeias from noisy moving objects in audio, visual and audiovisual formats. We found that consonants communicate movement types (slide, hit or ring) mainly through the manner of articulation in the vocal tract. Vowels communicate shapes in visual stimuli (spiky or rounded) and sound frequencies in auditory stimuli through the configuration of the lips and tongue. A machine learning model was trained to classify movement types and used to validate generalizations of our results across formats. We implemented the classifier with a list of cross-linguistic onomatopoeias simple actions were correctly classified, while different aspects were selected to build onomatopoeias of complex actions. These results show how the different aspects of complex sensory information are coded and how they interact in the creation of novel onomatopoeias.
Seligmann, Hervé
2013-03-01
Usual DNA→RNA transcription exchanges T→U. Assuming different systematic symmetric nucleotide exchanges during translation, some GenBank RNAs match exactly human mitochondrial sequences (exchange rules listed in decreasing transcript frequencies): C↔U, A↔U, A↔U+C↔G (two nucleotide pairs exchanged), G↔U, A↔G, C↔G, none for A↔C, A↔G+C↔U, and A↔C+G↔U. Most unusual transcripts involve exchanging uracil. Independent measures of rates of rare replicational enzymatic DNA nucleotide misinsertions predict frequencies of RNA transcripts systematically exchanging the corresponding misinserted nucleotides. Exchange transcripts self-hybridize less than other gene regions, self-hybridization increases with length, suggesting endoribonuclease-limited elongation. Blast detects stop codon depleted putative protein coding overlapping genes within exchange-transcribed mitochondrial genes. These align with existing GenBank proteins (mainly metazoan origins, prokaryotic and viral origins underrepresented). These GenBank proteins frequently interact with RNA/DNA, are membrane transporters, or are typical of mitochondrial metabolism. Nucleotide exchange transcript frequencies increase with overlapping gene densities and stop densities, indicating finely tuned counterbalancing regulation of expression of systematic symmetric nucleotide exchange-encrypted proteins. Such expression necessitates combined activities of suppressor tRNAs matching stops, and nucleotide exchange transcription. Two independent properties confirm predicted exchanged overlap coding genes: discrepancy of third codon nucleotide contents from replicational deamination gradients, and codon usage according to circular code predictions. Predictions from both properties converge, especially for frequent nucleotide exchange types. Nucleotide exchanging transcription apparently increases coding densities of protein coding genes without lengthening genomes, revealing unsuspected functional DNA coding potential. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Giardina, P; Cannio, R; Martirani, L; Marzullo, L; Palmieri, G; Sannia, G
1995-01-01
The gene (pox1) encoding a phenol oxidase from Pleurotus ostreatus, a lignin-degrading basidiomycete, was cloned and sequenced, and the corresponding pox1 cDNA was also synthesized and sequenced. The isolated gene consists of 2,592 bp, with the coding sequence being interrupted by 19 introns and flanked by an upstream region in which putative CAAT and TATA consensus sequences could be identified at positions -174 and -84, respectively. The isolation of a second cDNA (pox2 cDNA), showing 84% similarity, and of the corresponding truncated genomic clones demonstrated the existence of a multigene family coding for isoforms of laccase in P. ostreatus. PCR amplifications of specific regions on the DNA of isolated monokaryons proved that the two genes are not allelic forms. The POX1 amino acid sequence deduced was compared with those of other known laccases from different fungi. PMID:7793961
Gemini surfactants mediate efficient mitochondrial gene delivery and expression.
Cardoso, Ana M; Morais, Catarina M; Cruz, A Rita; Cardoso, Ana L; Silva, Sandra G; do Vale, M Luísa; Marques, Eduardo F; Pedroso de Lima, Maria C; Jurado, Amália S
2015-03-02
Gene delivery targeting mitochondria has the potential to transform the therapeutic landscape of mitochondrial genetic diseases. Taking advantage of the nonuniversal genetic code used by mitochondria, a plasmid DNA construct able to be specifically expressed in these organelles was designed by including a codon, which codes for an amino acid only if read by the mitochondrial ribosomes. In the present work, gemini surfactants were shown to successfully deliver plasmid DNA to mitochondria. Gemini surfactant-based DNA complexes were taken up by cells through a variety of routes, including endocytic pathways, and showed propensity for inducing membrane destabilization under acidic conditions, thus facilitating cytoplasmic release of DNA. Furthermore, the complexes interacted extensively with lipid membrane models mimicking the composition of the mitochondrial membrane, which predicts a favored interaction of the complexes with mitochondria in the intracellular environment. This work unravels new possibilities for gene therapy toward mitochondrial diseases.
Leber Hereditary Optic Neuropathy: Exemplar of an mtDNA Disease.
Wallace, Douglas C; Lott, Marie T
2017-01-01
The report in 1988 that Leber Hereditary Optic Neuropathy (LHON) was the product of mitochondrial DNA (mtDNA) mutations provided the first demonstration of the clinical relevance of inherited mtDNA variation. From LHON studies, the medical importance was demonstrated for the mtDNA showing its coding for the most important energy genes, its maternal inheritance, its high mutation rate, its presence in hundreds to thousands of copies per cell, its quantitatively segregation of biallelic genotypes during both mitosis and meiosis, its preferential effect on the most energetic tissues including the eye and brain, its wide range of functional polymorphisms that predispose to common diseases, and its accumulation of mutations within somatic tissues providing the aging clock. These features of mtDNA genetics, in combination with the genetics of the 1-2000 nuclear DNA (nDNA) coded mitochondrial genes, is not only explaining the genetics of LHON but also providing a model for understanding the complexity of many common diseases. With the maturation of LHON biology and genetics, novel animal models for complex disease have been developed and new therapeutic targets and strategies envisioned, both pharmacological and genetic. Multiple somatic gene therapy approaches are being developed for LHON which are applicable to other mtDNA diseases. Moreover, the unique cytoplasmic genetics of the mtDNA has permitted the first successful human germline gene therapy via spindle nDNA transfer from mtDNA mutant oocytes to enucleated normal mtDNA oocytes. Such LHON lessons are actively being applied to common ophthalmological diseases like glaucoma and neurological diseases like Parkinsonism.
Winkfein, R J; Nishikawa, S; Connor, W; Dixon, G H
1993-07-01
A synthetic oligonucleotide primer, designed from marsupial protamine protein-sequence data [Balhorn, R., Corzett, M., Matrimas, J. A., Cummins, J. & Faden, B. (1989) Analysis of protamines isolated from two marsupials, the ring-tailed wallaby and gray short-tailed opossum, J. Cell. Biol. 107] was used to amplify, via the polymerase chain reaction, protamine sequences from a North American opossum (Didelphis marsupialis) cDNA. Using the amplified sequences as probes, several protamine cDNA clones were isolated. The protein sequence, predicted from the cDNA sequences, consisted of 57 amino acids, contained a large number of arginine residues and exhibited the sequence ARYR at its amino terminus, which is conserved in avian and most eutherian mammal protamines. Like the true protamines of trout and chicken, the opossum protamine lacked cysteine residues, distinguishing it from placental mammalian protamine 1 (P1 or stable) protamines. Examination of the protamine gene, isolated by polymerase-chain-reaction amplification of genomic DNA, revealed the presence of an intron dividing the protamine-coding region, a common characteristic of all mammalian P1 genes. In addition, extensive sequence identity in the 5' and 3' flanking regions between mouse and opossum sequences classify the marsupial protamine as being closely related to placental mammal P1. Protamine transcripts, in both birds and mammals, are present in two size classes, differing by the length of their poly(A) tails (either short or long). Examination of opossum protamine transcripts by Northern hybridization revealed four distinct mRNA species in the total RNA fraction, two of which were enriched in the poly(A)-rich fraction. Northern-blot analysis, using an intron-specific probe, revealed the presence of intron sequences in two of the four protamine transcripts. If expressed, the corresponding protein from intron-containing transcripts would differ from spliced transcripts by length (49 versus 57 amino acids) and would contain a cysteine residue.
Varga, Elizabeth; Chao, Elizabeth C; Yeager, Nicholas D
2015-09-01
Next-generation sequencing (NGS) technology is increasingly utilized to identify therapeutic targets for patients with malignancy. This technology also has the capability to reveal the presence of constitutional genetic alterations, which may have significant implications for patients and their family members. Here we present the case of a 23 year old Caucasian patient with recurrent undifferentiated sarcoma who had NGS-based tumor analysis using an assay which simultaneously analyzed the entire coding sequence of 236 cancer-related genes (3769 exons) plus 47 introns from 19 genes often rearranged or altered in cancer. Pathogenic alterations were reported in tumor as the predicted protein alterations, BRCA2 "R645fs*15″ and MLH1 "E694*". Because constitutional BRCA2 and MLH1 gene mutations are associated with Hereditary Breast Ovarian Cancer Syndrome (HBOCS) and Lynch syndrome respectively, sequence analysis of DNA isolated from peripheral blood was performed. The presence of the alterations, BRCA2 c.1929delG and MLH1 c.2080G>T, corresponding to the previously reported predicted protein alterations, were confirmed by Sanger sequencing in the constitutional DNA. An additional DNA finding was reported in this analysis, MLH1 c.2081A>C at the neighboring nucleotide. Further evaluation of the family revealed that all alterations were paternally inherited and the two MLH1 substitutions were in cis, more appropriately referred to as MLH1 c.2080_2081delGAinsTC, which is classified as a variant of uncertain significance. This case illustrates important considerations related to appropriate interpretation of NGS tumor results and follow-up of patients with potentially deleterious constitutional alterations.
Foong, Choon Pin; Lau, Nyok-Sean; Deguchi, Shigeru; Toyofuku, Takashi; Taylor, Todd D; Sudesh, Kumar; Matsui, Minami
2014-12-24
Special features of the Japanese ocean include its ranges of latitude and depth. This study is the first to examine the diversity of Class I and II PHA synthases (PhaC) in DNA samples from pelagic seawater taken from the Japan Trench and Nankai Trough from a range of depths from 24 m to 5373 m. PhaC is the key enzyme in microorganisms that determines the types of monomer units that are polymerized into polyhydroxyalkanoate (PHA) and thus affects the physicochemical properties of this thermoplastic polymer. Complete putative PhaC sequences were determined via genome walking, and the activities of newly discovered PhaCs were evaluated in a heterologous host. A total of 76 putative phaC PCR fragments were amplified from the whole genome amplified seawater DNA. Of these 55 clones contained conserved PhaC domains and were classified into 20 genetic groups depending on their sequence similarity. Eleven genetic groups have undisclosed PhaC activity based on their distinct phylogenetic lineages from known PHA producers. Three complete DNA coding sequences were determined by IAN-PCR, and one PhaC was able to produce poly(3-hydroxybutyrate) in recombinant Cupriavidus necator PHB-4 (PHB-negative mutant). A new functional PhaC that has close identity to Marinobacter sp. was discovered in this study. Phylogenetic classification for all the phaC genes isolated from uncultured bacteria has revealed that seawater and other environmental resources harbor a great diversity of PhaCs with activities that have not yet been investigated. Functional evaluation of these in silico-based PhaCs via genome walking has provided new insights into the polymerizing ability of these enzymes.
Ramirez, Lisa Marie S; He, Muhan; Mailloux, Shay; George, Justin; Wang, Jun
2016-06-01
Microparticles carrying quick response (QR) barcodes are fabricated by J. Wang and co-workers on page 3259, using a massive coding of dissociated elements (MiCODE) technology. Each microparticle can bear a special custom-designed QR code that enables encryption or tagging with unlimited multiplexity, and the QR code can be easily read by cellphone applications. The utility of MiCODE particles in multiplexed DNA detection and microtagging for anti-counterfeiting is explored. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Réfega, Susana; Girard-Misguich, Fabienne; Bourdieu, Christiane; Péry, Pierre; Labbé, Marie
2003-04-02
Specific antibodies were produced ex vivo from intestinal culture of Eimeria tenella infected chickens. The specificity of these intestinal antibodies was tested against different parasite stages. These antibodies were used to immunoscreen first generation schizont and sporozoite cDNA libraries permitting the identification of new E. tenella antigens. We obtained a total of 119 cDNA clones which were subjected to sequence analysis. The sequences coding for the proteins inducing local immune responses were compared with nucleotide or protein databases and with expressed sequence tags (ESTs) databases. We identified new Eimeria genes coding for heat shock proteins, a ribosomal protein, a pyruvate kinase and a pyridoxine kinase. Specific features of other sequences are discussed.
Balintová, Jana; Plucnara, Medard; Vidláková, Pavlína; Pohl, Radek; Havran, Luděk; Fojta, Miroslav; Hocek, Michal
2013-09-16
Benzofurazane has been attached to nucleosides and dNTPs, either directly or through an acetylene linker, as a new redox label for electrochemical analysis of nucleotide sequences. Primer extension incorporation of the benzofurazane-modified dNTPs by polymerases has been developed for the construction of labeled oligonucleotide probes. In combination with nitrophenyl and aminophenyl labels, we have successfully developed a three-potential coding of DNA bases and have explored the relevant electrochemical potentials. The combination of benzofurazane and nitrophenyl reducible labels has proved to be excellent for ratiometric analysis of nucleotide sequences and is suitable for bioanalytical applications. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
28 CFR Appendix A to Part 812 - Qualifying District of Columbia Code Offenses
Code of Federal Regulations, 2011 CFR
2011-07-01
... FOR THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION Pt. 812, App. A Appendix A to Part... Columbia, the DNA Sample Collection Act of 2001 identifies the criminal offenses listed in Table 1 of this appendix as “qualifying District of Columbia offenses” for the purposes of the DNA Analysis Backlog...
28 CFR Appendix A to Part 812 - Qualifying District of Columbia Code Offenses
Code of Federal Regulations, 2010 CFR
2010-07-01
... FOR THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION Pt. 812, App. A Appendix A to Part... Columbia, the DNA Sample Collection Act of 2001 identifies the criminal offenses listed in Table 1 of this appendix as “qualifying District of Columbia offenses” for the purposes of the DNA Analysis Backlog...
28 CFR Appendix A to Part 812 - Qualifying District of Columbia Code Offenses
Code of Federal Regulations, 2012 CFR
2012-07-01
... FOR THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION Pt. 812, App. A Appendix A to Part... Columbia, the DNA Sample Collection Act of 2001 identifies the criminal offenses listed in Table 1 of this appendix as “qualifying District of Columbia offenses” for the purposes of the DNA Analysis Backlog...
28 CFR Appendix A to Part 812 - Qualifying District of Columbia Code Offenses
Code of Federal Regulations, 2014 CFR
2014-07-01
... FOR THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION Pt. 812, App. A Appendix A to Part... Columbia, the DNA Sample Collection Act of 2001 identifies the criminal offenses listed in Table 1 of this appendix as “qualifying District of Columbia offenses” for the purposes of the DNA Analysis Backlog...
28 CFR Appendix A to Part 812 - Qualifying District of Columbia Code Offenses
Code of Federal Regulations, 2013 CFR
2013-07-01
... FOR THE DISTRICT OF COLUMBIA COLLECTION AND USE OF DNA INFORMATION Pt. 812, App. A Appendix A to Part... Columbia, the DNA Sample Collection Act of 2001 identifies the criminal offenses listed in Table 1 of this appendix as “qualifying District of Columbia offenses” for the purposes of the DNA Analysis Backlog...
Wildman, Derek E.; Uddin, Monica; Liu, Guozhen; Grossman, Lawrence I.; Goodman, Morris
2003-01-01
What do functionally important DNA sites, those scrutinized and shaped by natural selection, tell us about the place of humans in evolution? Here we compare ≈90 kb of coding DNA nucleotide sequence from 97 human genes to their sequenced chimpanzee counterparts and to available sequenced gorilla, orangutan, and Old World monkey counterparts, and, on a more limited basis, to mouse. The nonsynonymous changes (functionally important), like synonymous changes (functionally much less important), show chimpanzees and humans to be most closely related, sharing 99.4% identity at nonsynonymous sites and 98.4% at synonymous sites. On a time scale, the coding DNA divergencies separate the human–chimpanzee clade from the gorilla clade at between 6 and 7 million years ago and place the most recent common ancestor of humans and chimpanzees at between 5 and 6 million years ago. The evolutionary rate of coding DNA in the catarrhine clade (Old World monkey and ape, including human) is much slower than in the lineage to mouse. Among the genes examined, 30 show evidence of positive selection during descent of catarrhines. Nonsynonymous substitutions by themselves, in this subset of positively selected genes, group humans and chimpanzees closest to each other and have chimpanzees diverge about as much from the common human–chimpanzee ancestor as humans do. This functional DNA evidence supports two previously offered taxonomic proposals: family Hominidae should include all extant apes; and genus Homo should include three extant species and two subgenera, Homo (Homo) sapiens (humankind), Homo (Pan) troglodytes (common chimpanzee), and Homo (Pan) paniscus (bonobo chimpanzee). PMID:12766228
Wildman, Derek E; Uddin, Monica; Liu, Guozhen; Grossman, Lawrence I; Goodman, Morris
2003-06-10
What do functionally important DNA sites, those scrutinized and shaped by natural selection, tell us about the place of humans in evolution? Here we compare approximately 90 kb of coding DNA nucleotide sequence from 97 human genes to their sequenced chimpanzee counterparts and to available sequenced gorilla, orangutan, and Old World monkey counterparts, and, on a more limited basis, to mouse. The nonsynonymous changes (functionally important), like synonymous changes (functionally much less important), show chimpanzees and humans to be most closely related, sharing 99.4% identity at nonsynonymous sites and 98.4% at synonymous sites. On a time scale, the coding DNA divergencies separate the human-chimpanzee clade from the gorilla clade at between 6 and 7 million years ago and place the most recent common ancestor of humans and chimpanzees at between 5 and 6 million years ago. The evolutionary rate of coding DNA in the catarrhine clade (Old World monkey and ape, including human) is much slower than in the lineage to mouse. Among the genes examined, 30 show evidence of positive selection during descent of catarrhines. Nonsynonymous substitutions by themselves, in this subset of positively selected genes, group humans and chimpanzees closest to each other and have chimpanzees diverge about as much from the common human-chimpanzee ancestor as humans do. This functional DNA evidence supports two previously offered taxonomic proposals: family Hominidae should include all extant apes; and genus Homo should include three extant species and two subgenera, Homo (Homo) sapiens (humankind), Homo (Pan) troglodytes (common chimpanzee), and Homo (Pan) paniscus (bonobo chimpanzee).
Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis.
Buldyrev, S V; Goldberger, A L; Havlin, S; Mantegna, R N; Matsa, M E; Peng, C K; Simons, M; Stanley, H E
1995-05-01
An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to dtermine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10(-10). We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis
NASA Technical Reports Server (NTRS)
Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.
1995-01-01
An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to dtermine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10(-10). We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
Keane, Frank; Hammond, Laura; Kelliher, Gerry; Mealy, Ken
2017-12-12
In the year to July 2017, surgical disciplines accounted for 73% of the total national inpatient and day case waiting list and, of these, day cases accounted for 72%. Their proper classification is therefore important so that patients can be managed and treated in the most suitable and efficient setting. We set out to sub-classify the different elective surgical day cases treated in Irish public hospitals in order to assess their need to be managed as day cases and the consistency of practice between hospitals. We analysed all elective day cases that came under the care of surgeons between January 2014 and December 2016 and sub-classified them into those that were (A) true day case surgical procedures; (B) minor surgery or outpatient procedures; (C) gastrointestinal endoscopies; (D) day case, non-surgical interventions and (E) unclassified or having no primary procedure identified. Of 813,236 day case surgical interventions performed over 3 years, 26% were adjudged to accord with group A, 41% with B, 23% with C, 5% with D and 5% with E. The ratio of A to B procedures did not vary significantly across the range of hospital types. However, there were some notable variations in coding and practices between hospitals. Our findings show that many day cases should have been performed as outpatient procedures and that there were variations in coding and practices between hospitals that could not be easily explained. Outpatient procedure coding and a better, more consistent, classification of day cases are both required to better manage this group of patients.
Comparisons of Upwelling and Relaxation Events in the Monterey Bay Area
2010-06-22
of this paper (has ) (has never x ) been classified. 1226 Office of Counsel.Code 1008.3 ADOR/Director NCST E. R. Franchi , 7000 4AX Public...and Code (Principal Author) Autfiorta) $f)uj/m/i Office of Counsel,Code 1008.3 ADOR/Director NCST E. R. Franchi , 7000 Public Affairs...formulation for horizontal mixing [Martin, 2000]. [12] The NCOM global model [Rhodes et al, 2002; Barron et al, 2004] has 1/8° horizontal resolution and the
Testing the utility of matK and ITS DNA regions for discrimination of Allium species
USDA-ARS?s Scientific Manuscript database
Molecular phylogenetic analysis of the genus Allium L. has been mainly based on the nucleotide sequences of ITS region. In 2009 matK and rbcL were accepted as a two-locus DNA barcode to classify plant species by the Consortium for the Barcode of Life (CBOL) Plant Working Group. MatK region has been ...
Darwin and Evolution: A Set of Activities Based on the Evolution of Mammals
ERIC Educational Resources Information Center
Haresnape, Janet M.
2010-01-01
These activities, prepared for key stage 5 students (ages 16-18) and also suitable for key stage 4 (ages 14-16), show that physical appearance is not necessarily the best way to classify mammals. DNA structure is examined to show how similarities and differences between DNA sequences of mammals can be used to establish evolutionary relationships.…
iDBPs: a web server for the identification of DNA binding proteins.
Nimrod, Guy; Schushan, Maya; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir
2010-03-01
The iDBPs server uses the three-dimensional (3D) structure of a query protein to predict whether it binds DNA. First, the algorithm predicts the functional region of the protein based on its evolutionary profile; the assumption is that large clusters of conserved residues are good markers of functional regions. Next, various characteristics of the predicted functional region as well as global features of the protein are calculated, such as the average surface electrostatic potential, the dipole moment and cluster-based amino acid conservation patterns. Finally, a random forests classifier is used to predict whether the query protein is likely to bind DNA and to estimate the prediction confidence. We have trained and tested the classifier on various datasets and shown that it outperformed related methods. On a dataset that reflects the fraction of DNA binding proteins (DBPs) in a proteome, the area under the ROC curve was 0.90. The application of the server to an updated version of the N-Func database, which contains proteins of unknown function with solved 3D-structure, suggested new putative DBPs for experimental studies. http://idbps.tau.ac.il/
Pang, Junbiao; Qin, Lei; Zhang, Chunjie; Zhang, Weigang; Huang, Qingming; Yin, Baocai
2015-12-01
Local coordinate coding (LCC) is a framework to approximate a Lipschitz smooth function by combining linear functions into a nonlinear one. For locally linear classification, LCC requires a coding scheme that heavily determines the nonlinear approximation ability, posing two main challenges: 1) the locality making faraway anchors have smaller influences on current data and 2) the flexibility balancing well between the reconstruction of current data and the locality. In this paper, we address the problem from the theoretical analysis of the simplest local coding schemes, i.e., local Gaussian coding and local student coding, and propose local Laplacian coding (LPC) to achieve the locality and the flexibility. We apply LPC into locally linear classifiers to solve diverse classification tasks. The comparable or exceeded performances of state-of-the-art methods demonstrate the effectiveness of the proposed method.
Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA
Ma, Xiaoqi
2015-01-01
A novel method is proposed to establish the pancreatic cancer classifier. Firstly, the concept of quantum and fruit fly optimal algorithm (FOA) are introduced, respectively. Then FOA is improved by quantum coding and quantum operation, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of support vector machine (SVM) and the classifier is established by optimized SVM. In order to verify the effectiveness of the proposed method, SVM and other classification methods have been chosen as the comparing methods. The experimental results show that the proposed method can improve the classifier performance and cost less time. PMID:26543867
Validity of the coding for herpes simplex encephalitis in the Danish National Patient Registry.
Jørgensen, Laura Krogh; Dalgaard, Lars Skov; Østergaard, Lars Jørgen; Andersen, Nanna Skaarup; Nørgaard, Mette; Mogensen, Trine Hyrup
2016-01-01
Large health care databases are a valuable source of infectious disease epidemiology if diagnoses are valid. The aim of this study was to investigate the accuracy of the recorded diagnosis coding of herpes simplex encephalitis (HSE) in the Danish National Patient Registry (DNPR). The DNPR was used to identify all hospitalized patients, aged ≥15 years, with a first-time diagnosis of HSE according to the International Classification of Diseases, tenth revision (ICD-10), from 2004 to 2014. To validate the coding of HSE, we collected data from the Danish Microbiology Database, from departments of clinical microbiology, and from patient medical records. Cases were classified as confirmed, probable, or no evidence of HSE. We estimated the positive predictive value (PPV) of the HSE diagnosis coding stratified by diagnosis type, study period, and department type. Furthermore, we estimated the proportion of HSE cases coded with nonspecific ICD-10 codes of viral encephalitis and also the sensitivity of the HSE diagnosis coding. We were able to validate 398 (94.3%) of the 422 HSE diagnoses identified via the DNPR. Hereof, 202 (50.8%) were classified as confirmed cases and 29 (7.3%) as probable cases providing an overall PPV of 58.0% (95% confidence interval [CI]: 53.0-62.9). For "Encephalitis due to herpes simplex virus" (ICD-10 code B00.4), the PPV was 56.6% (95% CI: 51.1-62.0). Similarly, the PPV for "Meningoencephalitis due to herpes simplex virus" (ICD-10 code B00.4A) was 56.8% (95% CI: 39.5-72.9). "Herpes viral encephalitis" (ICD-10 code G05.1E) had a PPV of 75.9% (95% CI: 56.5-89.7), thereby representing the highest PPV. The estimated sensitivity was 95.5%. The PPVs of the ICD-10 diagnosis coding for adult HSE in the DNPR were relatively low. Hence, the DNPR should be used with caution when studying patients with encephalitis caused by herpes simplex virus.
Greenberg, Jay R.; Perry, Robert P.
1971-01-01
The relationship of the DNA sequences from which polyribosomal messenger RNA (mRNA) and heterogeneous nuclear RNA (NRNA) of mouse L cells are transcribed was investigated by means of hybridization kinetics and thermal denaturation of the hybrids. Hybridization was performed in formamide solutions at DNA excess. Under these conditions most of the hybridizing mRNA and NRNA react at values of Dot (DNA concentration multiplied by time) expected for RNA transcribed from the nonrepeated or rarely repeated fraction of the genome. However, a fraction of both mRNA and NRNA hybridize at values of Dot about 10,000 times lower, and therefore must be transcribed from highly redundant DNA sequences. The fraction of NRNA hybridizing to highly repeated sequences is about 1.7 times greater than the corresponding fraction of mRNA. The hybrids formed by the rapidly reacting fractions of both NRNA and mRNA melt over a narrow temperature range with a midpoint about 11°C below that of native L cell DNA. This indicates that these hybrids consist of partially complementary sequences with approximately 11% mismatching of bases. Hybrids formed by the slowly reacting fraction of NRNA melt within 4°–6°C of native DNA, indicating very little, if any, mismatching of bases. Hybrids of the slowly reacting components of mRNA, formed under conditions of sufficiently low RNA input, have a high thermal stability, similar to that observed for hybrids of the slowly reacting NRNA component. However, when higher inputs of mRNA are used, hybrids are formed which have a strikingly lower thermal stability. This observation can be explained by assuming that there is sufficient similarity among the relatively rare DNA sequences coding for mRNA so that under hybridization conditions, in which these DNA sequences are not truly in excess, reversible hybrids exhibiting a considerable amount of mispairing are formed. The fact that a comparable phenomenon has not been observed for NRNA may mean that there is less similarity among the relatively rare DNA sequences coding for NRNA than there is among the rare sequences coding for mRNA. PMID:4999767
Methodology for fast detection of false sharing in threaded scientific codes
Chung, I-Hsin; Cong, Guojing; Murata, Hiroki; Negishi, Yasushi; Wen, Hui-Fang
2014-11-25
A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
Frigerio, Alessandra; Costantino, Elisabetta; Ceppi, Elisa; Barone, Lavinia
2013-01-01
The main aim of this study was to investigate the correlates of a Hostile-Helpless (HH) state of mind among 67 women belonging to a community sample and two different at-risk samples matched on socio-economic indicators, including 20 women from low-SES population (poverty sample) and 15 women at risk for maltreatment being monitored by the social services for the protection of juveniles (maltreatment risk sample). The Adult Attachment Interview (AAI) protocols were reliably coded blind to the samples' group status. The rates of HH classification increased in relation to the risk status of the three samples, ranging from 9% for the low-risk sample to 60% for the maltreatment risk sample to 75% for mothers in the maltreatment risk sample who actually maltreated their infants. In terms of the traditional AAI classification system, 88% of the interviews from the maltreating mothers were classified Unresolved/Cannot Classify (38%) or Preoccupied (50%). Partial overlapping between the 2 AAI coding systems was found, and discussion concerns the relevant contributions of each AAI coding system to understanding of the intergenerational transmission of maltreatment.
Nicolas, Laura; Cols, Montserrat; Choi, Jee Eun; Chaudhuri, Jayanta; Vuong, Bao
2018-01-01
Adaptive immune responses require the generation of a diverse repertoire of immunoglobulins (Igs) that can recognize and neutralize a seemingly infinite number of antigens. V(D)J recombination creates the primary Ig repertoire, which subsequently is modified by somatic hypermutation (SHM) and class switch recombination (CSR). SHM promotes Ig affinity maturation whereas CSR alters the effector function of the Ig. Both SHM and CSR require activation-induced cytidine deaminase (AID) to produce dU:dG mismatches in the Ig locus that are transformed into untemplated mutations in variable coding segments during SHM or DNA double-strand breaks (DSBs) in switch regions during CSR. Within the Ig locus, DNA repair pathways are diverted from their canonical role in maintaining genomic integrity to permit AID-directed mutation and deletion of gene coding segments. Recently identified proteins, genes, and regulatory networks have provided new insights into the temporally and spatially coordinated molecular interactions that control the formation and repair of DSBs within the Ig locus. Unravelling the genetic program that allows B cells to selectively alter the Ig coding regions while protecting non-Ig genes from DNA damage advances our understanding of the molecular processes that maintain genomic integrity as well as humoral immunity. PMID:29744038
Xiao, P; Niu, L L; Zhao, Q J; Chen, X Y; Wang, L J; Li, L; Zhang, H P; Guo, J Z; Xu, H Y; Zhong, T
2017-11-16
The origins and phylogeny of different sheep breeds has been widely studied using polymorphisms within the mitochondrial hypervariable region. However, little is known about the mitochondrial DNA (mtDNA) content and phylogeny based on mtDNA protein-coding genes. In this study, we assessed the phylogeny and copy number of the mtDNA in eight indigenous (population size, n=184) and three introduced (n=66) sheep breeds in China based on five mitochondrial coding genes (COX1, COX2, ATP8, ATP6 and COX3). The mean haplotype and nucleotide diversities were 0.944 and 0.00322, respectively. We identified a correlation between the lineages distribution and the genetic distance, whereby Valley-type Tibetan sheep had a closer genetic relationship with introduced breeds (Dorper, Poll Dorset and Suffolk) than with other indigenous breeds. Similarly, the Median-joining profile of haplotypes revealed the distribution of clusters according to genetic differences. Moreover, copy number analysis based on the five mitochondrial coding genes was affected by the genetic distance combining with genetic phylogeny; we also identified obvious non-synonymous mutations in ATP6 between the different levels of copy number expressions. These results imply that differences in mitogenomic compositions resulting from geographical separation lead to differences in mitochondrial function.
Harnessing epigenome modifications for better crops
USDA-ARS?s Scientific Manuscript database
Chemical DNA modifications such as methylation influence translation of the DNA code to specific genetic outcomes. While such modifications can be heritable, others are transient, and their overall contribution to plant genetic diversity remains intriguing but uncertain. The focus of this article is...
2014-01-01
Linear algebraic concept of subspace plays a significant role in the recent techniques of spectrum estimation. In this article, the authors have utilized the noise subspace concept for finding hidden periodicities in DNA sequence. With the vast growth of genomic sequences, the demand to identify accurately the protein-coding regions in DNA is increasingly rising. Several techniques of DNA feature extraction which involves various cross fields have come up in the recent past, among which application of digital signal processing tools is of prime importance. It is known that coding segments have a 3-base periodicity, while non-coding regions do not have this unique feature. One of the most important spectrum analysis techniques based on the concept of subspace is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions completely eliminating background noise. Comparison of proposed method with existing sliding discrete Fourier transform (SDFT) method popularly known as modified periodogram method has been drawn on several genes from various organisms and the results show that the proposed method has better as well as an effective approach towards gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate, and wrong rate are used to establish superiority of least-norm gene prediction method over existing method. PMID:24386895
Morard, Raphaël; Escarguel, Gilles; Weiner, Agnes K M; André, Aurore; Douady, Christophe J; Wade, Christopher M; Darling, Kate F; Ujiié, Yurika; Seears, Heidi A; Quillévéré, Frédéric; de Garidel-Thoron, Thibault; de Vargas, Colomban; Kucera, Michal
2016-09-01
Investigations of biodiversity, biogeography, and ecological processes rely on the identification of "species" as biologically significant, natural units of evolution. In this context, morphotaxonomy only provides an adequate level of resolution if reproductive isolation matches morphological divergence. In many groups of organisms, morphologically defined species often disguise considerable genetic diversity, which may be indicative of the existence of cryptic species. The diversity hidden by morphological species can be disentangled through genetic surveys, which also provide access to data on the ecological distribution of genetically circumscribed units. These units can be identified by unique DNA sequence motifs and allow studies of evolutionary and ecological processes at different levels of divergence. However, the nomenclature of genetically circumscribed units within morphological species is not regulated and lacks stability. This represents a major obstacle to efforts to synthesize and communicate data on genetic diversity for multiple stakeholders. We have been confronted with such an obstacle in our work on planktonic foraminifera, where the stakeholder community is particularly diverse, involving geochemists, paleoceanographers, paleontologists, and biologists, and the lack of stable nomenclature beyond the level of formal morphospecies prevents effective transfer of knowledge. To circumvent this problem, we have designed a stable, reproducible, and flexible nomenclature system for genetically circumscribed units, analogous to the principles of a formal nomenclature system. Our system is based on the definition of unique DNA sequence motifs collocated within an individual, their typification (in analogy with holotypes), utilization of their hierarchical phylogenetic structure to define levels of divergence below that of the morphospecies, and a set of nomenclature rules assuring stability. The resulting molecular operational taxonomic units remain outside the domain of current nomenclature codes, but are linked to formal morphospecies as regulated by the codes. Subsequently, we show how this system can be applied to classify genetically defined units using the SSU rDNA marker in planktonic foraminifera and we highlight its potential use for other groups of organisms where similarly high levels of connectivity between molecular and formal taxonomies can be achieved. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
DNA Mapping Made Simple: An Intellectual Activity about the Genetic Modification of Organisms
ERIC Educational Resources Information Center
Marques, Miguel; Arrabaca, Joao; Chagas, Isabel
2004-01-01
Since the discovery of the DNA double helix (in 1953 by Watson and Crick), technologies have been developed that allow scientists to manipulate the genome of bacteria to produce human hormones, as well as the genome of crop plants to achieve high yield and enhanced flavor. The universality of the genetic code has allowed DNA isolated from a…
Multiple tag labeling method for DNA sequencing
Mathies, Richard A.; Huang, Xiaohua C.; Quesada, Mark A.
1995-01-01
A DNA sequencing method described which uses single lane or channel electrophoresis. Sequencing fragments are separated in said lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radio-isotope labels.
Vlahovicek, K; Munteanu, M G; Pongor, S
1999-01-01
Bending is a local conformational micropolymorphism of DNA in which the original B-DNA structure is only distorted but not extensively modified. Bending can be predicted by simple static geometry models as well as by a recently developed elastic model that incorporate sequence dependent anisotropic bendability (SDAB). The SDAB model qualitatively explains phenomena including affinity of protein binding, kinking, as well as sequence-dependent vibrational properties of DNA. The vibrational properties of DNA segments can be studied by finite element analysis of a model subjected to an initial bending moment. The frequency spectrum is obtained by applying Fourier analysis to the displacement values in the time domain. This analysis shows that the spectrum of the bending vibrations quite sensitively depends on the sequence, for example the spectrum of a curved sequence is characteristically different from the spectrum of straight sequence motifs of identical basepair composition. Curvature distributions are genome-specific, and pronounced differences are found between protein-coding and regulatory regions, respectively, that is, sites of extreme curvature and/or bendability are less frequent in protein-coding regions. A WWW server is set up for the prediction of curvature and generation of 3D models from DNA sequences (http:@www.icgeb.trieste.it/dna).
Using the NCBI Genome Databases to Compare the Genes for Human & Chimpanzee Beta Hemoglobin
ERIC Educational Resources Information Center
Offner, Susan
2010-01-01
The beta hemoglobin protein is identical in humans and chimpanzees. In this tutorial, students see that even though the proteins are identical, the genes that code for them are not. There are many more differences in the introns than in the exons, which indicates that coding regions of DNA are more highly conserved than non-coding regions.
Cenik, Can; Chua, Hon Nian; Singh, Guramrit; Akef, Abdalla; Snyder, Michael P; Palazzo, Alexander F; Moore, Melissa J; Roth, Frederick P
2017-03-01
Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5 ' proximal- i ntron- m inus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N 1 -methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N 1 -methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC. © 2017 Cenik et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
NASA Technical Reports Server (NTRS)
Plante, Ianik; Ponomarev, Artem L.; Wu, Honglu; Blattnig, Steve; George, Kerry
2014-01-01
The formation of DNA double-strand breaks (DSBs) and chromosome aberrations is an important consequence of ionizing radiation. To simulate DNA double-strand breaks and the formation of chromosome aberrations, we have recently merged the codes RITRACKS (Relativistic Ion Tracks) and NASARTI (NASA Radiation Track Image). The program RITRACKS is a stochastic code developed to simulate detailed event-by-event radiation track structure: [1] This code is used to calculate the dose in voxels of 20 nm, in a volume containing simulated chromosomes, [2] The number of tracks in the volume is calculated for each simulation by sampling a Poisson distribution, with the distribution parameter obtained from the irradiation dose, ion type and energy. The program NASARTI generates the chromosomes present in a cell nucleus by random walks of 20 nm, corresponding to the size of the dose voxels, [3] The generated chromosomes are located within domains which may intertwine, and [4] Each segment of the random walks corresponds to approx. 2,000 DNA base pairs. NASARTI uses pre-calculated dose at each voxel to calculate the probability of DNA damage at each random walk segment. Using the location of double-strand breaks, possible rejoining between damaged segments is evaluated. This yields various types of chromosomes aberrations, including deletions, inversions, exchanges, etc. By performing the calculations using various types of radiations, it will be possible to obtain relative biological effectiveness (RBE) values for several types of chromosome aberrations.
A clinical measure of DNA methylation predicts outcome in de novo acute myeloid leukemia
Luskin, Marlise R.; Gimotty, Phyllis A.; Smith, Catherine; Loren, Alison W.; Figueroa, Maria E.; Harrison, Jenna; Sun, Zhuoxin; Tallman, Martin S.; Paietta, Elisabeth M.; Litzow, Mark R.; Melnick, Ari M.; Levine, Ross L.; Fernandez, Hugo F.; Luger, Selina M.; Master, Stephen R.; Wertheim, Gerald B.W.
2016-01-01
BACKGROUND. Variable response to chemotherapy in acute myeloid leukemia (AML) represents a major treatment challenge. Clinical and genetic features incompletely predict outcome. The value of clinical epigenetic assays for risk classification has not been extensively explored. We assess the prognostic implications of a clinical assay for multilocus DNA methylation on adult patients with de novo AML. METHODS. We performed multilocus DNA methylation assessment using xMELP on samples and calculated a methylation statistic (M-score) for 166 patients from UPENN with de novo AML who received induction chemotherapy. The association of M-score with complete remission (CR) and overall survival (OS) was evaluated. The optimal M-score cut-point for identifying groups with differing survival was used to define a binary M-score classifier. This classifier was validated in an independent cohort of 383 patients from the Eastern Cooperative Oncology Group Trial 1900 (E1900; NCT00049517). RESULTS. A higher mean M-score was associated with death and failure to achieve CR. Multivariable analysis confirmed that a higher M-score was associated with death (P = 0.011) and failure to achieve CR (P = 0.034). Median survival was 26.6 months versus 10.6 months for low and high M-score groups. The ability of the M-score to perform as a classifier was confirmed in patients ≤ 60 years with intermediate cytogenetics and patients who achieved CR, as well as in the E1900 validation cohort. CONCLUSION. The M-score represents a valid binary prognostic classifier for patients with de novo AML. The xMELP assay and associated M-score can be used for prognosis and should be further investigated for clinical decision making in AML patients. PMID:27446991
Late night activity regarding stroke codes: LuNAR strokes.
Tafreshi, Gilda; Raman, Rema; Ernstrom, Karin; Rapp, Karen; Meyer, Brett C
2012-08-01
There is diurnal variation for cardiac arrest and sudden cardiac death. Stroke may show a similar pattern. We assessed whether strokes presenting during a particular time of day or night are more likely of vascular etiology. To compare emergency department stroke codes arriving between 22:00 and 8:00 hours (LuNAR strokes) vs. others (n-LuNAR strokes). The purpose was to determine if late night strokes are more likely to be true strokes or warrant acute tissue plasminogen activator evaluations. We reviewed prospectively collected cases in the University of California, San Diego Stroke Team database gathered over a four-year period. Stroke codes at six emergency departments were classified based on arrival time. Those arriving between 22:00 and 8:00 hours were classified as LuNAR stroke codes, the remainder were classified as 'n-LuNAR'. Patients were further classified as intracerebral hemorrhage, acute ischemic stroke not receiving tissue plasminogen activator, acute ischemic stroke receiving tissue plasminogen activator, transient ischemic attack, and nonstroke. Categorical outcomes were compared using Fisher's Exact test. Continuous outcomes were compared using Wilcoxon's Rank-sum test. A total of 1607 patients were included in our study, of which, 299 (19%) were LuNAR code strokes. The overall median NIHSS was five, higher in the LuNAR group (n-LuNAR 5, LuNAR 7; P=0·022). There was no overall differences in patient diagnoses between LuNAR and n-LuNAR strokes (P=0·169) or diagnosis of acute ischemic stroke receiving tissue plasminogen activator (n-LuNAR 191 (14·6%), LuNAR 42 (14·0%); P=0·86). Mean arrival to computed tomography scan time was longer during LuNAR hours (n-LuNAR 54·9±76·3 min, LuNAR 62·5±87·7 min; P=0·027). There was no significant difference in 90-day mortality (n-LuNAR 15·0%, LuNAR 13·2%; P=0·45). Our stroke center experience showed no difference in diagnosis of acute ischemic stroke between day and night stroke codes. This similarity was further supported in similar rates of tissue plasminogen activator administration. Late night strokes may warrant a more rapid stroke specialist evaluation due to the longer time elapsed from symptom onset and the longer time to computed tomography scan. © 2011 The Authors. International Journal of Stroke © 2011 World Stroke Organization.
Haben repetitive DNA-Sequenzen biologische Funktionen?
NASA Astrophysics Data System (ADS)
John, Maliyakal E.; Knöchel, Walter
1983-05-01
By DNA reassociation kinetics it is known that the eucaryotic genome consists of non-repetitive DNA, middle-repetitive DNA and highly repetitive DNA. Whereas the majority of protein-coding genes is located on non-repetitive DNA, repetitive DNA forms a constitutive part of eucaryotic DNA and its amount in most cases equals or even substantially exceeds that of non-repetitive DNA. During the past years a large body of data on repetitive DNA has accumulated and these have prompted speculations ranging from specific roles in the regulation of gene expression to that of a selfish entity with inconsequential functions. The following article summarizes recent findings on structural, transcriptional and evolutionary aspects and, although by no means being proven, some possible biological functions are discussed.
Vladimirov, N V; Likhoshvaĭ, V A; Matushkin, Iu G
2007-01-01
Gene expression is known to correlate with degree of codon bias in many unicellular organisms. However, such correlation is absent in some organisms. Recently we demonstrated that inverted complementary repeats within coding DNA sequence must be considered for proper estimation of translation efficiency, since they may form secondary structures that obstruct ribosome movement. We have developed a program for estimation of potential coding DNA sequence expression in defined unicellular organism using its genome sequence. The program computes elongation efficiency index. Computation is based on estimation of coding DNA sequence elongation efficiency, taking into account three key factors: codon bias, average number of inverted complementary repeats, and free energy of potential stem-loop structures formed by the repeats. The influence of these factors on translation is numerically estimated. An optimal proportion of these factors is computed for each organism individually. Quantitative translational characteristics of 384 unicellular organisms (351 bacteria, 28 archaea, 5 eukaryota) have been computed using their annotated genomes from NCBI GenBank. Five potential evolutionary strategies of translational optimization have been determined among studied organisms. A considerable difference of preferred translational strategies between Bacteria and Archaea has been revealed. Significant correlations between elongation efficiency index and gene expression levels have been shown for two organisms (S. cerevisiae and H. pylori) using available microarray data. The proposed method allows to estimate numerically the coding DNA sequence translation efficiency and to optimize nucleotide composition of heterologous genes in unicellular organisms. http://www.mgs.bionet.nsc.ru/mgs/programs/eei-calculator/.
VaDiR: an integrated approach to Variant Detection in RNA.
Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy
2018-02-01
Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered. But the cost associated with it is currently limiting its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA(VaDiR) that integrates 3 variant callers, namely: SNPiR, RVBoost, and MuTect2. The combination of all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
Xie, Qing; Shen, Kang-Ning; Hao, Xiuying; Nam, Phan Nhut; Ngoc Hieu, Bui Thi; Chen, Ching-Hung; Zhu, Changqing; Lin, Yen-Chang; Hsiao, Chung-Der
2017-03-01
abtract We decoded the complete chloroplast DNA (cpDNA) sequence of the Tianshan Snow Lotus (Saussurea involucrata), a famous traditional Chinese medicinal plant of the family Asteraceae, by using next-generation sequencing technology. The genome consists of 152 490 bp containing a pair of inverted repeats (IRs) of 25 202 bp, which was separated by a large single-copy region and a small single-copy region of 83 446 bp and 18 639 bp, respectively. The genic regions account for 57.7% of whole cpDNA, and the GC content of the cpDNA was 37.7%. The S. involucrata cpDNA encodes 114 unigenes (82 protein-coding genes, 4 rRNA genes, and 28 tRNA genes). There are eight protein-coding genes (atpF, ndhA, ndhB, rpl2, rpoC1, rps16, clpP, and ycf3) and five tRNA genes (trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC) containing introns. A phylogenetic analysis of the 11 complete cpDNA from Asteracease showed that S. involucrata is closely related to Centaurea diffusa (Diffuse Knapweed). The complete cpDNA of S. involucrata provides essential and important DNA molecular data for further phylogenetic and evolutionary analysis for Asteraceae.
Population-specific variation in haplotype composition and heterozygosity at the POLB locus.
Yamtich, Jennifer; Speed, William C; Straka, Eva; Kidd, Judith R; Sweasy, Joann B; Kidd, Kenneth K
2009-05-01
DNA polymerase beta plays a central role in base excision repair (BER), which removes large numbers of endogenous DNA lesions from each cell on a daily basis. Little is currently known about germline polymorphisms within the POLB locus, making it difficult to study the association of variants at this locus with human diseases such as cancer. Yet, approximately thirty percent of human tumor types show variants of DNA polymerase beta. We have assessed the global frequency distributions of coding and common non-coding SNPs in and flanking the POLB gene for a total of 14 sites typed in approximately 2400 individuals from anthropologically defined human populations worldwide. We have found a marked difference between haplotype frequencies in African populations and in non-African populations.
Classifying Chinese Questions Related to Health Care Posted by Consumers Via the Internet.
Guo, Haihong; Na, Xu; Hou, Li; Li, Jiao
2017-06-20
In question answering (QA) system development, question classification is crucial for identifying information needs and improving the accuracy of returned answers. Although the questions are domain-specific, they are asked by non-professionals, making the question classification task more challenging. This study aimed to classify health care-related questions posted by the general public (Chinese speakers) on the Internet. A topic-based classification schema for health-related questions was built by manually annotating randomly selected questions. The Kappa statistic was used to measure the interrater reliability of multiple annotation results. Using the above corpus, we developed a machine-learning method to automatically classify these questions into one of the following six classes: Condition Management, Healthy Lifestyle, Diagnosis, Health Provider Choice, Treatment, and Epidemiology. The consumer health question schema was developed with a four-hierarchical-level of specificity, comprising 48 quaternary categories and 35 annotation rules. The 2000 sample questions were coded with 2000 major codes and 607 minor codes. Using natural language processing techniques, we expressed the Chinese questions as a set of lexical, grammatical, and semantic features. Furthermore, the effective features were selected to improve the question classification performance. From the 6-category classification results, we achieved an average precision of 91.41%, recall of 89.62%, and F 1 score of 90.24%. In this study, we developed an automatic method to classify questions related to Chinese health care posted by the general public. It enables Artificial Intelligence (AI) agents to understand Internet users' information needs on health care. ©Haihong Guo, Xu Na, Li Hou, Jiao Li. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 20.06.2017.
The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition
NASA Astrophysics Data System (ADS)
Štambuk, Nikola
The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.
How-To-Do-It: Recombinant DNA Made Easy: I. "Jumping Genes."
ERIC Educational Resources Information Center
Thomson, Robert G.
1988-01-01
Presents as part I of a two-part series a study involving the intercellular transfer of bacterial DNA that codes for the resistance to antibiotics. Demonstrates to students that such transfers occur. Discusses laboratory procedures, materials and results. (CW)
Kowalski, Madzia P.; Baylis, Howard A.; Krude, Torsten
2015-01-01
ABSTRACT Stem bulge RNAs (sbRNAs) are a family of small non-coding stem-loop RNAs present in Caenorhabditis elegans and other nematodes, the function of which is unknown. Here, we report the first functional characterisation of nematode sbRNAs. We demonstrate that sbRNAs from a range of nematode species are able to reconstitute the initiation of chromosomal DNA replication in the presence of replication proteins in vitro, and that conserved nucleotide sequence motifs are essential for this function. By functionally inactivating sbRNAs with antisense morpholino oligonucleotides, we show that sbRNAs are required for S phase progression, early embryonic development and the viability of C. elegans in vivo. Thus, we demonstrate a new and essential role for sbRNAs during the early development of C. elegans. sbRNAs show limited nucleotide sequence similarity to vertebrate Y RNAs, which are also essential for the initiation of DNA replication. Our results therefore establish that the essential function of small non-coding stem-loop RNAs during DNA replication extends beyond vertebrates. PMID:25908866
Botero, Adriana; Kapeller, Irit; Cooper, Crystal; Clode, Peta L; Shlomai, Joseph; Thompson, R C Andrew
2018-05-17
Kinetoplast DNA (kDNA) is the mitochondrial genome of trypanosomatids. It consists of a few dozen maxicircles and several thousand minicircles, all catenated topologically to form a two-dimensional DNA network. Minicircles are heterogeneous in size and sequence among species. They present one or several conserved regions that contain three highly conserved sequence blocks. CSB-1 (10 bp sequence) and CSB-2 (8 bp sequence) present lower interspecies homology, while CSB-3 (12 bp sequence) or the Universal Minicircle Sequence is conserved within most trypanosomatids. The Universal Minicircle Sequence is located at the replication origin of the minicircles, and is the binding site for the UMS binding protein, a protein involved in trypanosomatid survival and virulence. Here, we describe the structure and organisation of the kDNA of Trypanosoma copemani, a parasite that has been shown to infect mammalian cells and has been associated with the drastic decline of the endangered Australian marsupial, the woylie (Bettongia penicillata). Deep genomic sequencing showed that T. copemani presents two classes of minicircles that share sequence identity and organisation in the conserved sequence blocks with those of Trypanosoma cruzi and Trypanosoma lewisi. A 19,257 bp partial region of the maxicircle of T. copemani that contained the entire coding region was obtained. Comparative analysis of the T. copemani entire maxicircle coding region with the coding regions of T. cruzi and T. lewisi showed they share 71.05% and 71.28% identity, respectively. The shared features in the maxicircle/minicircle organisation and sequence between T. copemani and T. cruzi/T. lewisi suggest similarities in their process of kDNA replication, and are of significance in understanding the evolution of Australian trypanosomes. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.
Mitochondrial DNA of Vitis vinifera and the issue of rampant horizontal gene transfer.
Goremykin, Vadim V; Salamini, Francesco; Velasco, Riccardo; Viola, Roberto
2009-01-01
The mitochondrial genome of grape (Vitis vinifera), the largest organelle genome sequenced so far, is presented. The genome is 773,279 nt long and has the highest coding capacity among known angiosperm mitochondrial DNAs (mtDNAs). The proportion of promiscuous DNA of plastid origin in the genome is also the largest ever reported for an angiosperm mtDNA, both in absolute and relative terms. In all, 42.4% of chloroplast genome of Vitis has been incorporated into its mitochondrial genome. In order to test if horizontal gene transfer (HGT) has also contributed to the gene content of the grape mtDNA, we built phylogenetic trees with the coding sequences of mitochondrial genes of grape and their homologs from plant mitochondrial genomes. Many incongruent gene tree topologies were obtained. However, the extent of incongruence between these gene trees is not significantly greater than that observed among optimal trees for chloroplast genes, the common ancestry of which has never been in doubt. In both cases, we attribute this incongruence to artifacts of tree reconstruction, insufficient numbers of characters, and gene paralogy. This finding leads us to question the recent phylogenetic interpretation of Bergthorsson et al. (2003, 2004) and Richardson and Palmer (2007) that rampant HGT into the mtDNA of Amborella best explains phylogenetic incongruence between mitochondrial gene trees for angiosperms. The only evidence for HGT into the Vitis mtDNA found involves fragments of two coding sequences stemming from two closteroviruses that cause the leaf roll disease of this plant. We also report that analysis of sequences shared by both chloroplast and mitochondrial genomes provides evidence for a previously unknown gene transfer route from the mitochondrion to the chloroplast.
Cheng, Rubin; Zheng, Xiaodong; Ma, Yuanyuan; Li, Qi
2013-01-01
In the present study, we determined the complete mitochondrial DNA (mtDNA) sequences of two species of Cistopus, namely C. chinensis and C. taiwanicus, and conducted a comparative mt genome analysis across the class Cephalopoda. The mtDNA length of C. chinensis and C. taiwanicus are 15706 and 15793 nucleotides with an AT content of 76.21% and 76.5%, respectively. The sequence identity of mtDNA between C. chinensis and C. taiwanicus was 88%, suggesting a close relationship. Compared with C. taiwanicus and other octopods, C. chinensis encoded two additional tRNA genes, showing a novel gene arrangement. In addition, an unusual 23 poly (A) signal structure is found in the ATP8 coding region of C. chinensis. The entire genome and each protein coding gene of the two Cistopus species displayed notable levels of AT and GC skews. Based on sliding window analysis among Octopodiformes, ND1 and DN5 were considered to be more reliable molecular beacons. Phylogenetic analyses based on the 13 protein-coding genes revealed that C. chinensis and C. taiwanicus form a monophyletic group with high statistical support, consistent with previous studies based on morphological characteristics. Our results also indicated that the phylogenetic position of the genus Cistopus is closer to Octopus than to Amphioctopus and Callistoctopus. The complete mtDNA sequence of C. chinensis and C. taiwanicus represent the first whole mt genomes in the genus Cistopus. These novel mtDNA data will be important in refining the phylogenetic relationships within Octopodiformes and enriching the resource of markers for systematic, population genetic and evolutionary biological studies of Cephalopoda. PMID:24358345
Royo-Bordonada, M Á; León-Flández, K; Damián, J; Bosqued-Estefanía, M J; Moya-Geromini, M Á; López-Jurado, L
2016-08-01
To examine the extent and nature of food television advertising directed at children in Spain using an international food-based system and the United Kingdom nutrient profile model (UKNPM). Cross-sectional study of advertisements of food and drinks shown on five television channels over 7 days in 2012 (8am-midnight). Showing time and duration of each advertisement was recorded. Advertisements were classified as core (nutrient-rich/calorie-low products), non-core, or miscellaneous based on the international system, and either healthy/less healthy, i.e., high in saturated fats, trans-fatty acids, salt, or free sugars (HFSS), according to UKNPM. The food industry accounted for 23.7% of the advertisements (4212 out of 17,722) with 7.5 advertisements per hour of broadcasting. The international food-based coding system classified 60.2% of adverts as non-core, and UKNPM classified 64.0% as HFSS. Up to 31.5% of core, 86.8% of non-core, and 8.3% of miscellaneous advertisements were for HFSS products. The percentage of advertisements for HFSS products was higher during reinforced protected viewing times (69.0%), on weekends (71.1%), on channels of particular appeal to children and teenagers (67.8%), and on broadcasts regulated by the Spanish Code of self-regulation of the advertising of food products directed at children (70.7%). Both schemes identified that a majority of foods advertised were unhealthy, although some classification differences between the two systems are important to consider. The food advertising Code is not limiting Spanish children's exposure to advertisements for HFSS products, which were more frequent on Code-regulated broadcasts and during reinforced protected viewing time. Copyright © 2016 The Royal Society for Public Health. Published by Elsevier Ltd. All rights reserved.
Wickramaarachchi, W A R T; Shankarappa, K S; Rangaswamy, K T; Maruthi, M N; Rajapakse, R G A S; Ghosh, Saptarshi
2016-06-01
Bunchy top disease of banana caused by Banana bunchy top virus (BBTV, genus Babuvirus family Nanoviridae) is one of the most important constraints in production of banana in the different parts of the world. Six genomic DNA components of BBTV isolate from Kandy, Sri Lanka (BBTV-K) were amplified by polymerase chain reaction (PCR) with specific primers using total DNA extracted from banana tissues showing typical symptoms of bunchy top disease. The amplicons were of expected size of 1.0-1.1 kb, which were cloned and sequenced. Analysis of sequence data revealed the presence of six DNA components; DNA-R, DNA-U3, DNA-S, DNA-N, DNA-M and DNA-C for Sri Lanka isolate. Comparisons of sequence data of DNA components followed by the phylogenetic analysis, grouped Sri Lanka-(Kandy) isolate in the Pacific Indian Oceans (PIO) group. Sri Lanka-(Kandy) isolate of BBTV is classified a new member of PIO group based on analysis of six components of the virus.
Pacific Northwest (PNW) Hydrologic Landscape (HL) polygons and HL code
A five-letter hydrologic landscape code representing five indices of hydrologic form that are related to hydrologic function: climate, seasonality, aquifer permeability, terrain, and soil permeability. Each hydrologic assessment unit is classified by one of the 81 different five-letter codes representing these indices. Polygon features in this dataset were created by aggregating (dissolving boundaries between) adjacent, similarly-coded hydrologic assessment units. Climate Classes: V-Very wet, W-Wet, M-Moist, D-Dry, S-Semiarid, A-Arid. Seasonality Sub-Classes: w-Fall or winter, s-Spring. Aquifer Permeability Classes: H-High, L-Low. Terrain Classes: M-Mountain, T-Transitional, F-Flat. Soil Permeability Classes: H-High, L-Low.
The Winding Road to Discovering the Link between Genetic Material and DNA
ERIC Educational Resources Information Center
Cherif, Abour H.; Roze, Maris; Movahedzadeh, Farahnaz
2015-01-01
This is an account of the three-centuries long journey to the discovery of the link between DNA and the transformation principle of heredity beginning with the discovery of the cell in 1665 and leading up to the 1953 discovery of the genetic code and the structure of DNA. This account also illustrates the way science works and how scientists do…
Multiple tag labeling method for DNA sequencing
Mathies, R.A.; Huang, X.C.; Quesada, M.A.
1995-07-25
A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.
Database of amino acid-nucleotide contacts in contacts in DNA-homeodomain protein
NASA Astrophysics Data System (ADS)
Grokhlina, T. I.; Zrelov, P. V.; Ivanov, V. V.; Polozov, R. V.; Chirgadze, Yu. N.; Sivozhelezov, V. S.
2013-09-01
The analysis of amino acid-nucleotide contacts in interfaces of the protein-DNA complexes, intended to find consistencies in the protein-DNA recognition, is a complex problem that requires an analysis of the physicochemical characteristics of these contacts and the positions of the participating amino acids and nucleotides in the chains of the protein and the DNA, respectively, as well as conservatism of these contacts. Thus, those heterogeneous data should be systematized. For this purpose we have developed a database of amino acid-nucleotide contacts ANTPC (Amino acid Nucleotide Type Position Conservation) following the archetypal example of the proteins in the homeodomain family. We show that it can be used to compare and classify the interfaces of the protein-DNA complexes.
Esquivel-Elizondo, Sofia; Maldonado, Juan; Krajmalnik-Brown, Rosa
2018-06-01
Carbon monoxide (CO)-metabolism and phenotypic and phylogenetic characterization of a novel anaerobic, mesophilic and hydrogenogenic carboxydotroph are reported. Strain SVCO-16 was isolated from anaerobic sludge and grows autotrophically and mixotrophically with CO. The genes cooS and cooF, coding for a CO dehydrogenase complex, and genes similar to hycE2, encoding a CO-induced hydrogenase, were present in its genome. The isolate produces H2 and CO2 from CO, and acetate and formate from organic substrates. Based on the 16S rRNA sequence, it is an Alphaproteobacterium most closely related to the genus Pleomorphomonas (98.9%-99.2% sequence identity). Comparison with other previously characterized Pleomorphomonas showed that P. diazotrophica and P. oryzae do not metabolize CO, and P. diazotrophica does not grow anaerobically with organic substrates. Average nucleotide identity values between strain SVCO-16 and P. diazotrophica, P. oryzae or P. koreensis were 86.66 ± 0.21%. These values are below the boundary to define species (95%-96%). Digital DNA-DNA hybridization estimates between strain SVCO-16 and reference strains were also below the 70% threshold for species delineation: 29.1%-34.5%. Based on the differences in CO metabolism, genome analyses and cellular fatty acid composition, the isolate should be classified into the genus Pleomorphomonas as a representative of a novel species, Pleomorphomonas carboxyditropha. The type strain of Pleomorphomonas carboxyditropha is SVCO-16T (strain deposit numbers, DSM 106132T and TSD-119T).
Walia, Rasna R; Caragea, Cornelia; Lewis, Benjamin A; Towfic, Fadi; Terribilini, Michael; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant
2012-05-10
RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.
DNA copy number changes define spatial patterns of heterogeneity in colorectal cancer
Mamlouk, Soulafa; Childs, Liam Harold; Aust, Daniela; Heim, Daniel; Melching, Friederike; Oliveira, Cristiano; Wolf, Thomas; Durek, Pawel; Schumacher, Dirk; Bläker, Hendrik; von Winterfeld, Moritz; Gastl, Bastian; Möhr, Kerstin; Menne, Andrea; Zeugner, Silke; Redmer, Torben; Lenze, Dido; Tierling, Sascha; Möbs, Markus; Weichert, Wilko; Folprecht, Gunnar; Blanc, Eric; Beule, Dieter; Schäfer, Reinhold; Morkel, Markus; Klauschen, Frederick; Leser, Ulf; Sers, Christine
2017-01-01
Genetic heterogeneity between and within tumours is a major factor determining cancer progression and therapy response. Here we examined DNA sequence and DNA copy-number heterogeneity in colorectal cancer (CRC) by targeted high-depth sequencing of 100 most frequently altered genes. In 97 samples, with primary tumours and matched metastases from 27 patients, we observe inter-tumour concordance for coding mutations; in contrast, gene copy numbers are highly discordant between primary tumours and metastases as validated by fluorescent in situ hybridization. To further investigate intra-tumour heterogeneity, we dissected a single tumour into 68 spatially defined samples and sequenced them separately. We identify evenly distributed coding mutations in APC and TP53 in all tumour areas, yet highly variable gene copy numbers in numerous genes. 3D morpho-molecular reconstruction reveals two clusters with divergent copy number aberrations along the proximal–distal axis indicating that DNA copy number variations are a major source of tumour heterogeneity in CRC. PMID:28120820
Re-visiting protein-centric two-tier classification of existing DNA-protein complexes
2012-01-01
Background Precise DNA-protein interactions play most important and vital role in maintaining the normal physiological functioning of the cell, as it controls many high fidelity cellular processes. Detailed study of the nature of these interactions has paved the way for understanding the mechanisms behind the biological processes in which they are involved. Earlier in 2000, a systematic classification of DNA-protein complexes based on the structural analysis of the proteins was proposed at two tiers, namely groups and families. With the advancement in the number and resolution of structures of DNA-protein complexes deposited in the Protein Data Bank, it is important to revisit the existing classification. Results On the basis of the sequence analysis of DNA binding proteins, we have built upon the protein centric, two-tier classification of DNA-protein complexes by adding new members to existing families and making new families and groups. While classifying the new complexes, we also realised the emergence of new groups and families. The new group observed was where β-propeller was seen to interact with DNA. There were 34 SCOP folds which were observed to be present in the complexes of both old and new classifications, whereas 28 folds are present exclusively in the new complexes. Some new families noticed were NarL transcription factor, Z-α DNA binding proteins, Forkhead transcription factor, AP2 protein, Methyl CpG binding protein etc. Conclusions Our results suggest that with the increasing number of availability of DNA-protein complexes in Protein Data Bank, the number of families in the classification increased by approximately three fold. The folds present exclusively in newly classified complexes is suggestive of inclusion of proteins with new function in new classification, the most populated of which are the folds responsible for DNA damage repair. The proposed re-visited classification can be used to perform genome-wide surveys in the genomes of interest for the presence of DNA-binding proteins. Further analysis of these complexes can aid in developing algorithms for identifying DNA-binding proteins and their family members from mere sequence information. PMID:22800292
Re-visiting protein-centric two-tier classification of existing DNA-protein complexes.
Malhotra, Sony; Sowdhamini, Ramanathan
2012-07-16
Precise DNA-protein interactions play most important and vital role in maintaining the normal physiological functioning of the cell, as it controls many high fidelity cellular processes. Detailed study of the nature of these interactions has paved the way for understanding the mechanisms behind the biological processes in which they are involved. Earlier in 2000, a systematic classification of DNA-protein complexes based on the structural analysis of the proteins was proposed at two tiers, namely groups and families. With the advancement in the number and resolution of structures of DNA-protein complexes deposited in the Protein Data Bank, it is important to revisit the existing classification. On the basis of the sequence analysis of DNA binding proteins, we have built upon the protein centric, two-tier classification of DNA-protein complexes by adding new members to existing families and making new families and groups. While classifying the new complexes, we also realised the emergence of new groups and families. The new group observed was where β-propeller was seen to interact with DNA. There were 34 SCOP folds which were observed to be present in the complexes of both old and new classifications, whereas 28 folds are present exclusively in the new complexes. Some new families noticed were NarL transcription factor, Z-α DNA binding proteins, Forkhead transcription factor, AP2 protein, Methyl CpG binding protein etc. Our results suggest that with the increasing number of availability of DNA-protein complexes in Protein Data Bank, the number of families in the classification increased by approximately three fold. The folds present exclusively in newly classified complexes is suggestive of inclusion of proteins with new function in new classification, the most populated of which are the folds responsible for DNA damage repair. The proposed re-visited classification can be used to perform genome-wide surveys in the genomes of interest for the presence of DNA-binding proteins. Further analysis of these complexes can aid in developing algorithms for identifying DNA-binding proteins and their family members from mere sequence information.
Replication and Transcription of Eukaryotic DNA in Esherichia coli
Morrow, John F.; Cohen, Stanley N.; Chang, Annie C. Y.; Boyer, Herbert W.; Goodman, Howard M.; Helling, Robert B.
1974-01-01
Fragments of amplified Xenopus laevis DNA, coding for 18S and 28S ribosomal RNA and generated by EcoRI restriction endonuclease, have been linked in vitro to the bacterial plasmid pSC101; and the recombinant molecular species have been introduced into E. coli by transformation. These recombinant plasmids, containing both eukaryotic and prokaryotic DNA, replicate stably in E. coli. RNA isolated from E. coli minicells harboring the plasmids hybridizes to amplified X. laevis rDNA. Images PMID:4600264
Nadimi, Maryam; Daubois, Laurence; Hijri, Mohamed
2016-05-01
Mitochondrial (mt) genes, such as cytochrome C oxidase genes (cox), have been widely used for barcoding in many groups of organisms, although this approach has been less powerful in the fungal kingdom due to the rapid evolution of their mt genomes. The use of mt genes in phylogenetic studies of Dikarya has been met with success, while early diverging fungal lineages remain less studied, particularly the arbuscular mycorrhizal fungi (AMF). Advances in next-generation sequencing have substantially increased the number of publically available mtDNA sequences for the Glomeromycota. As a result, comparison of mtDNA across key AMF taxa can now be applied to assess the phylogenetic signal of individual mt coding genes, as well as concatenated subsets of coding genes. Here we show comparative analyses of publically available mt genomes of Glomeromycota, augmented with two mtDNA genomes that were newly sequenced for this study (Rhizophagus irregularis DAOM240159 and Glomus aggregatum DAOM240163), resulting in 16 complete mtDNA datasets. R. irregularis isolate DAOM240159 and G. aggregatum isolate DAOM240163 showed mt genomes measuring 72,293bp and 69,505bp with G+C contents of 37.1% and 37.3%, respectively. We assessed the phylogenies inferred from single mt genes and complete sets of coding genes, which are referred to as "supergenes" (16 concatenated coding genes), using Shimodaira-Hasegawa tests, in order to identify genes that best described AMF phylogeny. We found that rnl, nad5, cox1, and nad2 genes, as well as concatenated subset of these genes, provided phylogenies that were similar to the supergene set. This mitochondrial genomic analysis was also combined with principal coordinate and partitioning analyses, which helped to unravel certain evolutionary relationships in the Rhizophagus genus and for G. aggregatum within the Glomeromycota. We showed evidence to support the position of G. aggregatum within the R. irregularis 'species complex'. Copyright © 2016 Elsevier Inc. All rights reserved.
Fister, Karin; Fister, Iztok; Murovec, Jana; Bohanec, Borut
2017-02-01
Plant breeders' rights are undergoing dramatic changes due to changes in patent rights in terms of plant variety rights protection. Although differences in the interpretation of »breeder's exemption«, termed research exemption in the 1991 UPOV, did exist in the past in some countries, allowing breeders to use protected varieties as parents in the creation of new varieties of plants, current developments brought about by patenting conventionally bred varieties with the European Patent Office (such as EP2140023B1) have opened new challenges. Legal restrictions on germplasm availability are therefore imposed on breeders while, at the same time, no practical information on how to distinguish protected from non-protected varieties is given. We propose here a novel approach that would solve this problem by the insertion of short DNA stretches (labels) into protected plant varieties by genetic transformation. This information will then be available to breeders by a simple and standardized procedure. We propose that such a procedure should consist of using a pair of universal primers that will generate a sequence in a PCR reaction, which can be read and translated into ordinary text by a computer application. To demonstrate the feasibility of such approach, we conducted a case study. Using the Agrobacterium tumefaciens transformation protocol, we inserted a stretch of DNA code into Nicotiana benthamiana. We also developed an on-line application that enables coding of any text message into DNA nucleotide code and, on sequencing, decoding it back into text. In the presented case study, a short command line coding the phrase »Hello world« was transformed into a DNA sequence that was inserted in the plant genome. The encoded message was reconstructed from the resulting T1 seedlings with 100 % accuracy. The feasibility and possible other applications of this approach are discussed.
Epigenetics of Peripheral B-Cell Differentiation and the Antibody Response
Zan, Hong; Casali, Paolo
2015-01-01
Epigenetic modifications, such as histone post-translational modifications, DNA methylation, and alteration of gene expression by non-coding RNAs, including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), are heritable changes that are independent from the genomic DNA sequence. These regulate gene activities and, therefore, cellular functions. Epigenetic modifications act in concert with transcription factors and play critical roles in B cell development and differentiation, thereby modulating antibody responses to foreign- and self-antigens. Upon antigen encounter by mature B cells in the periphery, alterations of these lymphocytes epigenetic landscape are induced by the same stimuli that drive the antibody response. Such alterations instruct B cells to undergo immunoglobulin (Ig) class switch DNA recombination (CSR) and somatic hypermutation (SHM), as well as differentiation to memory B cells or long-lived plasma cells for the immune memory. Inducible histone modifications, together with DNA methylation and miRNAs modulate the transcriptome, particularly the expression of activation-induced cytidine deaminase, which is essential for CSR and SHM, and factors central to plasma cell differentiation, such as B lymphocyte-induced maturation protein-1. These inducible B cell-intrinsic epigenetic marks guide the maturation of antibody responses. Combinatorial histone modifications also function as histone codes to target CSR and, possibly, SHM machinery to the Ig loci by recruiting specific adaptors that can stabilize CSR/SHM factors. In addition, lncRNAs, such as recently reported lncRNA-CSR and an lncRNA generated through transcription of the S region that form G-quadruplex structures, are also important for CSR targeting. Epigenetic dysregulation in B cells, including the aberrant expression of non-coding RNAs and alterations of histone modifications and DNA methylation, can result in aberrant antibody responses to foreign antigens, such as those on microbial pathogens, and generation of pathogenic autoantibodies, IgE in allergic reactions, as well as B cell neoplasia. Epigenetic marks would be attractive targets for new therapeutics for autoimmune and allergic diseases, and B cell malignancies. PMID:26697022
Tian, Ji-Yuan; Sun, Xiu-Qin; Chen, Xi-Guang
2008-05-01
Oral delivery of plasmid DNA (pDNA) is a desirable approach for fish immunization in intensive culture. However, its effectiveness is limited because of possible degradation of pDNA in the fish's digestive system. In this report, alginate microspheres loaded with pDNA coding for fish lymphocystis disease virus (LCDV) and green fluorescent protein were prepared with a modified oil containing water (W/O) emulsification method. Yield, loading percent and encapsulation efficiency of alginate microspheres were 90.5%, 1.8% and 92.7%, respectively. The alginate microspheres had diameters of less than 10 microm, and their shape was spherical. As compared to sodium alginate, a remarkable increase of DNA-phosphodiester and DNA-phosphomonoester bonds was observed for alginate microspheres loaded with pDNA by Fourier transform infrared (FTIR) spectroscopic analysis. Agarose gel electrophoresis showed a little supercoiled pDNA was transformed to open circular and linear pDNA during encapsulation. The cumulative release of pDNA in alginate microspheres was
Miller-Delaney, Suzanne F.C.; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C.; Bray, Isabella M.; Reynolds, James P.; Gwinn, Ryder; Stallings, Raymond L.
2015-01-01
Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated C-phosphate-G islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) when compared to control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among the differentially methylated. Finally, a panel of 13, methylation-sensitive microRNA were identified in temporal lobe epilepsy including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and the differential methylation of long non-coding RNA documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain. PMID:25552301
Development of code evaluation criteria for assessing predictive capability and performance
NASA Technical Reports Server (NTRS)
Lin, Shyi-Jang; Barson, S. L.; Sindir, M. M.; Prueger, G. H.
1993-01-01
Computational Fluid Dynamics (CFD), because of its unique ability to predict complex three-dimensional flows, is being applied with increasing frequency in the aerospace industry. Currently, no consistent code validation procedure is applied within the industry. Such a procedure is needed to increase confidence in CFD and reduce risk in the use of these codes as a design and analysis tool. This final contract report defines classifications for three levels of code validation, directly relating the use of CFD codes to the engineering design cycle. Evaluation criteria by which codes are measured and classified are recommended and discussed. Criteria for selecting experimental data against which CFD results can be compared are outlined. A four phase CFD code validation procedure is described in detail. Finally, the code validation procedure is demonstrated through application of the REACT CFD code to a series of cases culminating in a code to data comparison on the Space Shuttle Main Engine High Pressure Fuel Turbopump Impeller.
Abdulhameed, Hunida E; Hammami, Muhammad M; Mohamed, Elbushra A Hameed
2011-08-01
The consistency of codes governing disclosure of terminal illness to patients and families in Islamic countries has not been studied until now. To review available codes on disclosure of terminal illness in Islamic countries. DATA SOURCE AND EXTRACTION: Data were extracted through searches on Google and PubMed. Codes related to disclosure of terminal illness to patients or families were abstracted, and then classified independently by the three authors. Codes for 14 Islamic countries were located. Five codes were silent regarding informing the patient, seven allowed concealment, one mandated disclosure and one prohibited disclosure. Five codes were silent regarding informing the family, four allowed disclosure and five mandated/recommended disclosure. The Islamic Organization for Medical Sciences code was silent on both issues. Codes regarding disclosure of terminal illness to patients and families differed markedly among Islamic countries. They were silent in one-third of the codes, and tended to favour a paternalistic/utilitarian, family-centred approach over an autonomous, patient-centred approach.
A Link between ORC-Origin Binding Mechanisms and Origin Activation Time Revealed in Budding Yeast
Hoggard, Timothy; Shor, Erika; Müller, Carolin A.; Nieduszynski, Conrad A.; Fox, Catherine A.
2013-01-01
Eukaryotic DNA replication origins are selected in G1-phase when the origin recognition complex (ORC) binds chromosomal positions and triggers molecular events culminating in the initiation of DNA replication (a.k.a. origin firing) during S-phase. Each chromosome uses multiple origins for its duplication, and each origin fires at a characteristic time during S-phase, creating a cell-type specific genome replication pattern relevant to differentiation and genome stability. It is unclear whether ORC-origin interactions are relevant to origin activation time. We applied a novel genome-wide strategy to classify origins in the model eukaryote Saccharomyces cerevisiae based on the types of molecular interactions used for ORC-origin binding. Specifically, origins were classified as DNA-dependent when the strength of ORC-origin binding in vivo could be explained by the affinity of ORC for origin DNA in vitro, and, conversely, as ‘chromatin-dependent’ when the ORC-DNA interaction in vitro was insufficient to explain the strength of ORC-origin binding in vivo. These two origin classes differed in terms of nucleosome architecture and dependence on origin-flanking sequences in plasmid replication assays, consistent with local features of chromatin promoting ORC binding at ‘chromatin-dependent’ origins. Finally, the ‘chromatin-dependent’ class was enriched for origins that fire early in S-phase, while the DNA-dependent class was enriched for later firing origins. Conversely, the latest firing origins showed a positive association with the ORC-origin DNA paradigm for normal levels of ORC binding, whereas the earliest firing origins did not. These data reveal a novel association between ORC-origin binding mechanisms and the regulation of origin activation time. PMID:24068963
Johnson, LeeAnn K; Brown, Mary B; Carruthers, Ethan A; Ferguson, John A; Dombek, Priscilla E; Sadowsky, Michael J
2004-08-01
A horizontal, fluorophore-enhanced, repetitive extragenic palindromic-PCR (rep-PCR) DNA fingerprinting technique (HFERP) was developed and evaluated as a means to differentiate human from animal sources of Escherichia coli. Box A1R primers and PCR were used to generate 2,466 rep-PCR and 1,531 HFERP DNA fingerprints from E. coli strains isolated from fecal material from known human and 12 animal sources: dogs, cats, horses, deer, geese, ducks, chickens, turkeys, cows, pigs, goats, and sheep. HFERP DNA fingerprinting reduced within-gel grouping of DNA fingerprints and improved alignment of DNA fingerprints between gels, relative to that achieved using rep-PCR DNA fingerprinting. Jackknife analysis of the complete rep-PCR DNA fingerprint library, done using Pearson's product-moment correlation coefficient, indicated that animal and human isolates were assigned to the correct source groups with an 82.2% average rate of correct classification. However, when only unique isolates were examined, isolates from a single animal having a unique DNA fingerprint, Jackknife analysis showed that isolates were assigned to the correct source groups with a 60.5% average rate of correct classification. The percentages of correctly classified isolates were about 15 and 17% greater for rep-PCR and HFERP, respectively, when analyses were done using the curve-based Pearson's product-moment correlation coefficient, rather than the band-based Jaccard algorithm. Rarefaction analysis indicated that, despite the relatively large size of the known-source database, genetic diversity in E. coli was very great and is most likely accounting for our inability to correctly classify many environmental E. coli isolates. Our data indicate that removal of duplicate genotypes within DNA fingerprint libraries, increased database size, proper methods of statistical analysis, and correct alignment of band data within and between gels improve the accuracy of microbial source tracking methods.
Arnaiz, Olivier; Mathy, Nathalie; Baudry, Céline; Malinsky, Sophie; Aury, Jean-Marc; Denby Wilkes, Cyril; Garnier, Olivier; Labadie, Karine; Lauderdale, Benjamin E; Le Mouël, Anne; Marmignon, Antoine; Nowacki, Mariusz; Poulain, Julie; Prajer, Malgorzata; Wincker, Patrick; Meyer, Eric; Duharcourt, Sandra; Duret, Laurent; Bétermier, Mireille; Sperling, Linda
2012-01-01
Insertions of parasitic DNA within coding sequences are usually deleterious and are generally counter-selected during evolution. Thanks to nuclear dimorphism, ciliates provide unique models to study the fate of such insertions. Their germline genome undergoes extensive rearrangements during development of a new somatic macronucleus from the germline micronucleus following sexual events. In Paramecium, these rearrangements include precise excision of unique-copy Internal Eliminated Sequences (IES) from the somatic DNA, requiring the activity of a domesticated piggyBac transposase, PiggyMac. We have sequenced Paramecium tetraurelia germline DNA, establishing a genome-wide catalogue of -45,000 IESs, in order to gain insight into their evolutionary origin and excision mechanism. We obtained direct evidence that PiggyMac is required for excision of all IESs. Homology with known P. tetraurelia Tc1/mariner transposons, described here, indicates that at least a fraction of IESs derive from these elements. Most IES insertions occurred before a recent whole-genome duplication that preceded diversification of the P. aurelia species complex, but IES invasion of the Paramecium genome appears to be an ongoing process. Once inserted, IESs decay rapidly by accumulation of deletions and point substitutions. Over 90% of the IESs are shorter than 150 bp and present a remarkable size distribution with a -10 bp periodicity, corresponding to the helical repeat of double-stranded DNA and suggesting DNA loop formation during assembly of a transpososome-like excision complex. IESs are equally frequent within and between coding sequences; however, excision is not 100% efficient and there is selective pressure against IES insertions, in particular within highly expressed genes. We discuss the possibility that ancient domestication of a piggyBac transposase favored subsequent propagation of transposons throughout the germline by allowing insertions in coding sequences, a fraction of the genome in which parasitic DNA is not usually tolerated.
Arnaiz, Olivier; Mathy, Nathalie; Baudry, Céline; Malinsky, Sophie; Aury, Jean-Marc; Denby Wilkes, Cyril; Garnier, Olivier; Labadie, Karine; Lauderdale, Benjamin E.; Le Mouël, Anne; Marmignon, Antoine; Nowacki, Mariusz; Poulain, Julie; Prajer, Malgorzata; Wincker, Patrick; Meyer, Eric; Duharcourt, Sandra; Duret, Laurent; Bétermier, Mireille; Sperling, Linda
2012-01-01
Insertions of parasitic DNA within coding sequences are usually deleterious and are generally counter-selected during evolution. Thanks to nuclear dimorphism, ciliates provide unique models to study the fate of such insertions. Their germline genome undergoes extensive rearrangements during development of a new somatic macronucleus from the germline micronucleus following sexual events. In Paramecium, these rearrangements include precise excision of unique-copy Internal Eliminated Sequences (IES) from the somatic DNA, requiring the activity of a domesticated piggyBac transposase, PiggyMac. We have sequenced Paramecium tetraurelia germline DNA, establishing a genome-wide catalogue of ∼45,000 IESs, in order to gain insight into their evolutionary origin and excision mechanism. We obtained direct evidence that PiggyMac is required for excision of all IESs. Homology with known P. tetraurelia Tc1/mariner transposons, described here, indicates that at least a fraction of IESs derive from these elements. Most IES insertions occurred before a recent whole-genome duplication that preceded diversification of the P. aurelia species complex, but IES invasion of the Paramecium genome appears to be an ongoing process. Once inserted, IESs decay rapidly by accumulation of deletions and point substitutions. Over 90% of the IESs are shorter than 150 bp and present a remarkable size distribution with a ∼10 bp periodicity, corresponding to the helical repeat of double-stranded DNA and suggesting DNA loop formation during assembly of a transpososome-like excision complex. IESs are equally frequent within and between coding sequences; however, excision is not 100% efficient and there is selective pressure against IES insertions, in particular within highly expressed genes. We discuss the possibility that ancient domestication of a piggyBac transposase favored subsequent propagation of transposons throughout the germline by allowing insertions in coding sequences, a fraction of the genome in which parasitic DNA is not usually tolerated. PMID:23071448
Pangeni, Rajendra P; Zhang, Zhou; Alvarez, Angel A; Wan, Xuechao; Sastry, Namratha; Lu, Songjian; Shi, Taiping; Huang, Tianzhi; Lei, Charles X; James, C David; Kessler, John A; Brennan, Cameron W; Nakano, Ichiro; Lu, Xinghua; Hu, Bo; Zhang, Wei; Cheng, Shi-Yuan
2018-06-21
Glioma stem cells (GSCs), a subpopulation of tumor cells, contribute to tumor heterogeneity and therapy resistance. Gene expression profiling classified glioblastoma (GBM) and GSCs into four transcriptomically-defined subtypes. Here, we determined the DNA methylation signatures in transcriptomically pre-classified GSC and GBM bulk tumors subtypes. We hypothesized that these DNA methylation signatures correlate with gene expression and are uniquely associated either with only GSCs or only GBM bulk tumors. Additional methylation signatures may be commonly associated with both GSCs and GBM bulk tumors, i.e., common to non-stem-like and stem-like tumor cell populations and correlating with the clinical prognosis of glioma patients. We analyzed Illumina 450K methylation array and expression data from a panel of 23 patient-derived GSCs. We referenced these results with The Cancer Genome Atlas (TCGA) GBM datasets to generate methylomic and transcriptomic signatures for GSCs and GBM bulk tumors of each transcriptomically pre-defined tumor subtype. Survival analyses were carried out for these signature genes using publicly available datasets, including from TCGA. We report that DNA methylation signatures in proneural and mesenchymal tumor subtypes are either unique to GSCs, unique to GBM bulk tumors, or common to both. Further, dysregulated DNA methylation correlates with gene expression and clinical prognoses. Additionally, many previously identified transcriptionally-regulated markers are also dysregulated due to DNA methylation. The subtype-specific DNA methylation signatures described in this study could be useful for refining GBM sub-classification, improving prognostic accuracy, and making therapeutic decisions.
iDBPs: a web server for the identification of DNA binding proteins
Nimrod, Guy; Schushan, Maya; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir
2010-01-01
Summary: The iDBPs server uses the three-dimensional (3D) structure of a query protein to predict whether it binds DNA. First, the algorithm predicts the functional region of the protein based on its evolutionary profile; the assumption is that large clusters of conserved residues are good markers of functional regions. Next, various characteristics of the predicted functional region as well as global features of the protein are calculated, such as the average surface electrostatic potential, the dipole moment and cluster-based amino acid conservation patterns. Finally, a random forests classifier is used to predict whether the query protein is likely to bind DNA and to estimate the prediction confidence. We have trained and tested the classifier on various datasets and shown that it outperformed related methods. On a dataset that reflects the fraction of DNA binding proteins (DBPs) in a proteome, the area under the ROC curve was 0.90. The application of the server to an updated version of the N-Func database, which contains proteins of unknown function with solved 3D-structure, suggested new putative DBPs for experimental studies. Availability: http://idbps.tau.ac.il/ Contact: NirB@tauex.tau.ac.il Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20089514
Recombinant pinoresinol/lariciresinol reductase, recombinant dirigent protein, and methods of use
Lewis, Norman G.; Davin, Laurence B.; Dinkova-Kostova, Albena T.; Fujita, Masayuki; Gang, David R.; Sarkanen, Simo; Ford, Joshua D.
2001-04-03
Dirigent proteins and pinoresinol/lariciresinol reductases have been isolated, together with cDNAs encoding dirigent proteins and pinoresinol/lariciresinol reductases. Accordingly, isolated DNA sequences are provided which code for the expression of dirigent proteins and pinoresinol/lariciresinol reductases. In other aspects, replicable recombinant cloning vehicles are provided which code for dirigent proteins or pinoresinol/lariciresinol reductases or for a base sequence sufficiently complementary to at least a portion of dirigent protein or pinoresinol/lariciresinol reductase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding dirigent protein or pinoresinol/lariciresinol reductase. Thus, systems and methods are provided for the recombinant expression of dirigent proteins and/or pinoresinol/lariciresinol reductases.
The full mitochondrial genome sequence of Raillietina tetragona from chicken (Cestoda: Davaineidae).
Liang, Jian-Ying; Lin, Rui-Qing
2016-11-01
In the present study, the complete mitochondrial DNA (mtDNA) sequence of Raillietina tetragona was sequenced and its gene contents and genome organizations was compared with that of other tapeworm. The complete mt genome sequence of R. tetragona is 14,444 bp in length. It contains 12 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and two non-coding region. All genes are transcribed in the same direction and have a nucleotide composition high in A and T. The contents of A + T of the complete mt genome are 71.4% for R. tetragona. The R. tetragona mt genome sequence provides novel mtDNA marker for studying the molecular epidemiology and population genetics of Raillietina and has implications for the molecular diagnosis of chicken cestodosis caused by Raillietina.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harford, N.; De Wilde, M.
1987-05-19
A recombinant DNA molecule is described comprising at least a portion coding for subunits A and B of cholera toxin, or a fragment or derivative of the portion wherein the fragment or derivative codes for a polypeptide have an activity which can induce an immune response to subunit A; can induce an immune response to subunit A and cause epithelial cell penetration and the enzymatic effect leading to net loss of fluid into the gut lumen; can bind to the membrane receptor for the B subunit of cholera toxin; can induce an immune response to subunit B; can induce anmore » immune response to subunit B and bind to the membrane receptor; or has a combination of the activities.« less
2004-03-16
The Food and Drug Administration (FDA) is classifying the Factor V Leiden deoxyribonucleic acid (DNA) mutation detections systems device into class II (special controls). The special control that will apply to the device is the guidance document entitled "Class II Special Controls Guidance Document: Factor V Leiden DNA Mutation Detection Systems." The agency is taking this action in response to a petition submitted under the Federal Food, Drug, and Cosmetic Act (the act) as amended by the Medical Device Amendments of 1976 (the 1976 amendments), the Safe Medical Devices Act of 1990 (SMDA), the Food and Drug Administration Modernization Act of 1997 (FDAMA), and the Medical Device User Fee and Modernization Act of 2002. The agency is classifying this device into class II (special controls) in order to provide a reasonable assurance of safety and effectiveness of the device. Elsewhere in this issue of the Federal Register, FDA is publishing a notice of availability of a guidance document that is the special control for this device.
Wilson, Sarah E; Deeks, Shelley L; Rosella, Laura C
2015-09-15
In Ontario, Canada, we conducted an evaluation of rotavirus (RV) vaccine on hospitalizations and Emergency Department (ED) visitations for acute gastroenteritis (AGE). In our original analysis, any one of the International Classification of Disease, Version 10 (ICD-10) codes was used for outcome ascertainment: RV-specific- (A08.0), viral- (A08.3, A08. 4, A08.5), and unspecified infectious- gastroenteritis (A09). Annual age-specific rates per 10,000 population were calculated. The average monthly rate of AGE hospitalization for children under age two increased from 0.82 per 10,000 from January 2003 to March 2009, to 2.35 over the period of April 2009 to March 31, 2013. Similar trends were found for ED consultations and in other age groups. A rise in events corresponding to the A09 code was found when the outcome definition was disaggregated by ICD-10 code. Documentation obtained from the World Health Organization confirmed that a change in directive for the classification of unspecified gastroenteritis occurred with the release of ICD-10 in April 2009. AGE events previously classified under the code K52.9, are now classified under code A09.9. Based on change in the classification of unspecified gastroenteritis we modified our outcome definition to also include unspecified non-infectious-gastroenteritis (K52.9). We recommend other investigators consider using both A09.9 and K52.9 ICD-10 codes for outcome ascertainment in future rotavirus vaccine impact studies to ensure that all unspecified cases of AGE are captured, especially if the study period spans 2009.
Cave, John W; Xia, Li; Caudy, Michael
2011-01-01
In Drosophila melanogaster, achaete (ac) and m8 are model basic helix-loop-helix activator (bHLH A) and repressor genes, respectively, that have the opposite cell expression pattern in proneural clusters during Notch signaling. Previous studies have shown that activation of m8 transcription in specific cells within proneural clusters by Notch signaling is programmed by a "combinatorial" and "architectural" DNA transcription code containing binding sites for the Su(H) and proneural bHLH A proteins. Here we show the novel result that the ac promoter contains a similar combinatorial code of Su(H) and bHLH A binding sites but contains a different Su(H) site architectural code that does not mediate activation during Notch signaling, thus programming a cell expression pattern opposite that of m8 in proneural clusters.
DNATCO: assignment of DNA conformers at dnatco.org.
Černý, Jiří; Božíková, Paulína; Schneider, Bohdan
2016-07-08
The web service DNATCO (dnatco.org) classifies local conformations of DNA molecules beyond their traditional sorting to A, B and Z DNA forms. DNATCO provides an interface to robust algorithms assigning conformation classes called NTC: to dinucleotides extracted from DNA-containing structures uploaded in PDB format version 3.1 or above. The assigned dinucleotide NTC: classes are further grouped into DNA structural alphabet NTA: , to the best of our knowledge the first DNA structural alphabet. The results are presented at two levels: in the form of user friendly visualization and analysis of the assignment, and in the form of a downloadable, more detailed table for further analysis offline. The website is free and open to all users and there is no login requirement. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Getting it Right: How DNA Polymerases Select the Right Nucleotide.
Ludmann, Samra; Marx, Andreas
2016-01-01
All living organisms are defined by their genetic code encrypted in their DNA. DNA polymerases are the enzymes that are responsible for all DNA syntheses occurring in nature. For DNA replication, repair and recombination these enzymes have to read the parental DNA and recognize the complementary nucleotide out of a pool of four structurally similar deoxynucleotide triphosphates (dNTPs) for a given template. The selection of the nucleotide is in accordance with the Watson-Crick rule. In this process the accuracy of DNA synthesis is crucial for the maintenance of the genome stability. However, to spur evolution a certain degree of freedom must be allowed. This brief review highlights the mechanistic basis for selecting the right nucleotide by DNA polymerases.
Kouvelis, Vassili N; Ghikas, Dimitri V; Typas, Milton A
2004-10-01
The mitochondrial genome (mtDNA) of the entomopathogenic fungus Lecanicillium muscarium (synonym Verticillium lecanii) with a total size of 24,499-bp has been analyzed. So far, it is the smallest known mitochondrial genome among Pezizomycotina, with an extremely compact gene organization and only one group-I intron in its large ribosomal RNA (rnl) gene. It contains the 14 typical genes coding for proteins related to oxidative phosphorylation, the two rRNA genes, one intronic ORF coding for a possible ribosomal protein (rps), and a set of 25 tRNA genes which recognize codons for all amino acids, except alanine and cysteine. All genes are transcribed from the same DNA strand. Gene order comparison with all available complete fungal mtDNAs-representatives of all four Phyla are included-revealed some characteristic common features like uninterrupted gene pairs, overlapping genes, and extremely variable intergenic regions, that can all be exploited for the study of fungal mitochondrial genomes. Moreover, a minimum common mtDNA gene order could be detected, in two units, for all known Sordariomycetes namely nad1-nad4-atp8-atp6 and rns-cox3-rnl, which can be extended in Hypocreales, to nad4L-nad5-cob-cox1-nad1-nad4-atp8-atp6 and rns-cox3-rnl nad2-nad3, respectively. Phylogenetic analysis of all fungal mtDNA essential protein-coding genes as one unit, clearly demonstrated the superiority of small genome (mtDNA) over single gene comparisons.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boore, Jeffrey L.; Medina, Monica; Rosenberg, Lewis A.
2004-01-31
We have determined the complete sequence of the mitochondrial genome of the scaphopod mollusk Graptacme eborea (Conrad, 1846) (14,492 nts) and completed the sequence of the mitochondrial genome of the bivalve mollusk Mytilus edulis Linnaeus, 1758 (16,740 nts). (The name Graptacme eborea is a revision of the species formerly known as Dentalium eboreum.) G. eborea mtDNA contains the 37 genes that are typically found and has the genes divided about evenly between the two strands, but M. edulis contains an extra trnM and is missing atp8, and has all genes on the same strand. Each has a highly rearranged genemore » order relative to each other and to all other studied mtDNAs. G. eborea mtDNA has almost no strand skew, but the coding strand of M. edulis mtDNA is very rich in G and T. This is reflected in differential codon usage patterns and even in amino acid compositions. G. eborea mtDNA has fewer non-coding nucleotides than any other mtDNA studied to date, with the largest non-coding region being only 24 nt long. Phylogenetic analysis using 2,420 aligned amino acid positions of concatenated proteins weakly supports an association of the scaphopod with gastropods to the exclusion of Bivalvia, Cephalopoda, and Polyplacophora, but is generally unable to convincingly resolve the relationships among major groups of the Lophotrochozoa, in contrast to the good resolution seen for several other major metazoan groups.« less
Kangaroo – A pattern-matching program for biological sequences
2002-01-01
Background Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. Results Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. Conclusion A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats. PMID:12150718
Xia, Hui; Li, Lingling; Yin, Zhouyang; Hou, Xiandeng; Zhu, Jun-Jie
2015-01-14
A dual signal amplification strategy for electrochemiluminescence (ECL) aptasensor was designed based on biobar-coded gold nanoparticles (Au NPs) and DNAzyme. CdSeTe@ZnS quantum dots (QDs) were chosen as the ECL signal probes. To verify the proposed ultrasensitive ECL aptasensor for biomolecules, we detected thrombin (Tb) as a proof-of-principle analyte. The hairpin DNA designed for the recognition of protein consists of two parts: the sequences of catalytical 8-17 DNAzyme and thrombin aptamer. Only in the presence of thrombin could the hairpin DNA be opened, followed by a recycling cleavage of excess substrates by catalytic core of the DNAzyme to induce the first-step amplification. One part of the fragments was captured to open the capture DNA modified on the Au electrode, which further connected with the prepared biobar-coded Au NPs-CdSeTe@ZnS QDs to get the final dual-amplified ECL signal. The limit of detection for Tb was 0.28 fM with excellent selectivity, and this proposed method possessed good performance in real sample analysis. This design introduces the new concept of dual-signal amplification by a biobar-coded system and DNAzyme recycling into ECL determination, and it is promising to be extended to provide a highly sensitive platform for various target biomolecules.
Wieczorek, Aneta; Fornalewicz, Karolina; Mocarski, Łukasz; Łyżeń, Robert; Węgrzyn, Grzegorz
2018-04-15
Genetic evidence for a link between DNA replication and glycolysis has been demonstrated a decade ago in Bacillus subtilis, where temperature-sensitive mutations in genes coding for replication proteins could be suppressed by mutations in genes of glycolytic enzymes. Then, a strong influence of dysfunctions of particular enzymes from the central carbon metabolism (CCM) on DNA replication and repair in Escherichia coli was reported. Therefore, we asked if such a link occurs only in bacteria or it is a more general phenomenon. Here, we demonstrate that effects of silencing (provoked by siRNA) of expression of genes coding for proteins involved in DNA replication and repair (primase, DNA polymerase ι, ligase IV, and topoisomerase IIIβ) on these processes (less efficient entry into the S phase of the cell cycle and decreased level of DNA synthesis) could be suppressed by silencing of specific genes of enzymes from CMM. Silencing of other pairs of replication/repair and CMM genes resulted in enhancement of the negative effects of lower expression levels of replication/repair genes. We suggest that these results may be proposed as a genetic evidence for the link between DNA replication/repair and CMM in human cells, indicating that it is a common biological phenomenon, occurring from bacteria to humans. Copyright © 2018 Elsevier B.V. All rights reserved.
Evidence of birth-and-death evolution of 5S rRNA gene in Channa species (Teleostei, Perciformes).
Barman, Anindya Sundar; Singh, Mamta; Singh, Rajeev Kumar; Lal, Kuldeep Kumar
2016-12-01
In higher eukaryotes, minor rDNA family codes for 5S rRNA that is arranged in tandem arrays and comprises of a highly conserved 120 bp long coding sequence with a variable non-transcribed spacer (NTS). Initially the 5S rDNA repeats are considered to be evolved by the process of concerted evolution. But some recent reports, including teleost fishes suggested that evolution of 5S rDNA repeat does not fit into the concerted evolution model and evolution of 5S rDNA family may be explained by a birth-and-death evolution model. In order to study the mode of evolution of 5S rDNA repeats in Perciformes fish species, nucleotide sequence and molecular organization of five species of genus Channa were analyzed in the present study. Molecular analyses revealed several variants of 5S rDNA repeats (four types of NTS) and networks created by a neighbor net algorithm for each type of sequences (I, II, III and IV) did not show a clear clustering in species specific manner. The stable secondary structure is predicted and upstream and downstream conserved regulatory elements were characterized. Sequence analyses also shown the presence of two putative pseudogenes in Channa marulius. Present study supported that 5S rDNA repeats in genus Channa were evolved under the process of birth-and-death.
Design pattern mining using distributed learning automata and DNA sequence alignment.
Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina
2014-01-01
Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. This paper describes a new method for pattern mining, isolating design patterns and relationship between them; and a related tool, DLA-DNA for all implemented pattern and all projects used for evaluation. DLA-DNA achieves acceptable precision and recall instead of other evaluated tools based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequences alignment. The proposed method mines structural design patterns in the object oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model can detect design patterns better that available other tools those are Pinot, PTIDEJ and DPJF; and the strengths of their relationships. The result demonstrate that whenever the source code is build standard and non-standard, based on the design patterns, then the result of the proposed method is near to DPJF and better that Pinot and PTIDEJ. The proposed model is tested on the several source codes and is compared with other related models and available tools those the results show the precision and recall of the proposed method, averagely 20% and 9.6% are more than Pinot, 27% and 31% are more than PTIDEJ and 3.3% and 2% are more than DPJF respectively. The primary idea of the proposed method is organized in two following steps: the first step, elemental design patterns are identified, while at the second step, is composed to recognize actual design patterns.
Hazes, Bart
2014-02-28
Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.
Lectin cDNA and transgenic plants derived therefrom
Raikhel, N.V.
1994-01-04
Transgenic plants containing cDNA encoding Gramineae lectin are described. The plants preferably contain cDNA coding for barley lectin and store the lectin in the leaves. The transgenic plants, particularly the leaves exhibit insecticidal and fungicidal properties. GOVERNMENT RIGHTS This application was funded under Department of Energy Contract DE-AC02-76ER01338. The U.S. Government has certain rights under this application and any patent issuing thereon. .
Lectin cDNA and transgenic plants derived therefrom
Raikhel, Natasha V.
1994-01-04
Transgenic plants containing cDNA encoding Gramineae lectin are described. The plants preferably contain cDNA coding for barley lectin and store the lectin in the leaves. The transgenic plants, particularly the leaves exhibit insecticidal and fungicidal properties. GOVERNMENT RIGHTS This application was funded under Department of Energy Contract DE-AC02-76ER01338. The U.S. Government has certain rights under this application and any patent issuing thereon.
Blochlinger, K; Diggelmann, H
1984-12-01
The DNA coding sequence for the hygromycin B phosphotransferase gene was placed under the control of the regulatory sequences of a cloned long terminal repeat of Moloney sarcoma virus. This construction allowed direct selection for hygromycin B resistance after transfection of eucaryotic cell lines not naturally resistant to this antibiotic, thus providing another dominant marker for DNA transfer in eucaryotic cells.
Blochlinger, K; Diggelmann, H
1984-01-01
The DNA coding sequence for the hygromycin B phosphotransferase gene was placed under the control of the regulatory sequences of a cloned long terminal repeat of Moloney sarcoma virus. This construction allowed direct selection for hygromycin B resistance after transfection of eucaryotic cell lines not naturally resistant to this antibiotic, thus providing another dominant marker for DNA transfer in eucaryotic cells. Images PMID:6098829
Pathogenesis of Chagas' Disease: Parasite Persistence and Autoimmunity
Teixeira, Antonio R. L.; Hecht, Mariana M.; Guimaro, Maria C.; Sousa, Alessandro O.; Nitz, Nadjar
2011-01-01
Summary: Acute Trypanosoma cruzi infections can be asymptomatic, but chronically infected individuals can die of Chagas' disease. The transfer of the parasite mitochondrial kinetoplast DNA (kDNA) minicircle to the genome of chagasic patients can explain the pathogenesis of the disease; in cases of Chagas' disease with evident cardiomyopathy, the kDNA minicircles integrate mainly into retrotransposons at several chromosomes, but the minicircles are also detected in coding regions of genes that regulate cell growth, differentiation, and immune responses. An accurate evaluation of the role played by the genotype alterations in the autoimmune rejection of self-tissues in Chagas' disease is achieved with the cross-kingdom chicken model system, which is refractory to T. cruzi infections. The inoculation of T. cruzi into embryonated eggs prior to incubation generates parasite-free chicks, which retain the kDNA minicircle sequence mainly in the macrochromosome coding genes. Crossbreeding transfers the kDNA mutations to the chicken progeny. The kDNA-mutated chickens develop severe cardiomyopathy in adult life and die of heart failure. The phenotyping of the lesions revealed that cytotoxic CD45, CD8+ γδ, and CD8α+ T lymphocytes carry out the rejection of the chicken heart. These results suggest that the inflammatory cardiomyopathy of Chagas' disease is a genetically driven autoimmune disease. PMID:21734249
Fortin, Connor H; Schulze, Katharina V; Babbitt, Gregory A
2015-01-01
It is now widely-accepted that DNA sequences defining DNA-protein interactions functionally depend upon local biophysical features of DNA backbone that are important in defining sites of binding interaction in the genome (e.g. DNA shape, charge and intrinsic dynamics). However, these physical features of DNA polymer are not directly apparent when analyzing and viewing Shannon information content calculated at single nucleobases in a traditional sequence logo plot. Thus, sequence logos plots are severely limited in that they convey no explicit information regarding the structural dynamics of DNA backbone, a feature often critical to binding specificity. We present TRX-LOGOS, an R software package and Perl wrapper code that interfaces the JASPAR database for computational regulatory genomics. TRX-LOGOS extends the traditional sequence logo plot to include Shannon information content calculated with regard to the dinucleotide-based BI-BII conformation shifts in phosphate linkages on the DNA backbone, thereby adding a visual measure of intrinsic DNA flexibility that can be critical for many DNA-protein interactions. TRX-LOGOS is available as an R graphics module offered at both SourceForge and as a download supplement at this journal. To demonstrate the general utility of TRX logo plots, we first calculated the information content for 416 Saccharomyces cerevisiae transcription factor binding sites functionally confirmed in the Yeastract database and matched to previously published yeast genomic alignments. We discovered that flanking regions contain significantly elevated information content at phosphate linkages than can be observed at nucleobases. We also examined broader transcription factor classifications defined by the JASPAR database, and discovered that many general signatures of transcription factor binding are locally more information rich at the level of DNA backbone dynamics than nucleobase sequence. We used TRX-logos in combination with MEGA 6.0 software for molecular evolutionary genetics analysis to visually compare the human Forkhead box/FOX protein evolution to its binding site evolution. We also compared the DNA binding signatures of human TP53 tumor suppressor determined by two different laboratory methods (SELEX and ChIP-seq). Further analysis of the entire yeast genome, center aligned at the start codon, also revealed a distinct sequence-independent 3 bp periodic pattern in information content, present only in coding region, and perhaps indicative of the non-random organization of the genetic code. TRX-LOGOS is useful in any situation in which important information content in DNA can be better visualized at the positions of phosphate linkages (i.e. dinucleotides) where the dynamic properties of the DNA backbone functions to facilitate DNA-protein interaction.
Campo, Daniel; García-Vázquez, Eva
2012-01-01
The 5S rDNA is organized in the genome as tandemly repeated copies of a structural unit composed of a coding sequence plus a nontranscribed spacer (NTS). The coding region is highly conserved in the evolution, whereas the NTS vary in both length and sequence. It has been proposed that 5S rRNA genes are members of a gene family that have arisen through concerted evolution. In this study, we describe the molecular organization and evolution of the 5S rDNA in the genera Lepidorhombus and Scophthalmus (Scophthalmidae) and compared it with already known 5S rDNA of the very different genera Merluccius (Merluccidae) and Salmo (Salmoninae), to identify common structural elements or patterns for understanding 5S rDNA evolution in fish. High intra- and interspecific diversity within the 5S rDNA family in all the genera can be explained by a combination of duplications, deletions, and transposition events. Sequence blocks with high similarity in all the 5S rDNA members across species were identified for the four studied genera, with evidences of intense gene conversion within noncoding regions. We propose a model to explain the evolution of the 5S rDNA, in which the evolutionary units are blocks of nucleotides rather than the entire sequences or single nucleotides. This model implies a "two-speed" evolution: slow within blocks (homogenized by recombination) and fast within the gene family (diversified by duplications and deletions).
Taylor, Jared F.; Khattab, Omar S.; Chen, Yu-Han; Chen, Yumay; Jacobsen, Steven E.; Wang, Ping H.
2015-01-01
Deciphering the multitude of epigenomic and genomic factors that influence the mutation rate is an area of great interest in modern biology. Recently, chromatin has been shown to play a part in this process. To elucidate this relationship further, we integrated our own ultra-deep sequenced human nucleosomal DNA data set with a host of published human genomic and cancer genomic data sets. Our results revealed, that differences in nucleosome occupancy are associated with changes in base-specific mutation rates. Increasing nucleosome occupancy is associated with an increasing transition to transversion ratio and an increased germline mutation rate within the human genome. Additionally, cancer single nucleotide variants and microindels are enriched within nucleosomes and both the coding and non-coding cancer mutation rate increases with increasing nucleosome occupancy. There is an enrichment of cancer indels at the theoretical start (74 bp) and end (115 bp) of linker DNA between two nucleosomes. We then hypothesized that increasing nucleosome occupancy decreases access to DNA by DNA repair machinery and could account for the increasing mutation rate. Such a relationship should not exist in DNA repair knockouts, and we thus repeated our analysis in DNA repair machinery knockouts to test our hypothesis. Indeed, our results revealed no correlation between increasing nucleosome occupancy and increasing mutation rate in DNA repair knockouts. Our findings emphasize the linkage of the genome and epigenome through the nucleosome whose properties can affect genome evolution and genetic aberrations such as cancer. PMID:26308346
ILIR : SSC San Diego In-House Laboratory Independent Research 2001 Annual Report
2002-05-01
canine distemper virus (CDV) (a morbillivirus closely related to one infecting marine mammals) by intramuscular or intradermal inoculation with a...data.* 3. Sixt, N., A. Cardoso, A. Vallier, J. Fayolle, R. Buckland, T. F. Wild. 1998. “Canine Distemper Virus DNA Vaccination Induces Humoral and...Complementary Code Keying CCSK Cyclic Code Shift Keying CDMA Code Division Multiplexing CDV Canine Distemper Virus CFAR Constant False Alarm
Nanoparticle based bio-bar code technology for trace analysis of aflatoxin B1 in Chinese herbs.
Yu, Yu-Yan; Chen, Yuan-Yuan; Gao, Xuan; Liu, Yuan-Yuan; Zhang, Hong-Yan; Wang, Tong-Ying
2018-04-01
A novel and sensitive assay for aflatoxin B1 (AFB1) detection has been developed by using bio-bar code assay (BCA). The method that relies on polyclonal antibodies encoded with DNA modified gold nanoparticle (NP) and monoclonal antibodies modified magnetic microparticle (MMP), and subsequent detection of amplified target in the form of bio-bar code using a fluorescent quantitative polymerase chain reaction (FQ-PCR) detection method. First, NP probes encoded with DNA that was unique to AFB1, MMP probes with monoclonal antibodies that bind AFB1 specifically were prepared. Then, the MMP-AFB1-NP sandwich compounds were acquired, dehybridization of the oligonucleotides on the nanoparticle surface allows the determination of the presence of AFB1 by identifying the oligonucleotide sequence released from the NP through FQ-PCR detection. The bio-bar code techniques system for detecting AFB1 was established, and the sensitivity limit was about 10 -8 ng/mL, comparable ELISA assays for detecting the same target, it showed that we can detect AFB1 at low attomolar levels with the bio-bar-code amplification approach. This is also the first demonstration of a bio-bar code type assay for the detection of AFB1 in Chinese herbs. Copyright © 2017. Published by Elsevier B.V.
Child Injury Deaths: Comparing Prevention Information from Two Coding Systems
Schnitzer, Patricia G.; Ewigman, Bernard G.
2006-01-01
Objectives The International Classification of Disease (ICD) external cause of injury E-codes do not sufficiently identify injury circumstances amenable to prevention. The researchers developed an alternative classification system (B-codes) that incorporates behavioral and environmental factors, for use in childhood injury research, and compare the two coding systems in this paper. Methods All fatal injuries among children less than age five that occurred between January 1, 1992, and December 31, 1994, were classified using both B-codes and E-codes. Results E-codes identified the most common causes of injury death: homicide (24%), fires (21%), motor vehicle incidents (21%), drowning (10%), and suffocation (9%). The B-codes further revealed that homicides (51%) resulted from the child being shaken or struck by another person; many fires deaths (42%) resulted from children playing with matches or lighters; drownings (46%) usually occurred in natural bodies of water; and most suffocation deaths (68%) occurred in unsafe sleeping arrangements. Conclusions B-codes identify additional information with specific relevance for prevention of childhood injuries. PMID:15944169
Fall Detection Using Smartphone Audio Features.
Cheffena, Michael
2016-07-01
An automated fall detection system based on smartphone audio features is developed. The spectrogram, mel frequency cepstral coefficents (MFCCs), linear predictive coding (LPC), and matching pursuit (MP) features of different fall and no-fall sound events are extracted from experimental data. Based on the extracted audio features, four different machine learning classifiers: k-nearest neighbor classifier (k-NN), support vector machine (SVM), least squares method (LSM), and artificial neural network (ANN) are investigated for distinguishing between fall and no-fall events. For each audio feature, the performance of each classifier in terms of sensitivity, specificity, accuracy, and computational complexity is evaluated. The best performance is achieved using spectrogram features with ANN classifier with sensitivity, specificity, and accuracy all above 98%. The classifier also has acceptable computational requirement for training and testing. The system is applicable in home environments where the phone is placed in the vicinity of the user.
Making the Bend: DNA Tertiary Structure and Protein-DNA Interactions
Harteis, Sabrina; Schneider, Sabine
2014-01-01
DNA structure functions as an overlapping code to the DNA sequence. Rapid progress in understanding the role of DNA structure in gene regulation, DNA damage recognition and genome stability has been made. The three dimensional structure of both proteins and DNA plays a crucial role for their specific interaction, and proteins can recognise the chemical signature of DNA sequence (“base readout”) as well as the intrinsic DNA structure (“shape recognition”). These recognition mechanisms do not exist in isolation but, depending on the individual interaction partners, are combined to various extents. Driving force for the interaction between protein and DNA remain the unique thermodynamics of each individual DNA-protein pair. In this review we focus on the structures and conformations adopted by DNA, both influenced by and influencing the specific interaction with the corresponding protein binding partner, as well as their underlying thermodynamics. PMID:25026169
Tang, Songsong; Gu, Yuan; Lu, Huiting; Dong, Haifeng; Zhang, Kai; Dai, Wenhao; Meng, Xiangdan; Yang, Fan; Zhang, Xueji
2018-04-03
Herein, a highly-sensitive microRNA (miRNA) detection strategy was developed by combining bio-bar-code assay (BBA) with catalytic hairpin assembly (CHA). In the proposed system, two nanoprobes of magnetic nanoparticles functionalized with DNA probes (MNPs-DNA) and gold nanoparticles with numerous barcode DNA (AuNPs-DNA) were designed. In the presence of target miRNA, the MNP-DNA and AuNP-DNA hybridized with target miRNA to form a "sandwich" structure. After "sandwich" structures were separated from the solution by the magnetic field and dehybridized by high temperature, the barcode DNA sequences were released by dissolving AuNPs. The released barcode DNA sequences triggered the toehold strand displacement assembly of two hairpin probes, leading to recycle of barcode DNA sequences and producing numerous fluorescent CHA products for miRNA detection. Under the optimal experimental conditions, the proposed two-stage amplification system could sensitively detect target miRNA ranging from 10 pM to 10 aM with a limit of detection (LOD) down to 97.9 zM. It displayed good capability to discriminate single base and three bases mismatch due to the unique sandwich structure. Notably, it presented good feasibility for selective multiplexed detection of various combinations of synthetic miRNA sequences and miRNAs extracted from different cell lysates, which were in agreement with the traditional polymerase chain reaction analysis. The two-stage amplification strategy may be significant implication in the biological detection and clinical diagnosis. Copyright © 2017 Elsevier B.V. All rights reserved.
de Lange, Orlando; Wolf, Christina; Dietze, Jörn; Elsaesser, Janett; Morbitzer, Robert; Lahaye, Thomas
2014-01-01
The tandem repeats of transcription activator like effectors (TALEs) mediate sequence-specific DNA binding using a simple code. Naturally, TALEs are injected by Xanthomonas bacteria into plant cells to manipulate the host transcriptome. In the laboratory TALE DNA binding domains are reprogrammed and used to target a fused functional domain to a genomic locus of choice. Research into the natural diversity of TALE-like proteins may provide resources for the further improvement of current TALE technology. Here we describe TALE-like proteins from the endosymbiotic bacterium Burkholderia rhizoxinica, termed Bat proteins. Bat repeat domains mediate sequence-specific DNA binding with the same code as TALEs, despite less than 40% sequence identity. We show that Bat proteins can be adapted for use as transcription factors and nucleases and that sequence preferences can be reprogrammed. Unlike TALEs, the core repeats of each Bat protein are highly polymorphic. This feature allowed us to explore alternative strategies for the design of custom Bat repeat arrays, providing novel insights into the functional relevance of non-RVD residues. The Bat proteins offer fertile grounds for research into the creation of improved programmable DNA-binding proteins and comparative insights into TALE-like evolution. PMID:24792163
A laboratory information management system for DNA barcoding workflows.
Vu, Thuy Duong; Eberhardt, Ursula; Szöke, Szániszló; Groenewald, Marizeth; Robert, Vincent
2012-07-01
This paper presents a laboratory information management system for DNA sequences (LIMS) created and based on the needs of a DNA barcoding project at the CBS-KNAW Fungal Biodiversity Centre (Utrecht, the Netherlands). DNA barcoding is a global initiative for species identification through simple DNA sequence markers. We aim at generating barcode data for all strains (or specimens) included in the collection (currently ca. 80 k). The LIMS has been developed to better manage large amounts of sequence data and to keep track of the whole experimental procedure. The system has allowed us to classify strains more efficiently as the quality of sequence data has improved, and as a result, up-to-date taxonomic names have been given to strains and more accurate correlation analyses have been carried out.
Molecular Signature for Lymphatic Invasion Associated with Survival of Epithelial Ovarian Cancer.
Paik, E Sun; Choi, Hyun Jin; Kim, Tae-Joong; Lee, Jeong-Won; Kim, Byoung-Gie; Bae, Duk-Soo; Choi, Chel Hun
2018-04-01
We aimed to develop molecular classifier that can predict lymphatic invasion and their clinical significance in epithelial ovarian cancer (EOC) patients. We analyzed gene expression (mRNA, methylated DNA) in data from The Cancer Genome Atlas. To identify molecular signatures for lymphatic invasion, we found differentially expressed genes. The performance of classifier was validated by receiver operating characteristics analysis, logistic regression, linear discriminant analysis (LDA), and support vector machine (SVM). We assessed prognostic role of classifier using random survival forest (RSF) model and pathway deregulation score (PDS). For external validation,we analyzed microarray data from 26 EOC samples of Samsung Medical Center and curatedOvarianData database. We identified 21 mRNAs, and seven methylated DNAs from primary EOC tissues that predicted lymphatic invasion and created prognostic models. The classifier predicted lymphatic invasion well, which was validated by logistic regression, LDA, and SVM algorithm (C-index of 0.90, 0.71, and 0.74 for mRNA and C-index of 0.64, 0.68, and 0.69 for DNA methylation). Using RSF model, incorporating molecular data with clinical variables improved prediction of progression-free survival compared with using only clinical variables (p < 0.001 and p=0.008). Similarly, PDS enabled us to classify patients into high-risk and low-risk group, which resulted in survival difference in mRNA profiles (log-rank p-value=0.011). In external validation, gene signature was well correlated with prediction of lymphatic invasion and patients' survival. Molecular signature model predicting lymphatic invasion was well performed and also associated with survival of EOC patients.
Validity of the coding for herpes simplex encephalitis in the Danish National Patient Registry
Jørgensen, Laura Krogh; Dalgaard, Lars Skov; Østergaard, Lars Jørgen; Andersen, Nanna Skaarup; Nørgaard, Mette; Mogensen, Trine Hyrup
2016-01-01
Background Large health care databases are a valuable source of infectious disease epidemiology if diagnoses are valid. The aim of this study was to investigate the accuracy of the recorded diagnosis coding of herpes simplex encephalitis (HSE) in the Danish National Patient Registry (DNPR). Methods The DNPR was used to identify all hospitalized patients, aged ≥15 years, with a first-time diagnosis of HSE according to the International Classification of Diseases, tenth revision (ICD-10), from 2004 to 2014. To validate the coding of HSE, we collected data from the Danish Microbiology Database, from departments of clinical microbiology, and from patient medical records. Cases were classified as confirmed, probable, or no evidence of HSE. We estimated the positive predictive value (PPV) of the HSE diagnosis coding stratified by diagnosis type, study period, and department type. Furthermore, we estimated the proportion of HSE cases coded with nonspecific ICD-10 codes of viral encephalitis and also the sensitivity of the HSE diagnosis coding. Results We were able to validate 398 (94.3%) of the 422 HSE diagnoses identified via the DNPR. Hereof, 202 (50.8%) were classified as confirmed cases and 29 (7.3%) as probable cases providing an overall PPV of 58.0% (95% confidence interval [CI]: 53.0–62.9). For “Encephalitis due to herpes simplex virus” (ICD-10 code B00.4), the PPV was 56.6% (95% CI: 51.1–62.0). Similarly, the PPV for “Meningoencephalitis due to herpes simplex virus” (ICD-10 code B00.4A) was 56.8% (95% CI: 39.5–72.9). “Herpes viral encephalitis” (ICD-10 code G05.1E) had a PPV of 75.9% (95% CI: 56.5–89.7), thereby representing the highest PPV. The estimated sensitivity was 95.5%. Conclusion The PPVs of the ICD-10 diagnosis coding for adult HSE in the DNPR were relatively low. Hence, the DNPR should be used with caution when studying patients with encephalitis caused by herpes simplex virus. PMID:27330328
GESPA: classifying nsSNPs to predict disease association.
Khurana, Jay K; Reeder, Jay E; Shrimpton, Antony E; Thakar, Juilee
2015-07-25
Non-synonymous single nucleotide polymorphisms (nsSNPs) are the most common DNA sequence variation associated with disease in humans. Thus determining the clinical significance of each nsSNP is of great importance. Potential detrimental nsSNPs may be identified by genetic association studies or by functional analysis in the laboratory, both of which are expensive and time consuming. Existing computational methods lack accuracy and features to facilitate nsSNP classification for clinical use. We developed the GESPA (GEnomic Single nucleotide Polymorphism Analyzer) program to predict the pathogenicity and disease phenotype of nsSNPs. GESPA is a user-friendly software package for classifying disease association of nsSNPs. It allows flexibility in acceptable input formats and predicts the pathogenicity of a given nsSNP by assessing the conservation of amino acids in orthologs and paralogs and supplementing this information with data from medical literature. The development and testing of GESPA was performed using the humsavar, ClinVar and humvar datasets. Additionally, GESPA also predicts the disease phenotype associated with a nsSNP with high accuracy, a feature unavailable in existing software. GESPA's overall accuracy exceeds existing computational methods for predicting nsSNP pathogenicity. The usability of GESPA is enhanced by fast SQL-based cloud storage and retrieval of data. GESPA is a novel bioinformatics tool to determine the pathogenicity and phenotypes of nsSNPs. We anticipate that GESPA will become a useful clinical framework for predicting the disease association of nsSNPs. The program, executable jar file, source code, GPL 3.0 license, user guide, and test data with instructions are available at http://sourceforge.net/projects/gespa.
Borozan, Ivan; Watt, Stuart; Ferretti, Vincent
2015-05-01
Alignment-based sequence similarity searches, while accurate for some type of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. ivan.borozan@gmail.com Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Borozan, Ivan; Watt, Stuart; Ferretti, Vincent
2015-01-01
Motivation: Alignment-based sequence similarity searches, while accurate for some type of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913
cncRNAs: Bi-functional RNAs with protein coding and non-coding functions
Kumari, Pooja; Sampath, Karuna
2015-01-01
For many decades, the major function of mRNA was thought to be to provide protein-coding information embedded in the genome. The advent of high-throughput sequencing has led to the discovery of pervasive transcription of eukaryotic genomes and opened the world of RNA-mediated gene regulation. Many regulatory RNAs have been found to be incapable of protein coding and are hence termed as non-coding RNAs (ncRNAs). However, studies in recent years have shown that several previously annotated non-coding RNAs have the potential to encode proteins, and conversely, some coding RNAs have regulatory functions independent of the protein they encode. Such bi-functional RNAs, with both protein coding and non-coding functions, which we term as ‘cncRNAs’, have emerged as new players in cellular systems. Here, we describe the functions of some cncRNAs identified from bacteria to humans. Because the functions of many RNAs across genomes remains unclear, we propose that RNAs be classified as coding, non-coding or both only after careful analysis of their functions. PMID:26498036
Kemble, R. J.; Gunn, R. E.; Flavell, R. B.
1980-01-01
Mitochondrial DNA preparations were made from 31 maize lines carrying different sources of cytoplasm in the same nuclear genetic background. The DNAs were analyzed by agarose gel electrophoresis. A number of discrete low molecular weight bands were present in all lines. However, only four different DNA banding patterns were observed. These were correlated with the N, T, S and C cytoplasms defined by nuclear fertility restorer genes. Of the 31 cytoplasmic sources examined, six possessed DNA species characteristic of N cytoplasms, four possessed DNA species characteristic of T cytoplasm, 19 possessed DNA species characteristic of S cytoplasm and two possessed DNA species characteristic of C cytoplasm. This classification is in complete agreement with that based on mitochondrial translation products reported in the accompanying paper. No within-group heterogeneity was observed in the DNA banding patterns, indicating a lack of cytoplasmic variation within the four cytoplasmic groups. Attributes of the various methods available for classifying maize cytoplasms are compared and discussed. PMID:17249046
The Genetic Privacy Act and commentary
DOE Office of Scientific and Technical Information (OSTI.GOV)
Annas, G.J.; Glantz, L.H.; Roche, P.A.
1995-02-28
The Genetic Privacy Act is a proposal for federal legislation. The Act is based on the premise that genetic information is different from other types of personal information in ways that require special protection. The DNA molecule holds an extensive amount of currently indecipherable information. The major goal of the Human Genome Project is to decipher this code so that the information it contains is accessible. The privacy question is, accessible to whom? The highly personal nature of the information contained in DNA can be illustrated by thinking of DNA as containing an individual`s {open_quotes}future diary.{close_quotes} A diary is perhapsmore » the most personal and private document a person can create. It contains a person`s innermost thoughts and perceptions, and is usually hidden and locked to assure its secrecy. Diaries describe the past. The information in one`s genetic code can be thought of as a coded probabilistic future diary because it describes an important part of a unique and personal future. This document presents an introduction to the proposal for federal legislation `the Genetic Privacy Act`; a copy of the proposed act; and comment.« less
NASA Astrophysics Data System (ADS)
Dementev, A. O.; Dmitriev, E. V.; Kozoderov, V. V.; Egorov, V. D.
2017-10-01
Hyperspectral imaging is up-to-date promising technology widely applied for the accurate thematic mapping. The presence of a large number of narrow survey channels allows us to use subtle differences in spectral characteristics of objects and to make a more detailed classification than in the case of using standard multispectral data. The difficulties encountered in the processing of hyperspectral images are usually associated with the redundancy of spectral information which leads to the problem of the curse of dimensionality. Methods currently used for recognizing objects on multispectral and hyperspectral images are usually based on standard base supervised classification algorithms of various complexity. Accuracy of these algorithms can be significantly different depending on considered classification tasks. In this paper we study the performance of ensemble classification methods for the problem of classification of the forest vegetation. Error correcting output codes and boosting are tested on artificial data and real hyperspectral images. It is demonstrates, that boosting gives more significant improvement when used with simple base classifiers. The accuracy in this case in comparable the error correcting output code (ECOC) classifier with Gaussian kernel SVM base algorithm. However the necessity of boosting ECOC with Gaussian kernel SVM is questionable. It is demonstrated, that selected ensemble classifiers allow us to recognize forest species with high enough accuracy which can be compared with ground-based forest inventory data.
Variation in Drug Prices at Pharmacies: Are Prices Higher in Poorer Areas?
Gellad, Walid F; Choudhry, Niteesh K; Friedberg, Mark W; Brookhart, M Alan; Haas, Jennifer S; Shrank, William H
2009-01-01
Objective To determine whether retail prices for prescription drugs are higher in poorer areas. Data Sources The MyFloridarx.com website, which provides retail prescription prices at Florida pharmacies, and median ZIP code income from the 2000 Census. Study Design We compared mean pharmacy prices for each of the four study drugs across ZIP code income groups. Pharmacies were classified as either chain pharmacies or independent pharmacies. Data Collection Prices were downloaded in November 2006. Principal Findings Across the four study drugs, mean prices were highest in the poorest ZIP codes: 9 percent above the statewide average. Independent pharmacies in the poorest ZIP codes charged the highest mean prices. Conclusions Retail prescription prices appear to be higher in poorer ZIP codes of Florida. PMID:19178584
Abras, Alba; Gállego, Montserrat; Muñoz, Carmen; Juiz, Natalia A; Ramírez, Juan Carlos; Cura, Carolina I; Tebar, Silvia; Fernández-Arévalo, Anna; Pinazo, María-Jesús; de la Torre, Leonardo; Posada, Elizabeth; Navarro, Ferran; Espinal, Paula; Ballart, Cristina; Portús, Montserrat; Gascón, Joaquim; Schijman, Alejandro G
2017-04-01
Trypanosoma cruzi, the causative agent of Chagas disease, is divided into six Discrete Typing Units (DTUs): TcI-TcVI. We aimed to identify T. cruzi DTUs in Latin-American migrants in the Barcelona area (Spain) and to assess different molecular typing approaches for the characterization of T. cruzi genotypes. Seventy-five peripheral blood samples were analyzed by two real-time PCR methods (qPCR) based on satellite DNA (SatDNA) and kinetoplastid DNA (kDNA). The 20 samples testing positive in both methods, all belonging to Bolivian individuals, were submitted to DTU characterization using two PCR-based flowcharts: multiplex qPCR using TaqMan probes (MTq-PCR), and conventional PCR. These samples were also studied by sequencing the SatDNA and classified as type I (TcI/III), type II (TcII/IV) and type I/II hybrid (TcV/VI). Ten out of the 20 samples gave positive results in the flowcharts: TcV (5 samples), TcII/V/VI (3) and mixed infections by TcV plus TcII (1) and TcV plus TcII/VI (1). By SatDNA sequencing, we classified the 20 samples, 19 as type I/II and one as type I. The most frequent DTU identified by both flowcharts, and suggested by SatDNA sequencing in the remaining samples with low parasitic loads, TcV, is common in Bolivia and predominant in peripheral blood. The mixed infection by TcV-TcII was detected for the first time simultaneously in Bolivian migrants. PCR-based flowcharts are very useful to characterize DTUs during acute infection. SatDNA sequence analysis cannot discriminate T. cruzi populations at the level of a single DTU but it enabled us to increase the number of characterized cases in chronically infected patients. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Gomes, S L; Gober, J W; Shapiro, L
1990-01-01
Caulobacter crescentus has a single dnaK gene that is highly homologous to the hsp70 family of heat shock genes. Analysis of the cloned and sequenced dnaK gene has shown that the deduced amino acid sequence could encode a protein of 67.6 kilodaltons that is 68% identical to the DnaK protein of Escherichia coli and 49% identical to the Drosophila and human hsp70 protein family. A partial open reading frame 165 base pairs 3' to the end of dnaK encodes a peptide of 190 amino acids that is 59% identical to DnaJ of E. coli. Northern blot analysis revealed a single 4.0-kilobase mRNA homologous to the cloned fragment. Since the dnaK coding region is 1.89 kilobases, dnaK and dnaJ may be transcribed as a polycistronic message. S1 mapping and primer extension experiments showed that transcription initiated at two sites 5' to the dnaK coding sequence. A single start site of transcription was identified during heat shock at 42 degrees C, and the predicted promoter sequence conformed to the consensus heat shock promoters of E. coli. At normal growth temperature (30 degrees C), a different start site was identified 3' to the heat shock start site that conformed to the E. coli sigma 70 promoter consensus sequence. S1 protection assays and analysis of expression of the dnaK gene fused to the lux transcription reporter gene showed that expression of dnaK is temporally controlled under normal physiological conditions and that transcription occurs just before the initiation of DNA replication. Thus, in both human cells (I. K. L. Milarski and R. I. Morimoto, Proc. Natl. Acad. Sci. USA 83:9517-9521, 1986) and in a simple bacterium, the transcription of a hsp70 gene is temporally controlled as a function of the cell cycle under normal growth conditions. Images PMID:2345134
Mitochondrial DNA Genetics and the Heteroplasmy Conundrum in Evolution and Disease
Wallace, Douglas C.; Chalkia, Dimitra
2013-01-01
The unorthodox genetics of the mtDNA is providing new perspectives on the etiology of the common “complex” diseases. The maternally inherited mtDNA codes for essential energy genes, is present in thousands of copies per cell, and has a very high mutation rate. New mtDNA mutations arise among thousands of other mtDNAs. The mechanisms by which these “heteroplasmic” mtDNA mutations come to predominate in the female germline and somatic tissues is poorly understood, but essential for understanding the clinical variability of a range of diseases. Maternal inheritance and heteroplasmy also pose major challengers for the diagnosis and prevention of mtDNA disease. PMID:24186072
Simulation of the charge migration in DNA under irradiation with heavy ions.
Belov, Oleg V; Boyda, Denis L; Plante, Ianik; Shirmovsky, Sergey Eh
2015-01-01
A computer model to simulate the processes of charge injection and migration through DNA after irradiation by a heavy charged particle was developed. The most probable sites of charge injection were obtained by merging spatial models of short DNA sequence and a single 1 GeV/u iron particle track simulated by the code RITRACKS (Relativistic Ion Tracks). Charge migration was simulated by using a quantum-classical nonlinear model of the DNA-charge system. It was found that charge migration depends on the environmental conditions. The oxidative damage in DNA occurring during hole migration was simulated concurrently, which allowed the determination of probable locations of radiation-induced DNA lesions.
Sub-10 nm patterning with DNA nanostructures: a short perspective
NASA Astrophysics Data System (ADS)
Du, Ke; Park, Myeongkee; Ding, Junjun; Hu, Huan; Zhang, Zheng
2017-11-01
DNA is the hereditary material that contains our unique genetic code. Since the first demonstration of two-dimensional (2D) nanopatterns by using designed DNA origami ˜10 years ago, DNA has evolved into a novel technique for 2D and 3D nanopatterning. It is now being used as a template for the creation of sub-10 nm structures via either ‘top-down’ or ‘bottom-up’ approaches for various applications spanning from nanoelectronics, plasmonic sensing, and nanophotonics. This perspective starts with an histroric overview and discusses the current state-of-the-art in DNA nanolithography. Emphasis is put on the challenges and prospects of DNA nanolithography as the next generation nanomanufacturing technique.
Recominant Pinoresino-Lariciresinol Reductase, Recombinant Dirigent Protein And Methods Of Use
Lewis, Norman G.; Davin, Laurence B.; Dinkova-Kostova, Albena T.; Fujita, Masayuki , Gang; David R. , Sarkanen; Simo , Ford; Joshua D.
2003-10-21
Dirigent proteins and pinoresinol/lariciresinol reductases have been isolated, together with cDNAs encoding dirigent proteins and pinoresinol/lariciresinol reductases. Accordingly, isolated DNA sequences are provided from source species Forsythia intermedia, Thuja plicata, Tsuga heterophylla, Eucommia ulmoides, Linum usitatissimum, and Schisandra chinensis, which code for the expression of dirigent proteins and pinoresinol/lariciresinol reductases. In other aspects, replicable recombinant cloning vehicles are provided which code for dirigent proteins or pinoresinol/lariciresinol reductases or for a base sequence sufficiently complementary to at least a portion of dirigent protein or pinoresinol/lariciresinol reductase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding dirigent protein or pinoresinol/lariciresinol reductase. Thus, systems and methods are provided for the recombinant expression of dirigent proteins and/or pinoresinol/lariciresinol reductases.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leong, JoAnn Ching
The nucleotide sequence of the IHNV glycoprotein gene has been determined from a cDNA clone containing the entire coding region. The glycoprotein cDNA clone contained a leader sequence of 48 bases, a coding region of 1524 nucleotides, and 39 bases at the 3 foot end. The entire cDNA clone contains 1609 nucleodites and encodes a protein of 508 amino acids. The deduced amino acid sequence gave a translated molecular weight of 56,795 daltons. A hydropathicity profile of the deduced amino acid sequence indicated that there were two major hydrophobic domains: one,at the N-terminus,delineating a signal peptide of 18 amino acidsmore » and the other, at the C-terminus,delineating the region of the transmembrane. Five possible sites of N-linked glyscoylation were identified. Although no nucleic acid homology existed between the IHNV glycoprotein gene and the glycoprotein genes of rabies and VSV, there was significant homology at the amino acid level between all three rhabdovirus glycoproteins.« less
Altruistic functions for selfish DNA.
Faulkner, Geoffrey J; Carninci, Piero
2009-09-15
Mammalian genomes are comprised of 30-50% transposed elements (TEs). The vast majority of these TEs are truncated and mutated fragments of retrotransposons that are no longer capable of transposition. Although initially regarded as important factors in the evolution of gene regulatory networks, TEs are now commonly perceived as neutrally evolving and non-functional genomic elements. In a major development, recent works have strongly contradicted this "selfish DNA" or "junk DNA" dogma by demonstrating that TEs use a host of novel promoters to generate RNA on a massive scale across most eukaryotic cells. This transcription frequently functions to control the expression of protein-coding genes via alternative promoters, cis regulatory non protein-coding RNAs and the formation of double stranded short RNAs. If considered in sum, these findings challenge the designation of TEs as selfish and neutrally evolving genomic elements. Here, we will expand upon these themes and discuss challenges in establishing novel TE functions in vivo.
Kobayashi, Shintaro; Yoshii, Kentaro; Hirano, Minato; Muto, Memi; Kariwa, Hiroaki
2017-02-01
Reverse genetics systems facilitate investigation of many aspects of the life cycle and pathogenesis of viruses. However, genetic instability in Escherichia coli has hampered development of a reverse genetics system for West Nile virus (WNV). In this study, we developed a novel reverse genetics system for WNV based on homologous recombination in mammalian cells. Introduction of the DNA fragment coding for the WNV structural protein together with a DNA-based replicon resulted in the release of infectious WNV. The growth rate and plaque size of the recombinant virus were almost identical to those of the parent WNV. Furthermore, chimeric WNV was produced by introducing the DNA fragment coding for the structural protein and replicon plasmid derived from various strains. Here, we report development of a novel system that will facilitate research into WNV infection. Copyright © 2016 Elsevier B.V. All rights reserved.
Biomimetic Artificial Epigenetic Code for Targeted Acetylation of Histones.
Taniguchi, Junichi; Feng, Yihong; Pandian, Ganesh N; Hashiya, Fumitaka; Hidaka, Takuya; Hashiya, Kaori; Park, Soyoung; Bando, Toshikazu; Ito, Shinji; Sugiyama, Hiroshi
2018-06-13
While the central role of locus-specific acetylation of histone proteins in eukaryotic gene expression is well established, the availability of designer tools to regulate acetylation at particular nucleosome sites remains limited. Here, we develop a unique strategy to introduce acetylation by constructing a bifunctional molecule designated Bi-PIP. Bi-PIP has a P300/CBP-selective bromodomain inhibitor (Bi) as a P300/CBP recruiter and a pyrrole-imidazole polyamide (PIP) as a sequence-selective DNA binder. Biochemical assays verified that Bi-PIPs recruit P300 to the nucleosomes having their target DNA sequences and extensively accelerate acetylation. Bi-PIPs also activated transcription of genes that have corresponding cognate DNA sequences inside living cells. Our results demonstrate that Bi-PIPs could act as a synthetic programmable histone code of acetylation, which emulates the bromodomain-mediated natural propagation system of histone acetylation to activate gene expression in a sequence-selective manner.
GATA: A graphic alignment tool for comparative sequenceanalysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nix, David A.; Eisen, Michael B.
2005-01-01
Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dotmore » plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.« less
Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong
2012-01-01
Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.
The 1985 Army Experience Survey: Tabular Descriptions of First-Term Attritees. Volume 2
1986-01-01
survey receipt control and sample management systems . Data were also keyed, edited, coded, and weighted. The coding schemes developed to classify... R136 REGION OF RESIDENCE WHEN YOU JOINED ARMY. .. ................. 272-273 049 El37 U TERMS OF ACTIVE ENLISTMENT .. ........ ................ 274...272 R136 -- REGION OF RESIDENCE WHEN YOU JOINED ARMY RECODED - WHAT STATE WERE YOU LIVING IN WHEN YOU JOINED THE ARMY! (RECODED TO REGION OF RSID) I
The 1985 Army Experience Survey: Tabular Descriptions of Enlisted Retirees. Volume 2
1986-01-01
and sample management systems . Data were also keyed, edited, coded, and weighted. The coding schemes developed to classify written responses to the...34 ," . ." #".* " ’ ," ." ." ." " ." • --.........-.. ’...................,...".."."’............. . . . . 226 .! - R136 -- REGION OF RESIDENCE WHEN YOU JOINED...standards - Positive or negative 14: Pay/benefits/promotion - Poor Promotion practices/criteria - Point system - Weed out poor soldiers - Could not
15 CFR 734.9 - Educational information.
Code of Federal Regulations, 2012 CFR
2012-01-01
...)). Note that the provisions of this section do not apply to encryption software classified under ECCN... under ECCN 5D002 when the corresponding source code meets the criteria specified in § 740.13(e) of the...
DNA typing in forensic medicine and in criminal investigations: a current survey.
Benecke, M
1997-05-01
Since 1985 DNA typing of biological material has become one of the most powerful tools for personal identification in forensic medicine and in criminal investigations [1-6]. Classical DNA "fingerprinting" is increasingly being replaced by polymerase chain reaction (PCR) based technology which detects very short polymorphic stretches of DNA [7-15]. DNA loci which forensic scientists study do not code for proteins, and they are spread over the whole genome [16, 17]. These loci are neutral, and few provide any information about individuals except for their identity. Minute amounts of biological material are sufficient for DNA typing. Many European countries are beginning to establish databases to store DNA profiles of crime scenes and known offenders. A brief overview is given of past and present DNA typing and the establishment of forensic DNA databases in Europe.
DNA typing in forensic medicine and in criminal investigations: a current survey
NASA Astrophysics Data System (ADS)
Benecke, Mark
Since 1985 DNA typing of biological material has become one of the most powerful tools for personal identification in forensic medicine and in criminal investigations [1-6]. Classical DNA "fingerprinting" is increasingly being replaced by polymerase chain reaction (PCR) based technology which detects very short polymorphic stretches of DNA [7-15]. DNA loci which forensic scientists study do not code for proteins, and they are spread over the whole genome [16, 17]. These loci are neutral, and few provide any information about individuals except for their identity. Minute amounts of biological material are sufficient for DNA typing. Many European countries are beginning to establish databases to store DNA profiles of crime scenes and known offenders. A brief overview is given of past and present DNA typing and the establishment of forensic DNA databases in Europe.
Structural diversity of supercoiled DNA
Irobalieva, Rossitza N.; Fogg, Jonathan M.; Catanese, Daniel J.; Sutthibutpong, Thana; Chen, Muyuan; Barker, Anna K.; Ludtke, Steven J.; Harris, Sarah A.; Schmid, Michael F.; Chiu, Wah; Zechiedrich, Lynn
2015-01-01
By regulating access to the genetic code, DNA supercoiling strongly affects DNA metabolism. Despite its importance, however, much about supercoiled DNA (positively supercoiled DNA, in particular) remains unknown. Here we use electron cryo-tomography together with biochemical analyses to investigate structures of individual purified DNA minicircle topoisomers with defined degrees of supercoiling. Our results reveal that each topoisomer, negative or positive, adopts a unique and surprisingly wide distribution of three-dimensional conformations. Moreover, we uncover striking differences in how the topoisomers handle torsional stress. As negative supercoiling increases, bases are increasingly exposed. Beyond a sharp supercoiling threshold, we also detect exposed bases in positively supercoiled DNA. Molecular dynamics simulations independently confirm the conformational heterogeneity and provide atomistic insight into the flexibility of supercoiled DNA. Our integrated approach reveals the three-dimensional structures of DNA that are essential for its function. PMID:26455586
Structural diversity of supercoiled DNA
NASA Astrophysics Data System (ADS)
Irobalieva, Rossitza N.; Fogg, Jonathan M.; Catanese, Daniel J.; Sutthibutpong, Thana; Chen, Muyuan; Barker, Anna K.; Ludtke, Steven J.; Harris, Sarah A.; Schmid, Michael F.; Chiu, Wah; Zechiedrich, Lynn
2015-10-01
By regulating access to the genetic code, DNA supercoiling strongly affects DNA metabolism. Despite its importance, however, much about supercoiled DNA (positively supercoiled DNA, in particular) remains unknown. Here we use electron cryo-tomography together with biochemical analyses to investigate structures of individual purified DNA minicircle topoisomers with defined degrees of supercoiling. Our results reveal that each topoisomer, negative or positive, adopts a unique and surprisingly wide distribution of three-dimensional conformations. Moreover, we uncover striking differences in how the topoisomers handle torsional stress. As negative supercoiling increases, bases are increasingly exposed. Beyond a sharp supercoiling threshold, we also detect exposed bases in positively supercoiled DNA. Molecular dynamics simulations independently confirm the conformational heterogeneity and provide atomistic insight into the flexibility of supercoiled DNA. Our integrated approach reveals the three-dimensional structures of DNA that are essential for its function.
Rhipicephalus microplus strain Deutsch, 10 BAC clone sequences
USDA-ARS?s Scientific Manuscript database
The cattle tick, Rhipicephalus (Boophilus) microplus, has a genome over 2.4 times the size of the human genome, and with over 70% of repetitive DNA, this genome would prove very costly to sequence at today's prices and difficult to assemble and analyze. We used labeled DNA probes from the coding reg...
Graphical classification of DNA sequences of HLA alleles by deep learning.
Miyake, Jun; Kaneshita, Yuhei; Asatani, Satoshi; Tagawa, Seiichi; Niioka, Hirohiko; Hirano, Takashi
2018-04-01
Alleles of human leukocyte antigen (HLA)-A DNAs are classified and expressed graphically by using artificial intelligence "Deep Learning (Stacked autoencoder)". Nucleotide sequence data corresponding to the length of 822 bp, collected from the Immuno Polymorphism Database, were compressed to 2-dimensional representation and were plotted. Profiles of the two-dimensional plots indicate that the alleles can be classified as clusters are formed. The two-dimensional plot of HLA-A DNAs gives a clear outlook for characterizing the various alleles.
Association of Amine-Receptor DNA Sequence Variants with Associative Learning in the Honeybee.
Lagisz, Malgorzata; Mercer, Alison R; de Mouzon, Charlotte; Santos, Luana L S; Nakagawa, Shinichi
2016-03-01
Octopamine- and dopamine-based neuromodulatory systems play a critical role in learning and learning-related behaviour in insects. To further our understanding of these systems and resulting phenotypes, we quantified DNA sequence variations at six loci coding octopamine-and dopamine-receptors and their association with aversive and appetitive learning traits in a population of honeybees. We identified 79 polymorphic sequence markers (mostly SNPs and a few insertions/deletions) located within or close to six candidate genes. Intriguingly, we found that levels of sequence variation in the protein-coding regions studied were low, indicating that sequence variation in the coding regions of receptor genes critical to learning and memory is strongly selected against. Non-coding and upstream regions of the same genes, however, were less conserved and sequence variations in these regions were weakly associated with between-individual differences in learning-related traits. While these associations do not directly imply a specific molecular mechanism, they suggest that the cross-talk between dopamine and octopamine signalling pathways may influence olfactory learning and memory in the honeybee.
Chen, Haimei; Zhang, Jianhui; Yuan, George; Liu, Chang
2014-01-01
Salvia miltiorrhiza is one of the most widely used medicinal plants. As a first step to develop a chloroplast-based genetic engineering method for the over-production of active components from S. miltiorrhiza, we have analyzed the genome, transcriptome, and base modifications of the S. miltiorrhiza chloroplast. Total genomic DNA and RNA were extracted from fresh leaves and then subjected to strand-specific RNA-Seq and Single-Molecule Real-Time (SMRT) sequencing analyses. Mapping the RNA-Seq reads to the genome assembly allowed us to determine the relative expression levels of 80 protein-coding genes. In addition, we identified 19 polycistronic transcription units and 136 putative antisense and intergenic noncoding RNA (ncRNA) genes. Comparison of the abundance of protein-coding transcripts (cRNA) with and without overlapping antisense ncRNAs (asRNA) suggest that the presence of asRNA is associated with increased cRNA abundance (p<0.05). Using the SMRT Portal software (v1.3.2), 2687 potential DNA modification sites and two potential DNA modification motifs were predicted. The two motifs include a TATA box–like motif (CPGDMM1, “TATANNNATNA”), and an unknown motif (CPGDMM2 “WNYANTGAW”). Specifically, 35 of the 97 CPGDMM1 motifs (36.1%) and 91 of the 369 CPGDMM2 motifs (24.7%) were found to be significantly modified (p<0.01). Analysis of genes downstream of the CPGDMM1 motif revealed the significantly increased abundance of ncRNA genes that are less than 400 bp away from the significantly modified CPGDMM1motif (p<0.01). Taking together, the present study revealed a complex interplay among DNA modifications, ncRNA and cRNA expression in chloroplast genome. PMID:24914614
Chen, Haimei; Zhang, Jianhui; Yuan, George; Liu, Chang
2014-01-01
Salvia miltiorrhiza is one of the most widely used medicinal plants. As a first step to develop a chloroplast-based genetic engineering method for the over-production of active components from S. miltiorrhiza, we have analyzed the genome, transcriptome, and base modifications of the S. miltiorrhiza chloroplast. Total genomic DNA and RNA were extracted from fresh leaves and then subjected to strand-specific RNA-Seq and Single-Molecule Real-Time (SMRT) sequencing analyses. Mapping the RNA-Seq reads to the genome assembly allowed us to determine the relative expression levels of 80 protein-coding genes. In addition, we identified 19 polycistronic transcription units and 136 putative antisense and intergenic noncoding RNA (ncRNA) genes. Comparison of the abundance of protein-coding transcripts (cRNA) with and without overlapping antisense ncRNAs (asRNA) suggest that the presence of asRNA is associated with increased cRNA abundance (p<0.05). Using the SMRT Portal software (v1.3.2), 2687 potential DNA modification sites and two potential DNA modification motifs were predicted. The two motifs include a TATA box-like motif (CPGDMM1, "TATANNNATNA"), and an unknown motif (CPGDMM2 "WNYANTGAW"). Specifically, 35 of the 97 CPGDMM1 motifs (36.1%) and 91 of the 369 CPGDMM2 motifs (24.7%) were found to be significantly modified (p<0.01). Analysis of genes downstream of the CPGDMM1 motif revealed the significantly increased abundance of ncRNA genes that are less than 400 bp away from the significantly modified CPGDMM1motif (p<0.01). Taking together, the present study revealed a complex interplay among DNA modifications, ncRNA and cRNA expression in chloroplast genome.
Effect of bar-code technology on the safety of medication administration.
Poon, Eric G; Keohane, Carol A; Yoon, Catherine S; Ditmore, Matthew; Bane, Anne; Levtzion-Korach, Osnat; Moniz, Thomas; Rothschild, Jeffrey M; Kachalia, Allen B; Hayes, Judy; Churchill, William W; Lipsitz, Stuart; Whittemore, Anthony D; Bates, David W; Gandhi, Tejal K
2010-05-06
Serious medication errors are common in hospitals and often occur during order transcription or administration of medication. To help prevent such errors, technology has been developed to verify medications by incorporating bar-code verification technology within an electronic medication-administration system (bar-code eMAR). We conducted a before-and-after, quasi-experimental study in an academic medical center that was implementing the bar-code eMAR. We assessed rates of errors in order transcription and medication administration on units before and after implementation of the bar-code eMAR. Errors that involved early or late administration of medications were classified as timing errors and all others as nontiming errors. Two clinicians reviewed the errors to determine their potential to harm patients and classified those that could be harmful as potential adverse drug events. We observed 14,041 medication administrations and reviewed 3082 order transcriptions. Observers noted 776 nontiming errors in medication administration on units that did not use the bar-code eMAR (an 11.5% error rate) versus 495 such errors on units that did use it (a 6.8% error rate)--a 41.4% relative reduction in errors (P<0.001). The rate of potential adverse drug events (other than those associated with timing errors) fell from 3.1% without the use of the bar-code eMAR to 1.6% with its use, representing a 50.8% relative reduction (P<0.001). The rate of timing errors in medication administration fell by 27.3% (P<0.001), but the rate of potential adverse drug events associated with timing errors did not change significantly. Transcription errors occurred at a rate of 6.1% on units that did not use the bar-code eMAR but were completely eliminated on units that did use it. Use of the bar-code eMAR substantially reduced the rate of errors in order transcription and in medication administration as well as potential adverse drug events, although it did not eliminate such errors. Our data show that the bar-code eMAR is an important intervention to improve medication safety. (ClinicalTrials.gov number, NCT00243373.) 2010 Massachusetts Medical Society
Cipriano, Andrea; Ballarino, Monica
2018-01-01
The completion of the human genome sequence together with advances in sequencing technologies have shifted the paradigm of the genome, as composed of discrete and hereditable coding entities, and have shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as “junk” DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments which have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years. PMID:29560353
USDA-ARS?s Scientific Manuscript database
This chapter describes the ascomycetous yeast genus Naumovozyma, which was recognized from multigene deoxyribonucleic acid (DNA) sequence analysis. The genus has two describes species, which were formerly classified in the genus Saccharomyces. The species reproduce by multilateral budding but do not...
NASA Astrophysics Data System (ADS)
Xu, Kuipeng; Tang, Xianghai; Bi, Guiqi; Cao, Min; Wang, Lu; Mao, Yunxiang
2017-08-01
Pyropia species grow in the intertidal zone and are cold-water adapted. To date, most of the information about the whole plastid and mitochondrial genomes (ptDNA and mtDNA) of this genus is limited to Northern Hemisphere species. Here, we report the sequencing of the ptDNA and mtDNA of the Antarctic red alga Pyropia endiviifolia using the Illumina platform. The plastid genome (195 784 bp, 33.28% GC content) contains 210 protein-coding genes, 37 tRNA genes and 6 rRNA genes. The mitochondrial genome (34 603 bp, 30.5% GC content) contains 26 protein-coding genes, 25 tRNA genes and 2 rRNA genes. Our results suggest that the organellar genomes of Py. endiviifolia have a compact organization. Although the collinearity of these genomes is conserved compared with other Pyropia species, the genome sizes show significant differences, mainly because of the different copy numbers of rDNA operons in the ptDNA and group II introns in the mtDNA. The other Pyropia species have 2u20133 distinct intronic ORFs in their cox 1 genes, but Py. endiviifolia has no introns in its cox 1 gene. This has led to a smaller mtDNA than in other Pyropia species. The phylogenetic relationships within Pyropia were examined using concatenated gene sets from most of the available organellar genomes with both the maximum likelihood and Bayesian methods. The analysis revealed a sister taxa affiliation between the Antarctic species Py. endiviifolia and the North American species Py. kanakaensis.
Chernicky, C L; Tan, H; Burfeind, P; Ilan, J; Ilan, J
1996-02-01
There are several cell types within the placenta that produce cytokines which can contribute to the regulatory mechanisms that ensure normal pregnancy. The immunological milieu at the maternofetal interface is considered to be crucial for survival of the fetus. Interleukin-2 (IL-2) is expressed by the syncytiotrophoblast, the cell layer between the mother and the fetus. IL-2 appears to be a key factor in maintenance of pregnancy. Therefore, it was important to determine the sequence of human placental interleukin-2. Direct sequencing of human placental IL-2 cDNA was determined for the coding region. Subclone sequencing was carried out for the 5'- and 3'-untranslated regions (5'-UTR and 3'-UTR). The 5'-UTR for human placental IL-2 cDNA is 294 bp, which is 247 nucleotides longer than that reported for cDNA IL-2 derived from T cells. The sequence of the coding region is identical to that reported for T cell IL-2, while sequence analysis of the polymerase chain reaction (PCR) product showed that the cDNA from the 3' end was the same as that reported for cDNA from T cells. Human placental IL-2 cDNA is 1,028 base pairs (excluding the poly A tail), which is 247 bp longer at the 5' end than that reported for IL-2 T cell cDNA. Therefore, the extended 5'-UTR of the placental IL-2 cDNA may be a consequence of alternative promoter utilization in the placenta.
Mechanisms of radiation-induced gene responses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woloschak, G.E.; Paunesku, T.
1996-10-01
In the process of identifying genes differentially expressed in cells exposed ultraviolet radiation, we have identified a transcript having a 26-bp region that is highly conserved in a variety of species including Bacillus circulans, yeast, pumpkin, Drosophila, mouse, and man. When the 5` region (flanking region or UTR) of a gene, the sequence is predominantly in +/+ orientation with respect to the coding DNA strand; while in the coding region and the 3` region (UTR), the sequence is most frequently in the +/-orientation with respect to the coding DNA strand. In two genes, the element is split into two parts;more » however, in most cases, it is found only once but with a minimum of 11 consecutive nucleotides precisely depicting the original sequence. The element is found in a large number of different genes with diverse functions (from human ras p21 to B. circulans chitonase). Gel shift assays demonstrated the presence of a protein in HeLa cell extracts that binds to the sense and antisense single-stranded consensus oligomers, as well as to the double- stranded oligonucleotide. When double-stranded oligomer was used, the size shift demonstrated as additional protein-oligomer complex larger than the one bound to either sense or antisense single-stranded consensus oligomers alone. It is speculated either that this element binds to protein(s) important in maintaining DNA is a single-stranded orientation for transcription or, alternatively that this element is important in the transcription-coupled DNA repair process.« less
Li, Junli; Li, Chunyan; Qiu, Rui; Yan, Congchong; Xie, Wenzhang; Wu, Zhen; Zeng, Zhi; Tung, Chuanjong
2015-09-01
The method of Monte Carlo simulation is a powerful tool to investigate the details of radiation biological damage at the molecular level. In this paper, a Monte Carlo code called NASIC (Nanodosimetry Monte Carlo Simulation Code) was developed. It includes physical module, pre-chemical module, chemical module, geometric module and DNA damage module. The physical module can simulate physical tracks of low-energy electrons in the liquid water event-by-event. More than one set of inelastic cross sections were calculated by applying the dielectric function method of Emfietzoglou's optical-data treatments, with different optical data sets and dispersion models. In the pre-chemical module, the ionised and excited water molecules undergo dissociation processes. In the chemical module, the produced radiolytic chemical species diffuse and react. In the geometric module, an atomic model of 46 chromatin fibres in a spherical nucleus of human lymphocyte was established. In the DNA damage module, the direct damages induced by the energy depositions of the electrons and the indirect damages induced by the radiolytic chemical species were calculated. The parameters should be adjusted to make the simulation results be agreed with the experimental results. In this paper, the influence study of the inelastic cross sections and vibrational excitation reaction on the parameters and the DNA strand break yields were studied. Further work of NASIC is underway. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Penlington, M C; Williams, M A; Sumpter, J P; Rand-Weaver, M; Hoole, D; Arme, C
1997-12-01
The complementary DNAs (cDNA) encoding the [Trp7,Leu8]-gonadotrophin-releasing hormone (salmon-type GnRH; sGnRH:GeneBank accession no. u60667) and the [His5,Trp7,Tyr8]-GnRH (chicken-II-type GnRH; cGnRH-II: GeneBank accession no. u60668) precursor in the roach (Rutilus rutilus) were isolated and sequenced following reverse transcription and rapid amplification of cDNA ends (RACE). The sGnRH and cGnRH-II precursor cDNAs consisted of 439 and 628 bp, and included open reading frames of 282 and 255 bp respectively. The structures of the encoded peptides were the same as GnRHs previously identified in other vertebrates. The sGnRH and cGnRH-II precursor cDNAs, including the non-coding regions, had 88.6 and 79.9% identity respectively, to those identified in goldfish (Carassius auratus). However, significant similarity was not observed between the non-coding regions of the GnRH cDNAs of Cyprinidae and other fish. The presumed third exon, encoding partial sGnRH associated peptide (GAP) of roach, demonstrated significant nucleotide and amino acid similarity with the appropriate regions in the goldfish, but not with other species, and this may indicate functional differences of GAP between different families of fish. cGnRH-II precursor cDNAs from roach had relatively high nucleotide similarity across this GnRH variant. Cladistic analysis classified the sGnRH and cGnRH-II precursor cDNAs into three and two groups respectively. However, the divergence between nucleotide sequences within the sGnRH variant was greater than those encoding the cGnRH-II precursors. Consistent with the consensus developed from previous studies, Northern blot analysis demonstrated that expression of sGnRH and cGnRH-II was restricted to the olfactory bulbs and midbrain of roach respectively. This work forms the basis for further study on the mechanisms by which the tapeworm, Ligula intestinalis, interacts with the pituitary-gonadal axis of its fish host.
I-Ching, dyadic groups of binary numbers and the geno-logic coding in living bodies.
Hu, Zhengbing; Petoukhov, Sergey V; Petukhova, Elena S
2017-12-01
The ancient Chinese book I-Ching was written a few thousand years ago. It introduces the system of symbols Yin and Yang (equivalents of 0 and 1). It had a powerful impact on culture, medicine and science of ancient China and several other countries. From the modern standpoint, I-Ching declares the importance of dyadic groups of binary numbers for the Nature. The system of I-Ching is represented by the tables with dyadic groups of 4 bigrams, 8 trigrams and 64 hexagrams, which were declared as fundamental archetypes of the Nature. The ancient Chinese did not know about the genetic code of protein sequences of amino acids but this code is organized in accordance with the I-Ching: in particularly, the genetic code is constructed on DNA molecules using 4 nitrogenous bases, 16 doublets, and 64 triplets. The article also describes the usage of dyadic groups as a foundation of the bio-mathematical doctrine of the geno-logic code, which exists in parallel with the known genetic code of amino acids but serves for a different goal: to code the inherited algorithmic processes using the logical holography and the spectral logic of systems of genetic Boolean functions. Some relations of this doctrine with the I-Ching are discussed. In addition, the ratios of musical harmony that can be revealed in the parameters of DNA structure are also represented in the I-Ching book. Copyright © 2017 Elsevier Ltd. All rights reserved.
Epigenetic Patterns of PTSD: DNA Methylation in Serum of OIF/OEF Service Members
2009-03-01
DNA methylation patterns in cytokines in soldiers prior to OIF or OEF deployment; serum derived DNA is being used. PTSD cases with existing serum...having an outpatient record with a primary diagnosis of PTSD, based on ICD-9 codes; an appropriate control group was identified. For each PTSD case ... cases and controls and between pre- and post-deployments of each group. We will also measure levels of these specific cytokines using an ELISA
NASA Astrophysics Data System (ADS)
Zhou, Zhiheng; Liu, Haibai; Wang, Caixia; Lu, Qian; Huang, Qinhai; Zheng, Chanjiao; Lei, Yixiong
2015-10-01
Increasing evidence suggests that long non-coding RNAs (lncRNAs) are involved in a variety of physiological and pathophysiological processes. Our study was to investigate whether lncRNAs as novel expression signatures are able to modulate DNA damage and repair in cadmium(Cd) toxicity. There were aberrant expression profiles of lncRNAs in 35th Cd-induced cells as compared to untreated 16HBE cells. siRNA-mediated knockdown of ENST00000414355 inhibited the growth of DNA-damaged cells and decreased the expressions of DNA-damage related genes (ATM, ATR and ATRIP), while increased the expressions of DNA-repair related genes (DDB1, DDB2, OGG1, ERCC1, MSH2, RAD50, XRCC1 and BARD1). Cadmium increased ENST00000414355 expression in the lung of Cd-exposed rats in a dose-dependent manner. A significant positive correlation was observed between blood ENST00000414355 expression and urinary/blood Cd concentrations, and there were significant correlations of lncRNA-ENST00000414355 expression with the expressions of target genes in the lung of Cd-exposed rats and the blood of Cd exposed workers. These results indicate that some lncRNAs are aberrantly expressed in Cd-treated 16HBE cells. lncRNA-ENST00000414355 may serve as a signature for DNA damage and repair related to the epigenetic mechanisms underlying the cadmium toxicity and become a novel biomarker of cadmium toxicity.
Zhuo, L; Reed, K M; Phillips, R B
1995-06-01
Variation in the intergenic spacer (IGS) of the ribosomal DNA (rDNA) of lake trout (Salvelinus namaycush) was examined. Digestion of genomic DNA with restriction enzymes showed that almost every individual had a unique combination of length variants with most of this variation occurring within rather than between populations. Sequence analysis of a 2.3 kilobase (kb) EcoRI-DraI fragment spanning the 3' end of the 28S coding region and approximately 1.8 kb of the IGS revealed two blocks of repetitive DNA. Putative transcriptional termination sites were found approximately 220 bases (b) downstream from the end of the 28S coding region. Comparison of the 2.3-kb fragments with two longer (3.1 kb) fragments showed that the major difference in length resulted from variation in the number of short (89 b) repeats located 3' to the putative terminator. Repeat units within a single nucleolus organizer region (NOR) appeared relatively homogeneous and genetic analysis found variants to be stably inherited. A comparison of the number of spacer-length variants with the number of NORs found that the number of length variants per individual was always less than the number of NORs. Examination of spacer variants in five populations showed that populations with more NORs had more spacer variants, indicating that variants are present at different rDNA sites on nonhomologous chromosomes.
Fu, Cheng-Jie; Sheikh, Sanea; Miao, Wei; Andersson, Siv G E; Baldauf, Sandra L
2014-08-21
Discoba (Excavata) is an ancient group of eukaryotes with great morphological and ecological diversity. Unlike the other major divisions of Discoba (Jakobida and Euglenozoa), little is known about the mitochondrial DNAs (mtDNAs) of Heterolobosea. We have assembled a complete mtDNA genome from the aggregating heterolobosean amoeba, Acrasis kona, which consists of a single circular highly AT-rich (83.3%) molecule of 51.5 kb. Unexpectedly, A. kona mtDNA is missing roughly 40% of the protein-coding genes and nearly half of the transfer RNAs found in the only other sequenced heterolobosean mtDNAs, those of Naegleria spp. Instead, over a quarter of A. kona mtDNA consists of novel open reading frames. Eleven of the 16 protein-coding genes missing from A. kona mtDNA were identified in its nuclear DNA and polyA RNA, and phylogenetic analyses indicate that at least 10 of these 11 putative nuclear-encoded mitochondrial (NcMt) proteins arose by direct transfer from the mitochondrion. Acrasis kona mtDNA also employs C-to-U type RNA editing, and 12 homologs of DYW-type pentatricopeptide repeat (PPR) proteins implicated in plant organellar RNA editing are found in A. kona nuclear DNA. A mapping of mitochondrial gene content onto a consensus phylogeny reveals a sporadic pattern of relative stasis and rampant gene loss in Discoba. Rampant loss occurred independently in the unique common lineage leading to Heterolobosea + Tsukubamonadida and later in the unique lineage leading to Acrasis. Meanwhile, mtDNA gene content appears to be remarkably stable in the Acrasis sister lineage leading to Naegleria and in their distant relatives Jakobida. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Michalaki, M; Oulis, C J; Pandis, N; Eliades, G
2016-12-01
This in vitro study was to classify questionable for caries occlusal surfaces (QCOS) of permanent teeth according to ICDAS codes 1, 2, and 3 and to compare them in terms of enamel mineral composition with the areas of sound tissue of the same tooth. Partially impacted human molars (60) extracted for therapeutic reasons with QCOS were used in the study, photographed via a polarised light microscope and classified according to the ICDAS II (into codes 1, 2, or 3). The crowns were embedded in clear self-cured acrylic resin and longitudinally sectioned at the levels of the characterised lesions and studied by SEM/EDX, to assess enamel mineral composition of the QCOS. Univariate and multivariate random effect regressions were used for Ca (wt%), P (wt%), and Ca/P (wt%). The EDX analysis indicated changes in the Ca and P contents that were more prominent in ICDAS-II code 3 lesions compared to codes 1 and 2 lesions. In these lesions, Ca (wt%) and P (wt%) concentrations were significantly decreased (p = 0.01) in comparison with sound areas. Ca and P (wt%) contents were significantly lower (p = 0.02 and p = 0.01 respectively) for code 3 areas in comparison with codes 1 and 2 areas. Significantly higher (p = 0.01) Ca (wt%) and P (wt%) contents were found on sound areas compared to the lesion areas. The enamel of occlusal surfaces of permanent teeth with ICDAS 1, 2, and 3 lesions was found to have different Ca/P compositions, necessitating further investigation on whether these altered surfaces might behave differently on etching preparation before fissure sealant placement, compared to sound surfaces.
Chevret, P; Denys, C; Jaeger, J J; Michaux, J; Catzeflis, F M
1993-01-01
Spiny mice of the genus Acomys traditionally have been classified as members of the Murinae, a subfamily of rodents that also includes rats and mice with which spiny mice share a complex set of morphological characters, including a unique molar pattern. The origin and evolution of this molar pattern, documented by many fossils from Southern Asia, support the hypothesis of the monophyly of Acomys and all other Murinae. This view has been challenged by immunological studies that have suggested that Acomys is as distantly related to mice (Mus) as are other subfamilies (e.g., hamsters: Cricetinae) of the muroid rodents. We present molecular evidence derived from DNA.DNA hybridization data that indicate that the spiny mouse Acomys and two African genera of Murinae, Uranomys and Lophuromys, constitute a monophyletic clade, a view that was recently suggested on the basis of dental characters. However, our DNA.DNA hybridization data also indicate that the spiny mice (Acomys) are more closely related to gerbils (Gerbillinae) than to the true mice and rats (Murinae) with which they have been classified. Because Acomys and the brush-furred mice Uranomys and Lophuromys share no derived morphological characters with the Gerbillinae, their murine morphology must have evolved by convergence, including the molar pattern previously considered to support the monophyly of the Murinae. PMID:8475093
Construction of cDNA library and preliminary analysis of expressed sequence tags from Siberian tiger
Liu, Chang-Qing; Lu, Tao-Feng; Feng, Bao-Gang; Liu, Dan; Guan, Wei-Jun; Ma, Yue-Hui
2010-01-01
In this study we successfully constructed a full-length cDNA library from Siberian tiger, Panthera tigris altaica, the most well-known wild Animal. Total RNA was extracted from cultured Siberian tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.30×106 pfu/ml and 1.62×109 pfu/ml respectively. The proportion of recombinants from unamplified library was 90.5% and average length of exogenous inserts was 1.13 kb. A total of 282 individual ESTs with sizes ranging from 328 to 1,142bps were then analyzed the BLASTX score revealed that 53.9% of the sequences were classified as strong match, 38.6% as nominal and 7.4% as weak match. 28.0% of them were found to be related to enzyme/catalytic protein, 20.9% ESTs to metabolism, 13.1% ESTs to transport, 12.1% ESTs to signal transducer/cell communication, 9.9% ESTs to structure protein, 3.9% ESTs to immunity protein/defense metabolism, 3.2% ESTs to cell cycle, and 8.9 ESTs classified as novel genes. These results demonstrated that the reliability and representativeness of the cDNA library attained to the requirements of a standard cDNA library. This library provided a useful platform for the functional genomic research of Siberian tigers. PMID:20941376
Emerging criteria for the low-coherence cannot classify category.
Speranza, Anna Maria; Nicolais, Giampaolo; Maggiora Vergano, Carola; Dazzi, Nino
2017-12-01
As suggested by Main et al., to respond to the need for an adaptation of the existing Adult Attachment Interview (AAI) coding system, especially regarding the application to nonnormative samples, this study presents additional criteria that characterize the low-coherence cannot classify (CC) category. Three AAIs were selected from a sample of parents of maltreated children. All transcripts indicated a very low coherence, with no evidence of contradictory insecure discourse strategies. Moreover, global category descriptors were identified, together with specific indices of discourse characteristics and features that highlight the breakdown in reasoning and discourse experienced by the speakers. The aim of the study is to illustrate new criteria to identify and rate a low-coherence CC profile toward the operationalization of this pervasively unintegrated state of mind. Through the definition of additional criteria for low-coherence CC category, our study helps the AAI and its coding system be more flexible and effective when dealing with clinical samples.
Surkis, Alisa; Hogle, Janice A; DiazGranados, Deborah; Hunt, Joe D; Mazmanian, Paul E; Connors, Emily; Westaby, Kate; Whipple, Elizabeth C; Adamus, Trisha; Mueller, Meridith; Aphinyanaphongs, Yindalon
2016-08-05
Translational research is a key area of focus of the National Institutes of Health (NIH), as demonstrated by the substantial investment in the Clinical and Translational Science Award (CTSA) program. The goal of the CTSA program is to accelerate the translation of discoveries from the bench to the bedside and into communities. Different classification systems have been used to capture the spectrum of basic to clinical to population health research, with substantial differences in the number of categories and their definitions. Evaluation of the effectiveness of the CTSA program and of translational research in general is hampered by the lack of rigor in these definitions and their application. This study adds rigor to the classification process by creating a checklist to evaluate publications across the translational spectrum and operationalizes these classifications by building machine learning-based text classifiers to categorize these publications. Based on collaboratively developed definitions, we created a detailed checklist for categories along the translational spectrum from T0 to T4. We applied the checklist to CTSA-linked publications to construct a set of coded publications for use in training machine learning-based text classifiers to classify publications within these categories. The training sets combined T1/T2 and T3/T4 categories due to low frequency of these publication types compared to the frequency of T0 publications. We then compared classifier performance across different algorithms and feature sets and applied the classifiers to all publications in PubMed indexed to CTSA grants. To validate the algorithm, we manually classified the articles with the top 100 scores from each classifier. The definitions and checklist facilitated classification and resulted in good inter-rater reliability for coding publications for the training set. Very good performance was achieved for the classifiers as represented by the area under the receiver operating curves (AUC), with an AUC of 0.94 for the T0 classifier, 0.84 for T1/T2, and 0.92 for T3/T4. The combination of definitions agreed upon by five CTSA hubs, a checklist that facilitates more uniform definition interpretation, and algorithms that perform well in classifying publications along the translational spectrum provide a basis for establishing and applying uniform definitions of translational research categories. The classification algorithms allow publication analyses that would not be feasible with manual classification, such as assessing the distribution and trends of publications across the CTSA network and comparing the categories of publications and their citations to assess knowledge transfer across the translational research spectrum.
Combining multiple decisions: applications to bioinformatics
NASA Astrophysics Data System (ADS)
Yukinawa, N.; Takenouchi, T.; Oba, S.; Ishii, S.
2008-01-01
Multi-class classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. This article reviews two recent approaches to multi-class classification by combining multiple binary classifiers, which are formulated based on a unified framework of error-correcting output coding (ECOC). The first approach is to construct a multi-class classifier in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. In the second approach, misclassification of each binary classifier is formulated as a bit inversion error with a probabilistic model by making an analogy to the context of information transmission theory. Experimental studies using various real-world datasets including cancer classification problems reveal that both of the new methods are superior or comparable to other multi-class classification methods.
Automatic morphological classification of galaxy images
Shamir, Lior
2009-01-01
We describe an image analysis supervised learning algorithm that can automatically classify galaxy images. The algorithm is first trained using a manually classified images of elliptical, spiral, and edge-on galaxies. A large set of image features is extracted from each image, and the most informative features are selected using Fisher scores. Test images can then be classified using a simple Weighted Nearest Neighbor rule such that the Fisher scores are used as the feature weights. Experimental results show that galaxy images from Galaxy Zoo can be classified automatically to spiral, elliptical and edge-on galaxies with accuracy of ~90% compared to classifications carried out by the author. Full compilable source code of the algorithm is available for free download, and its general-purpose nature makes it suitable for other uses that involve automatic image analysis of celestial objects. PMID:20161594
Kazlauskas, Darius; Krupovic, Mart; Venclovas, Česlovas
2016-01-01
Abstract Genomic DNA replication is a complex process that involves multiple proteins. Cellular DNA replication systems are broadly classified into only two types, bacterial and archaeo-eukaryotic. In contrast, double-stranded (ds) DNA viruses feature a much broader diversity of DNA replication machineries. Viruses differ greatly in both completeness and composition of their sets of DNA replication proteins. In this study, we explored whether there are common patterns underlying this extreme diversity. We identified and analyzed all major functional groups of DNA replication proteins in all available proteomes of dsDNA viruses. Our results show that some proteins are common to viruses infecting all domains of life and likely represent components of the ancestral core set. These include B-family polymerases, SF3 helicases, archaeo-eukaryotic primases, clamps and clamp loaders of the archaeo-eukaryotic type, RNase H and ATP-dependent DNA ligases. We also discovered a clear correlation between genome size and self-sufficiency of viral DNA replication, the unanticipated dominance of replicative helicases and pervasive functional associations among certain groups of DNA replication proteins. Altogether, our results provide a comprehensive view on the diversity and evolution of replication systems in the DNA virome and uncover fundamental principles underlying the orchestration of viral DNA replication. PMID:27112572
Dominguez, Daniel; Tsai, Yi-Hsuan; Gomez, Nicholas; Jha, Deepak Kumar; Davis, Ian; Wang, Zefeng
2016-01-01
Progression through the cell cycle is largely dependent on waves of periodic gene expression, and the regulatory networks for these transcriptome dynamics have emerged as critical points of vulnerability in various aspects of tumor biology. Through RNA-sequencing of human cells during two continuous cell cycles (>2.3 billion paired reads), we identified over 1 000 mRNAs, non-coding RNAs and pseudogenes with periodic expression. Periodic transcripts are enriched in functions related to DNA metabolism, mitosis, and DNA damage response, indicating these genes likely represent putative cell cycle regulators. Using our set of periodic genes, we developed a new approach termed “mitotic trait” that can classify primary tumors and normal tissues by their transcriptome similarity to different cell cycle stages. By analyzing >4 000 tumor samples in The Cancer Genome Atlas (TCGA) and other expression data sets, we found that mitotic trait significantly correlates with genetic alterations, tumor subtype and, notably, patient survival. We further defined a core set of 67 genes with robust periodic expression in multiple cell types. Proteins encoded by these genes function as major hubs of protein-protein interaction and are mostly required for cell cycle progression. The core genes also have unique chromatin features including increased levels of CTCF/RAD21 binding and H3K36me3. Loss of these features in uterine and kidney cancers is associated with altered expression of the core 67 genes. Our study suggests new chromatin-associated mechanisms for periodic gene regulation and offers a predictor of cancer patient outcomes. PMID:27364684
Farzandipour, Mehrdad; Sheikhtaheri, Abbas
2009-01-01
To evaluate the accuracy of procedural coding and the factors that influence it, 246 records were randomly selected from four teaching hospitals in Kashan, Iran. “Recodes” were assigned blindly and then compared to the original codes. Furthermore, the coders' professional behaviors were carefully observed during the coding process. Coding errors were classified as major or minor. The relations between coding accuracy and possible effective factors were analyzed by χ2 or Fisher exact tests as well as the odds ratio (OR) and the 95 percent confidence interval for the OR. The results showed that using a tabular index for rechecking codes reduces errors (83 percent vs. 72 percent accuracy). Further, more thorough documentation by the clinician positively affected coding accuracy, though this relation was not significant. Readability of records decreased errors overall (p = .003), including major ones (p = .012). Moreover, records with no abbreviations had fewer major errors (p = .021). In conclusion, not using abbreviations, ensuring more readable documentation, and paying more attention to available information increased coding accuracy and the quality of procedure databases. PMID:19471647
Generalized query-based active learning to identify differentially methylated regions in DNA.
Haque, Md Muksitul; Holder, Lawrence B; Skinner, Michael K; Cook, Diane J
2013-01-01
Active learning is a supervised learning technique that reduces the number of examples required for building a successful classifier, because it can choose the data it learns from. This technique holds promise for many biological domains in which classified examples are expensive and time-consuming to obtain. Most traditional active learning methods ask very specific queries to the Oracle (e.g., a human expert) to label an unlabeled example. The example may consist of numerous features, many of which are irrelevant. Removing such features will create a shorter query with only relevant features, and it will be easier for the Oracle to answer. We propose a generalized query-based active learning (GQAL) approach that constructs generalized queries based on multiple instances. By constructing appropriately generalized queries, we can achieve higher accuracy compared to traditional active learning methods. We apply our active learning method to find differentially DNA methylated regions (DMRs). DMRs are DNA locations in the genome that are known to be involved in tissue differentiation, epigenetic regulation, and disease. We also apply our method on 13 other data sets and show that our method is better than another popular active learning technique.
Kavuluru, Ramakanth; Rios, Anthony; Lu, Yuan
2015-01-01
Background Diagnosis codes are assigned to medical records in healthcare facilities by trained coders by reviewing all physician authored documents associated with a patient's visit. This is a necessary and complex task involving coders adhering to coding guidelines and coding all assignable codes. With the popularity of electronic medical records (EMRs), computational approaches to code assignment have been proposed in the recent years. However, most efforts have focused on single and often short clinical narratives, while realistic scenarios warrant full EMR level analysis for code assignment. Objective We evaluate supervised learning approaches to automatically assign international classification of diseases (ninth revision) - clinical modification (ICD-9-CM) codes to EMRs by experimenting with a large realistic EMR dataset. The overall goal is to identify methods that offer superior performance in this task when considering such datasets. Methods We use a dataset of 71,463 EMRs corresponding to in-patient visits with discharge date falling in a two year period (2011–2012) from the University of Kentucky (UKY) Medical Center. We curate a smaller subset of this dataset and also use a third gold standard dataset of radiology reports. We conduct experiments using different problem transformation approaches with feature and data selection components and employing suitable label calibration and ranking methods with novel features involving code co-occurrence frequencies and latent code associations. Results Over all codes with at least 50 training examples we obtain a micro F-score of 0.48. On the set of codes that occur at least in 1% of the two year dataset, we achieve a micro F-score of 0.54. For the smaller radiology report dataset, the classifier chaining approach yields best results. For the smaller subset of the UKY dataset, feature selection, data selection, and label calibration offer best performance. Conclusions We show that datasets at different scale (size of the EMRs, number of distinct codes) and with different characteristics warrant different learning approaches. For shorter narratives pertaining to a particular medical subdomain (e.g., radiology, pathology), classifier chaining is ideal given the codes are highly related with each other. For realistic in-patient full EMRs, feature and data selection methods offer high performance for smaller datasets. However, for large EMR datasets, we observe that the binary relevance approach with learning-to-rank based code reranking offers the best performance. Regardless of the training dataset size, for general EMRs, label calibration to select the optimal number of labels is an indispensable final step. PMID:26054428