Sample records for classifying coding dna

  1. DNA as a Binary Code: How the Physical Structure of Nucleotide Bases Carries Information

    ERIC Educational Resources Information Center

    McCallister, Gary

    2005-01-01

    The DNA triplet code also functions as a binary code. Because double-ring compounds cannot bind to double-ring compounds in the DNA code, the sequence of bases classified simply as purines or pyrimidines can encode for smaller groups of possible amino acids. This is an intuitive approach to teaching the DNA code. (Contains 6 figures.)

  2. Superimposed Code Theoretic Analysis of DNA Codes and DNA Computing

    DTIC Science & Technology

    2008-01-01

    complements of one another and the DNA duplex formed is a Watson - Crick (WC) duplex. However, there are many instances when the formation of non-WC...that the user’s requirements for probe selection are met based on the Watson - Crick probe locality within a target. The second type, called...AFRL-RI-RS-TR-2007-288 Final Technical Report January 2008 SUPERIMPOSED CODE THEORETIC ANALYSIS OF DNA CODES AND DNA COMPUTING

  3. DNA barcode goes two-dimensions: DNA QR code web server.

    PubMed

    Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.

  4. DNA Barcode Goes Two-Dimensions: DNA QR Code Web Server

    PubMed Central

    Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, “DNA barcode” actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications. PMID:22574113

  5. DNA: Polymer and molecular code

    NASA Astrophysics Data System (ADS)

    Shivashankar, G. V.

    1999-10-01

    The thesis work focusses upon two aspects of DNA, the polymer and the molecular code. Our approach was to bring single molecule micromanipulation methods to the study of DNA. It included a home built optical microscope combined with an atomic force microscope and an optical tweezer. This combined approach led to a novel method to graft a single DNA molecule onto a force cantilever using the optical tweezer and local heating. With this method, a force versus extension assay of double stranded DNA was realized. The resolution was about 10 picoN. To improve on this force measurement resolution, a simple light backscattering technique was developed and used to probe the DNA polymer flexibility and its fluctuations. It combined the optical tweezer to trap a DNA tethered bead and the laser backscattering to detect the beads Brownian fluctuations. With this technique the resolution was about 0.1 picoN with a millisecond access time, and the whole entropic part of the DNA force-extension was measured. With this experimental strategy, we measured the polymerization of the protein RecA on an isolated double stranded DNA. We observed the progressive decoration of RecA on the l DNA molecule, which results in the extension of l , due to unwinding of the double helix. The dynamics of polymerization, the resulting change in the DNA entropic elasticity and the role of ATP hydrolysis were the main parts of the study. A simple model for RecA assembly on DNA was proposed. This work presents a first step in the study of genetic recombination. Recently we have started a study of equilibrium binding which utilizes fluorescence polarization methods to probe the polymerization of RecA on single stranded DNA. In addition to the study of material properties of DNA and DNA-RecA, we have developed experiments for which the code of the DNA is central. We studied one aspect of DNA as a molecular code, using different techniques. In particular the programmatic use of template specificity makes

  6. Ancient DNA sequence revealed by error-correcting codes.

    PubMed

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-07-10

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.

  7. Ancient DNA sequence revealed by error-correcting codes

    PubMed Central

    Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  8. Machine learning for classifying tuberculosis drug-resistance from DNA sequencing data

    PubMed Central

    Yang, Yang; Niehaus, Katherine E; Walker, Timothy M; Iqbal, Zamin; Walker, A Sarah; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W; Smith, E Grace; Zhu, Tingting; Clifton, David A

    2018-01-01

    Abstract Motivation Correct and rapid determination of Mycobacterium tuberculosis (MTB) resistance against available tuberculosis (TB) drugs is essential for the control and management of TB. Conventional molecular diagnostic test assumes that the presence of any well-studied single nucleotide polymorphisms is sufficient to cause resistance, which yields low sensitivity for resistance classification. Summary Given the availability of DNA sequencing data from MTB, we developed machine learning models for a cohort of 1839 UK bacterial isolates to classify MTB resistance against eight anti-TB drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, ciprofloxacin, moxifloxacin, ofloxacin, streptomycin) and to classify multi-drug resistance. Results Compared to previous rules-based approach, the sensitivities from the best-performing models increased by 2-4% for isoniazid, rifampicin and ethambutol to 97% (P < 0.01), respectively; for ciprofloxacin and multi-drug resistant TB, they increased to 96%. For moxifloxacin and ofloxacin, sensitivities increased by 12 and 15% from 83 and 81% based on existing known resistance alleles to 95% and 96% (P < 0.01), respectively. Particularly, our models improved sensitivities compared to the previous rules-based approach by 15 and 24% to 84 and 87% for pyrazinamide and streptomycin (P < 0.01), respectively. The best-performing models increase the area-under-the-ROC curve by 10% for pyrazinamide and streptomycin (P < 0.01), and 4–8% for other drugs (P < 0.01). Availability and implementation The details of source code are provided at http://www.robots.ox.ac.uk/~davidc/code.php. Contact david.clifton@eng.ox.ac.uk Supplementary information Supplementary data are available at Bioinformatics online. PMID:29240876

  9. Machine learning for classifying tuberculosis drug-resistance from DNA sequencing data.

    PubMed

    Yang, Yang; Niehaus, Katherine E; Walker, Timothy M; Iqbal, Zamin; Walker, A Sarah; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W; Smith, E Grace; Zhu, Tingting; Clifton, David A

    2018-05-15

    Correct and rapid determination of Mycobacterium tuberculosis (MTB) resistance against available tuberculosis (TB) drugs is essential for the control and management of TB. Conventional molecular diagnostic test assumes that the presence of any well-studied single nucleotide polymorphisms is sufficient to cause resistance, which yields low sensitivity for resistance classification. Given the availability of DNA sequencing data from MTB, we developed machine learning models for a cohort of 1839 UK bacterial isolates to classify MTB resistance against eight anti-TB drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, ciprofloxacin, moxifloxacin, ofloxacin, streptomycin) and to classify multi-drug resistance. Compared to previous rules-based approach, the sensitivities from the best-performing models increased by 2-4% for isoniazid, rifampicin and ethambutol to 97% (P < 0.01), respectively; for ciprofloxacin and multi-drug resistant TB, they increased to 96%. For moxifloxacin and ofloxacin, sensitivities increased by 12 and 15% from 83 and 81% based on existing known resistance alleles to 95% and 96% (P < 0.01), respectively. Particularly, our models improved sensitivities compared to the previous rules-based approach by 15 and 24% to 84 and 87% for pyrazinamide and streptomycin (P < 0.01), respectively. The best-performing models increase the area-under-the-ROC curve by 10% for pyrazinamide and streptomycin (P < 0.01), and 4-8% for other drugs (P < 0.01). The details of source code are provided at http://www.robots.ox.ac.uk/~davidc/code.php. david.clifton@eng.ox.ac.uk. Supplementary data are available at Bioinformatics online.

  10. ANN modeling of DNA sequences: new strategies using DNA shape code.

    PubMed

    Parbhane, R V; Tambe, S S; Kulkarni, B D

    2000-09-01

    Two new encoding strategies, namely, wedge and twist codes, which are based on the DNA helical parameters, are introduced to represent DNA sequences in artificial neural network (ANN)-based modeling of biological systems. The performance of the new coding strategies has been evaluated by conducting three case studies involving mapping (modeling) and classification applications of ANNs. The proposed coding schemes have been compared rigorously and shown to outperform the existing coding strategies especially in situations wherein limited data are available for building the ANN models.

  11. DNA rearrangements directed by non-coding RNAs in ciliates

    PubMed Central

    Mochizuki, Kazufumi

    2013-01-01

    Extensive programmed rearrangement of DNA, including DNA elimination, chromosome fragmentation, and DNA descrambling, takes place in the newly developed macronucleus during the sexual reproduction of ciliated protozoa. Recent studies have revealed that two distant classes of ciliates use distinct types of non-coding RNAs to regulate such DNA rearrangement events. DNA elimination in Tetrahymena is regulated by small non-coding RNAs that are produced and utilized in an RNAi-related process. It has been proposed that the small RNAs produced from the micronuclear genome are used to identify eliminated DNA sequences by whole-genome comparison between the parental macronucleus and the micronucleus. In contrast, DNA descrambling in Oxytricha is guided by long non-coding RNAs that are produced from the parental macronuclear genome. These long RNAs are proposed to act as templates for the direct descrambling events that occur in the developing macronucleus. Both cases provide useful examples to study epigenetic chromatin regulation by non-coding RNAs. PMID:21956937

  12. nRC: non-coding RNA Classifier based on structural features.

    PubMed

    Fiannaca, Antonino; La Rosa, Massimo; La Paglia, Laura; Rizzo, Riccardo; Urso, Alfonso

    2017-01-01

    Non-coding RNA (ncRNA) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to develop methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms in regulative processes, together with the development of high-throughput technologies, has required the help of bioinformatics tools in addressing biologists and clinicians with a deeper comprehension of the functional roles of ncRNAs. In this work, we introduce a new ncRNA classification tool, nRC (non-coding RNA Classifier). Our approach is based on features extraction from the ncRNA secondary structure together with a supervised classification algorithm implementing a deep learning architecture based on convolutional neural networks. We tested our approach for the classification of 13 different ncRNA classes. We obtained classification scores, using the most common statistical measures. In particular, we reach an accuracy and sensitivity score of about 74%. The proposed method outperforms other similar classification methods based on secondary structure features and machine learning algorithms, including the RNAcon tool that, to date, is the reference classifier. nRC tool is freely available as a docker image at https://hub.docker.com/r/tblab/nrc/. The source code of nRC tool is also available at https://github.com/IcarPA-TBlab/nrc.

  13. DNA codes for nanoscience.

    PubMed

    Samorì, Bruno; Zuccheri, Giampaolo

    2005-02-11

    The nanometer scale is a special place where all sciences meet and develop a particularly strong interdisciplinarity. While biology is a source of inspiration for nanoscientists, chemistry has a central role in turning inspirations and methods from biological systems to nanotechnological use. DNA is the biological molecule by which nanoscience and nanotechnology is mostly fascinated. Nature uses DNA not only as a repository of the genetic information, but also as a controller of the expression of the genes it contains. Thus, there are codes embedded in the DNA sequence that serve to control recognition processes on the atomic scale, such as the base pairing, and others that control processes taking place on the nanoscale. From the chemical point of view, DNA is the supramolecular building block with the highest informational content. Nanoscience has therefore the opportunity of using DNA molecules to increase the level of complexity and efficiency in self-assembling and self-directing processes.

  14. Recognition of multiple imbalanced cancer types based on DNA microarray data using ensemble classifiers.

    PubMed

    Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin

    2013-01-01

    DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem, namely, that most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they merely focus on binary-class problem. In this paper, we dealt with multiclass imbalanced classification problem, as encountered in cancer DNA microarray, by using ensemble learning. We utilized one-against-all coding strategy to transform multiclass to multiple binary classes, each of them carrying out feature subspace, which is an evolving version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two different correction technologies, namely, decision threshold adjustment or random undersampling, into each training subset to alleviate the damage of class imbalance. Specifically, support vector machine was used as base classifier, and a novel voting rule called counter voting was presented for making a final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that unlike many traditional classification approaches, our methods are insensitive to class imbalance.

  15. Security authentication with a three-dimensional optical phase code using random forest classifier: an overview

    NASA Astrophysics Data System (ADS)

    Markman, Adam; Carnicer, Artur; Javidi, Bahram

    2017-05-01

    We overview our recent work [1] on utilizing three-dimensional (3D) optical phase codes for object authentication using the random forest classifier. A simple 3D optical phase code (OPC) is generated by combining multiple diffusers and glass slides. This tag is then placed on a quick-response (QR) code, which is a barcode capable of storing information and can be scanned under non-uniform illumination conditions, rotation, and slight degradation. A coherent light source illuminates the OPC and the transmitted light is captured by a CCD to record the unique signature. Feature extraction on the signature is performed and inputted into a pre-trained random-forest classifier for authentication.

  16. On fuzzy semantic similarity measure for DNA coding.

    PubMed

    Ahmad, Muneer; Jung, Low Tang; Bhuiyan, Md Al-Amin

    2016-02-01

    A coding measure scheme numerically translates the DNA sequence to a time domain signal for protein coding regions identification. A number of coding measure schemes based on numerology, geometry, fixed mapping, statistical characteristics and chemical attributes of nucleotides have been proposed in recent decades. Such coding measure schemes lack the biologically meaningful aspects of nucleotide data and hence do not significantly discriminate coding regions from non-coding regions. This paper presents a novel fuzzy semantic similarity measure (FSSM) coding scheme centering on FSSM codons׳ clustering and genetic code context of nucleotides. Certain natural characteristics of nucleotides i.e. appearance as a unique combination of triplets, preserving special structure and occurrence, and ability to own and share density distributions in codons have been exploited in FSSM. The nucleotides׳ fuzzy behaviors, semantic similarities and defuzzification based on the center of gravity of nucleotides revealed a strong correlation between nucleotides in codons. The proposed FSSM coding scheme attains a significant enhancement in coding regions identification i.e. 36-133% as compared to other existing coding measure schemes tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Comparison of the Predictive Accuracy of DNA Array-Based Multigene Classifiers across cDNA Arrays and Affymetrix GeneChips

    PubMed Central

    Stec, James; Wang, Jing; Coombes, Kevin; Ayers, Mark; Hoersch, Sebastian; Gold, David L.; Ross, Jeffrey S; Hess, Kenneth R.; Tirrell, Stephen; Linette, Gerald; Hortobagyi, Gabriel N.; Symmans, W. Fraser; Pusztai, Lajos

    2005-01-01

    We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by basic local alignment tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45 and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene. PMID:16049308

  18. Indications for spine surgery: validation of an administrative coding algorithm to classify degenerative diagnoses

    PubMed Central

    Lurie, Jon D.; Tosteson, Anna N.A.; Deyo, Richard A.; Tosteson, Tor; Weinstein, James; Mirza, Sohail K.

    2014-01-01

    Study Design Retrospective analysis of Medicare claims linked to a multi-center clinical trial. Objective The Spine Patient Outcomes Research Trial (SPORT) provided a unique opportunity to examine the validity of a claims-based algorithm for grouping patients by surgical indication. SPORT enrolled patients for lumbar disc herniation, spinal stenosis, and degenerative spondylolisthesis. We compared the surgical indication derived from Medicare claims to that provided by SPORT surgeons, the “gold standard”. Summary of Background Data Administrative data are frequently used to report procedure rates, surgical safety outcomes, and costs in the management of spinal surgery. However, the accuracy of using diagnosis codes to classify patients by surgical indication has not been examined. Methods Medicare claims were link to beneficiaries enrolled in SPORT. The sensitivity and specificity of three claims-based approaches to group patients based on surgical indications were examined: 1) using the first listed diagnosis; 2) using all diagnoses independently; and 3) using a diagnosis hierarchy based on the support for fusion surgery. Results Medicare claims were obtained from 376 SPORT participants, including 21 with disc herniation, 183 with spinal stenosis, and 172 with degenerative spondylolisthesis. The hierarchical coding algorithm was the most accurate approach for classifying patients by surgical indication, with sensitivities of 76.2%, 88.1%, and 84.3% for disc herniation, spinal stenosis, and degenerative spondylolisthesis cohorts, respectively. The specificity was 98.3% for disc herniation, 83.2% for spinal stenosis, and 90.7% for degenerative spondylolisthesis. Misclassifications were primarily due to codes attributing more complex pathology to the case. Conclusion Standardized approaches for using claims data to accurately group patients by surgical indications has widespread interest. We found that a hierarchical coding approach correctly classified over 90

  19. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

  20. Functional interrogation of non-coding DNA through CRISPR genome editing.

    PubMed

    Canver, Matthew C; Bauer, Daniel E; Orkin, Stuart H

    2017-05-15

    Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Functional interrogation of non-coding DNA through CRISPR genome editing

    PubMed Central

    Canver, Matthew C.; Bauer, Daniel E.; Orkin, Stuart H.

    2017-01-01

    Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. PMID:28288828

  2. Identifying aggressive prostate cancer foci using a DNA methylation classifier.

    PubMed

    Mundbjerg, Kamilla; Chopra, Sameer; Alemozaffar, Mehrdad; Duymich, Christopher; Lakshminarasimhan, Ranjani; Nichols, Peter W; Aron, Manju; Siegmund, Kimberly D; Ukimura, Osamu; Aron, Monish; Stern, Mariana; Gill, Parkash; Carpten, John D; Ørntoft, Torben F; Sørensen, Karina D; Weisenberger, Daniel J; Jones, Peter A; Duddalwar, Vinay; Gill, Inderbir; Liang, Gangning

    2017-01-12

    Slow-growing prostate cancer (PC) can be aggressive in a subset of cases. Therefore, prognostic tools to guide clinical decision-making and avoid overtreatment of indolent PC and undertreatment of aggressive disease are urgently needed. PC has a propensity to be multifocal with several different cancerous foci per gland. Here, we have taken advantage of the multifocal propensity of PC and categorized aggressiveness of individual PC foci based on DNA methylation patterns in primary PC foci and matched lymph node metastases. In a set of 14 patients, we demonstrate that over half of the cases have multiple epigenetically distinct subclones and determine the primary subclone from which the metastatic lesion(s) originated. Furthermore, we develop an aggressiveness classifier consisting of 25 DNA methylation probes to determine aggressive and non-aggressive subclones. Upon validation of the classifier in an independent cohort, the predicted aggressive tumors are significantly associated with the presence of lymph node metastases and invasive tumor stages. Overall, this study provides molecular-based support for determining PC aggressiveness with the potential to impact clinical decision-making, such as targeted biopsy approaches for early diagnosis and active surveillance, in addition to focal therapy.

  3. Urine cell-based DNA methylation classifier for monitoring bladder cancer.

    PubMed

    van der Heijden, Antoine G; Mengual, Lourdes; Ingelmo-Torres, Mercedes; Lozano, Juan J; van Rijt-van de Westerlo, Cindy C M; Baixauli, Montserrat; Geavlete, Bogdan; Moldoveanud, Cristian; Ene, Cosmin; Dinney, Colin P; Czerniak, Bogdan; Schalken, Jack A; Kiemeney, Lambertus A L M; Ribal, Maria J; Witjes, J Alfred; Alcaraz, Antonio

    2018-01-01

    Current standard methods used to detect and monitor bladder cancer (BC) are invasive or have low sensitivity. This study aimed to develop a urine methylation biomarker classifier for BC monitoring and validate this classifier in patients in follow-up for bladder cancer (PFBC). Voided urine samples ( N  = 725) from BC patients, controls, and PFBC were prospectively collected in four centers. Finally, 626 urine samples were available for analysis. DNA was extracted from the urinary cells and bisulfite modificated, and methylation status was analyzed using pyrosequencing. Cytology was available from a subset of patients ( N  = 399). In the discovery phase, seven selected genes from the literature ( CDH13 , CFTR , NID2 , SALL3 , TMEFF2 , TWIST1 , and VIM2 ) were studied in 111 BC and 57 control samples. This training set was used to develop a gene classifier by logistic regression and was validated in 458 PFBC samples (173 with recurrence). A three-gene methylation classifier containing CFTR , SALL3 , and TWIST1 was developed in the training set (AUC 0.874). The classifier achieved an AUC of 0.741 in the validation series. Cytology results were available for 308 samples from the validation set. Cytology achieved AUC 0.696 whereas the classifier in this subset of patients reached an AUC 0.768. Combining the methylation classifier with cytology results achieved an AUC 0.86 in the validation set, with a sensitivity of 96%, a specificity of 40%, and a positive and negative predictive value of 56 and 92%, respectively. The combination of the three-gene methylation classifier and cytology results has high sensitivity and high negative predictive value in a real clinical scenario (PFBC). The proposed classifier is a useful test for predicting BC recurrence and decrease the number of cystoscopies in the follow-up of BC patients. If only patients with a positive combined classifier result would be cystoscopied, 36% of all cystoscopies can be prevented.

  4. Hiding message into DNA sequence through DNA coding and chaotic maps.

    PubMed

    Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

    2014-09-01

    The paper proposes an improved reversible substitution method to hide data into deoxyribonucleic acid (DNA) sequence, and four measures have been taken to enhance the robustness and enlarge the hiding capacity, such as encode the secret message by DNA coding, encrypt it by pseudo-random sequence, generate the relative hiding locations by piecewise linear chaotic map, and embed the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method has a better performance compared with the competing methods with respect to robustness and capacity.

  5. What Information is Stored in DNA: Does it Contain Digital Error Correcting Codes?

    NASA Astrophysics Data System (ADS)

    Liebovitch, Larry

    1998-03-01

    The longest term correlations in living systems are the information stored in DNA which reflects the evolutionary history of an organism. The 4 bases (A,T,G,C) encode sequences of amino acids as well as locations of binding sites for proteins that regulate DNA. The fidelity of this important information is maintained by ANALOG error check mechanisms. When a single strand of DNA is replicated the complementary base is inserted in the new strand. Sometimes the wrong base is inserted that sticks out disrupting the phosphate backbone. The new base is not yet methylated, so repair enzymes, that slide along the DNA, can tear out the wrong base and replace it with the right one. The bases in DNA form a sequence of 4 different symbols and so the information is encoded in a DIGITAL form. All the digital codes in our society (ISBN book numbers, UPC product codes, bank account numbers, airline ticket numbers) use error checking code, where some digits are functions of other digits to maintain the fidelity of transmitted informaiton. Does DNA also utitlize a DIGITAL error chekcing code to maintain the fidelity of its information and increase the accuracy of replication? That is, are some bases in DNA functions of other bases upstream or downstream? This raises the interesting mathematical problem: How does one determine whether some symbols in a sequence of symbols are a function of other symbols. It also bears on the issue of determining algorithmic complexity: What is the function that generates the shortest algorithm for reproducing the symbol sequence. The error checking codes most used in our technology are linear block codes. We developed an efficient method to test for the presence of such codes in DNA. We coded the 4 bases as (0,1,2,3) and used Gaussian elimination, modified for modulus 4, to test if some bases are linear combinations of other bases. We used this method to analyze the base sequence in the genes from the lac operon and cytochrome C. We did not find

  6. Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics

    PubMed Central

    2012-01-01

    Background Detecting the borders between coding and non-coding regions is an essential step in the genome annotation. And information entropy measures are useful for describing the signals in genome sequence. However, the accuracies of previous methods of finding borders based on entropy segmentation method still need to be improved. Methods In this study, we first applied a new recursive entropic segmentation method on DNA sequences to get preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets. Results Comparing with the previous segmentation methods, the experimental results on three bacteria genomes, Rickettsia prowazekii, Borrelia burgdorferi and E.coli, show that our approach improves the accuracy for finding the borders between coding and non-coding regions in DNA sequences. Conclusions This paper presents a new segmentation method in prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacteria genomes, comparing to A12_JR method, our method raised the accuracy of finding the borders between protein coding and non-coding regions in DNA sequences. PMID:23282225

  7. Prognostic Classifier Based on Genome-Wide DNA Methylation Profiling in Well-Differentiated Thyroid Tumors.

    PubMed

    Bisarro Dos Reis, Mariana; Barros-Filho, Mateus Camargo; Marchi, Fábio Albuquerque; Beltrami, Caroline Moraes; Kuasne, Hellen; Pinto, Clóvis Antônio Lopes; Ambatipudi, Srikant; Herceg, Zdenko; Kowalski, Luiz Paulo; Rogatto, Silvia Regina

    2017-11-01

    Even though the majority of well-differentiated thyroid carcinoma (WDTC) is indolent, a number of cases display an aggressive behavior. Cumulative evidence suggests that the deregulation of DNA methylation has the potential to point out molecular markers associated with worse prognosis. To identify a prognostic epigenetic signature in thyroid cancer. Genome-wide DNA methylation assays (450k platform, Illumina) were performed in a cohort of 50 nonneoplastic thyroid tissues (NTs), 17 benign thyroid lesions (BTLs), and 74 thyroid carcinomas (60 papillary, 8 follicular, 2 Hürthle cell, 1 poorly differentiated, and 3 anaplastic). A prognostic classifier for WDTC was developed via diagonal linear discriminant analysis. The results were compared with The Cancer Genome Atlas (TCGA) database. A specific epigenetic profile was detected according to each histological subtype. BTLs and follicular carcinomas showed a greater number of methylated CpG in comparison with NTs, whereas hypomethylation was predominant in papillary and undifferentiated carcinomas. A prognostic classifier based on 21 DNA methylation probes was able to predict poor outcome in patients with WDTC (sensitivity 63%, specificity 92% for internal data; sensitivity 64%, specificity 88% for TCGA data). High-risk score based on the classifier was considered an independent factor of poor outcome (Cox regression, P < 0.001). The methylation profile of thyroid lesions exhibited a specific signature according to the histological subtype. A meaningful algorithm composed of 21 probes was capable of predicting the recurrence in WDTC. Copyright © 2017 Endocrine Society

  8. Privacy rules for DNA databanks. Protecting coded 'future diaries'.

    PubMed

    Annas, G J

    1993-11-17

    In privacy terms, genetic information is like medical information. But the information contained in the DNA molecule itself is more sensitive because it contains an individual's probabilistic "future diary," is written in a code that has only partially been broken, and contains information about an individual's parents, siblings, and children. Current rules for protecting the privacy of medical information cannot protect either genetic information or identifiable DNA samples stored in DNA databanks. A review of the legal and public policy rationales for protecting genetic privacy suggests that specific enforceable privacy rules for DNA databanks are needed. Four preliminary rules are proposed to govern the creation of DNA databanks, the collection of DNA samples for storage, limits on the use of information derived from the samples, and continuing obligations to those whose DNA samples are in the databanks.

  9. HyDEn: A Hybrid Steganocryptographic Approach for Data Encryption Using Randomized Error-Correcting DNA Codes

    PubMed Central

    Regoui, Chaouki; Durand, Guillaume; Belliveau, Luc; Léger, Serge

    2013-01-01

    This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and the cyclic permutations applied on the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach. PMID:23984392

  10. A cDNA microarray gene expression data classifier for clinical diagnostics based on graph theory.

    PubMed

    Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco

    2011-01-01

    Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarrays' data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performances, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, are unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.

  11. Machine learning classifier for identification of damaging missense mutations exclusive to human mitochondrial DNA-encoded polypeptides.

    PubMed

    Martín-Navarro, Antonio; Gaudioso-Simón, Andrés; Álvarez-Jarreta, Jorge; Montoya, Julio; Mayordomo, Elvira; Ruiz-Pesini, Eduardo

    2017-03-07

    Several methods have been developed to predict the pathogenicity of missense mutations but none has been specifically designed for classification of variants in mtDNA-encoded polypeptides. Moreover, there is not available curated dataset of neutral and damaging mtDNA missense variants to test the accuracy of predictors. Because mtDNA sequencing of patients suffering mitochondrial diseases is revealing many missense mutations, it is needed to prioritize candidate substitutions for further confirmation. Predictors can be useful as screening tools but their performance must be improved. We have developed a SVM classifier (Mitoclass.1) specific for mtDNA missense variants. Training and validation of the model was executed with 2,835 mtDNA damaging and neutral amino acid substitutions, previously curated by a set of rigorous pathogenicity criteria with high specificity. Each instance is described by a set of three attributes based on evolutionary conservation in Eukaryota of wildtype and mutant amino acids as well as coevolution and a novel evolutionary analysis of specific substitutions belonging to the same domain of mitochondrial polypeptides. Our classifier has performed better than other web-available tested predictors. We checked performance of three broadly used predictors with the total mutations of our curated dataset. PolyPhen-2 showed the best results for a screening proposal with a good sensitivity. Nevertheless, the number of false positive predictions was too high. Our method has an improved sensitivity and better specificity in relation to PolyPhen-2. We also publish predictions for the complete set of 24,201 possible missense variants in the 13 human mtDNA-encoded polypeptides. Mitoclass.1 allows a better selection of candidate damaging missense variants from mtDNA. A careful search of discriminatory attributes and a training step based on a curated dataset of amino acid substitutions belonging exclusively to human mtDNA genes allows an improved

  12. An Integrated Prognostic Classifier for Stage I Lung Adenocarcinoma based on mRNA, microRNA and DNA Methylation Biomarkers

    PubMed Central

    Robles, Ana I.; Arai, Eri; Mathé, Ewy A.; Okayama, Hirokazu; Schetter, Aaron J.; Brown, Derek; Petersen, David; Bowman, Elise D.; Noro, Rintaro; Welsh, Judith A.; Edelman, Daniel C.; Stevenson, Holly S.; Wang, Yonghong; Tsuchiya, Naoto; Kohno, Takashi; Skaug, Vidar; Mollerup, Steen; Haugen, Aage; Meltzer, Paul S.; Yokota, Jun; Kanai, Yae

    2015-01-01

    Introduction Up to 30% Stage I lung cancer patients suffer recurrence within 5 years of curative surgery. We sought to improve existing protein-coding gene and microRNA expression prognostic classifiers by incorporating epigenetic biomarkers. Methods Genome-wide screening of DNA methylation and pyrosequencing analysis of HOXA9 promoter methylation were performed in two independently collected cohorts of Stage I lung adenocarcinoma. The prognostic value of HOXA9 promoter methylation alone and in combination with mRNA and miRNA biomarkers was assessed by Cox regression and Kaplan-Meier survival analysis in both cohorts. Results Promoters of genes marked by Polycomb in Embryonic Stem Cells were methylated de novo in tumors and identified patients with poor prognosis. The HOXA9 locus was methylated de novo in Stage I tumors (P < 0.0005). High HOXA9 promoter methylation was associated with worse cancer-specific survival (Hazard Ratio [HR], 2.6; P = 0.02) and recurrence-free survival (HR, 3.0; P = 0.01), and identified high-risk patients in stratified analysis of Stage IA and IB. Four protein-coding gene (XPO1, BRCA1, HIF1α, DLC1), miR-21 expression and HOXA9 promoter methylation were each independently associated with outcome (HR, 2.8; P = 0.002; HR, 2.3; P = 0.01; and HR, 2.4; P = 0.005, respectively), and, when combined, identified high-risk, therapy naïve, Stage I patients (HR, 10.2; P = 3x10−5). All associations were confirmed in two independently collected cohorts. Conclusion A prognostic classifier comprising three types of genomic and epigenomic data may help guide the postoperative management of Stage I lung cancer patients at high risk of recurrence. PMID:26134223

  13. DNA methylation of miRNA coding sequences putatively associated with childhood obesity.

    PubMed

    Mansego, M L; Garcia-Lacarte, M; Milagro, F I; Marti, A; Martinez, J A

    2017-02-01

    Epigenetic mechanisms may be involved in obesity onset and its consequences. The aim of the present study was to evaluate whether DNA methylation status in microRNA (miRNA) coding regions is associated with childhood obesity. DNA isolated from white blood cells of 24 children (identification sample: 12 obese and 12 non-obese) from the Grupo Navarro de Obesidad Infantil study was hybridized in a 450 K methylation microarray. Several CpGs whose DNA methylation levels were statistically different between obese and non-obese were validated by MassArray® in 95 children (validation sample) from the same study. Microarray analysis identified 16 differentially methylated CpGs between both groups (6 hypermethylated and 10 hypomethylated). DNA methylation levels in miR-1203, miR-412 and miR-216A coding regions significantly correlated with body mass index standard deviation score (BMI-SDS) and explained up to 40% of the variation of BMI-SDS. The network analysis identified 19 well-defined obesity-relevant biological pathways from the KEGG database. MassArray® validation identified three regions located in or near miR-1203, miR-412 and miR-216A coding regions differentially methylated between obese and non-obese children. The current work identified three CpG sites located in coding regions of three miRNAs (miR-1203, miR-412 and miR-216A) that were differentially methylated between obese and non-obese children, suggesting a role of miRNA epigenetic regulation in childhood obesity. © 2016 World Obesity Federation.

  14. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Technical Reports Server (NTRS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) a n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, while (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.

  15. Quartz crystal microbalance detection of DNA single-base mutation based on monobase-coded cadmium tellurium nanoprobe.

    PubMed

    Zhang, Yuqin; Lin, Fanbo; Zhang, Youyu; Li, Haitao; Zeng, Yue; Tang, Hao; Yao, Shouzhuo

    2011-01-01

    A new method for the detection of point mutation in DNA based on the monobase-coded cadmium tellurium nanoprobes and the quartz crystal microbalance (QCM) technique was reported. A point mutation (single-base, adenine, thymine, cytosine, and guanine, namely, A, T, C and G, mutation in DNA strand, respectively) DNA QCM sensor was fabricated by immobilizing single-base mutation DNA modified magnetic beads onto the electrode surface with an external magnetic field near the electrode. The DNA-modified magnetic beads were obtained from the biotin-avidin affinity reaction of biotinylated DNA and streptavidin-functionalized core/shell Fe(3)O(4)/Au magnetic nanoparticles, followed by a DNA hybridization reaction. Single-base coded CdTe nanoprobes (A-CdTe, T-CdTe, C-CdTe and G-CdTe, respectively) were used as the detection probes. The mutation site in DNA was distinguished by detecting the decreases of the resonance frequency of the piezoelectric quartz crystal when the coded nanoprobe was added to the test system. This proposed detection strategy for point mutation in DNA is proved to be sensitive, simple, repeatable and low-cost, consequently, it has a great potential for single nucleotide polymorphism (SNP) detection. 2011 © The Japan Society for Analytical Chemistry

  16. A Two-Locus Global DNA Barcode for Land Plants: The Coding rbcL Gene Complements the Non-Coding trnH-psbA Spacer Region

    PubMed Central

    Kress, W. John; Erickson, David L.

    2007-01-01

    Background A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Methodology/Principal Findings Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. Conclusions/Significance A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination. PMID:17551588

  17. A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region.

    PubMed

    Kress, W John; Erickson, David L

    2007-06-06

    A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.

  18. Elective ambulatory surgical care in Ireland-why it needs to be better coded, classified and managed.

    PubMed

    Keane, Frank; Hammond, Laura; Kelliher, Gerry; Mealy, Ken

    2017-12-12

    In the year to July 2017, surgical disciplines accounted for 73% of the total national inpatient and day case waiting list and, of these, day cases accounted for 72%. Their proper classification is therefore important so that patients can be managed and treated in the most suitable and efficient setting. We set out to sub-classify the different elective surgical day cases treated in Irish public hospitals in order to assess their need to be managed as day cases and the consistency of practice between hospitals. We analysed all elective day cases that came under the care of surgeons between January 2014 and December 2016 and sub-classified them into those that were (A) true day case surgical procedures; (B) minor surgery or outpatient procedures; (C) gastrointestinal endoscopies; (D) day case, non-surgical interventions and (E) unclassified or having no primary procedure identified. Of 813,236 day case surgical interventions performed over 3 years, 26% were adjudged to accord with group A, 41% with B, 23% with C, 5% with D and 5% with E. The ratio of A to B procedures did not vary significantly across the range of hospital types. However, there were some notable variations in coding and practices between hospitals. Our findings show that many day cases should have been performed as outpatient procedures and that there were variations in coding and practices between hospitals that could not be easily explained. Outpatient procedure coding and a better, more consistent, classification of day cases are both required to better manage this group of patients.

  19. Second-generation UMTRI coding scheme for classifying driver tasks in distraction studies and application to the ACAS FOT video clips.

    DOT National Transportation Integrated Search

    2006-07-01

    This report describes the development of a new coding scheme to classify potentially distracting secondary tasks performed while driving, such as eating and using a cell phone. Compared with prior schemes (Stutts et al., first-generation UMTRI scheme...

  20. Run-length encoding graphic rules, biochemically editable designs and steganographical numeric data embedment for DNA-based cryptographical coding system.

    PubMed

    Kawano, Tomonori

    2013-03-01

    There have been a wide variety of approaches for handling the pieces of DNA as the "unplugged" tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, water marks and cryptography. In the present article, novel designs of artificial genes as the media for storing the digitally compressed data for images are proposed for bio-computing purpose while natural genes principally encode for proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given "passwords" and/or secret numbers using DNA sequences. The "passwords" of interest were decomposed into single letters and translated into the font image coded on the separate DNA chains with both the coding regions in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original "passwords." The latter processes require the molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting at the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interests over the image-coding DNA are also discussed.

  1. Run-length encoding graphic rules, biochemically editable designs and steganographical numeric data embedment for DNA-based cryptographical coding system

    PubMed Central

    Kawano, Tomonori

    2013-01-01

    There have been a wide variety of approaches for handling the pieces of DNA as the “unplugged” tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, water marks and cryptography. In the present article, novel designs of artificial genes as the media for storing the digitally compressed data for images are proposed for bio-computing purpose while natural genes principally encode for proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given “passwords” and/or secret numbers using DNA sequences. The “passwords” of interest were decomposed into single letters and translated into the font image coded on the separate DNA chains with both the coding regions in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original “passwords.” The latter processes require the molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting at the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interests over the image-coding DNA are also discussed. PMID:23750303

  2. Junk DNA and the long non-coding RNA twist in cancer genetics

    PubMed Central

    Ling, Hui; Vincent, Kimberly; Pichler, Martin; Fodde, Riccardo; Berindan-Neagoe, Ioana; Slack, Frank J.; Calin, George A

    2015-01-01

    The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions, and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNA in human diseases in general and cancer in particular remain largely unknown. Data from the literature suggest that lncRNA, often via interaction with proteins, functions in specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function, and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single nucleotide polymorphism (SNP) located within lncRNAs may be functionally associated with the individual’s susceptibility to cancer. PMID:25619839

  3. Coding of DNA samples and data in the pharmaceutical industry: current practices and future directions--perspective of the I-PWG.

    PubMed

    Franc, M A; Cohen, N; Warner, A W; Shaw, P M; Groenen, P; Snapir, A

    2011-04-01

    DNA samples collected in clinical trials and stored for future research are valuable to pharmaceutical drug development. Given the perceived higher risk associated with genetic research, industry has implemented complex coding methods for DNA. Following years of experience with these methods and with addressing questions from institutional review boards (IRBs), ethics committees (ECs) and health authorities, the industry has started reexamining the extent of the added value offered by these methods. With the goal of harmonization, the Industry Pharmacogenomics Working Group (I-PWG) conducted a survey to gain an understanding of company practices for DNA coding and to solicit opinions on their effectiveness at protecting privacy. The results of the survey and the limitations of the coding methods are described. The I-PWG recommends dialogue with key stakeholders regarding coding practices such that equal standards are applied to DNA and non-DNA samples. The I-PWG believes that industry standards for privacy protection should provide adequate safeguards for DNA and non-DNA samples/data and suggests a need for more universal standards for samples stored for future research.

  4. Cloning and expression of a cDNA coding for catalase from zebrafish (Danio rerio).

    PubMed

    Ken, C F; Lin, C T; Wu, J L; Shaw, J F

    2000-06-01

    A full-length complementary DNA (cDNA) clone encoding a catalase was amplified by the rapid amplication of cDNA ends-polymerase chain reaction (RACE-PCR) technique from zebrafish (Danio rerio) mRNA. Nucleotide sequence analysis of this cDNA clone revealed that it comprised a complete open reading frame coding for 526 amino acid residues and that it had a molecular mass of 59 654 Da. The deduced amino acid sequence showed high similarity with the sequences of catalase from swine (86.9%), mouse (85.8%), rat (85%), human (83.7%), fruit fly (75.6%), nematode (71.1%), and yeast (58.6%). The amino acid residues for secondary structures are apparently conserved as they are present in other mammal species. Furthermore, the coding region of zebrafish catalase was introduced into an expression vector, pET-20b(+), and transformed into Escherichia coli expression host BL21(DE3)pLysS. A 60-kDa active catalase protein was expressed and detected by Coomassie blue staining as well as activity staining on polyacrylamide gel followed electrophoresis.

  5. Differential DNA methylation profiles of coding and non-coding genes define hippocampal sclerosis in human temporal lobe epilepsy

    PubMed Central

    Miller-Delaney, Suzanne F.C.; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C.; Bray, Isabella M.; Reynolds, James P.; Gwinn, Ryder; Stallings, Raymond L.

    2015-01-01

    Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated C-phosphate-G islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) when compared to control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among the differentially methylated. Finally, a panel of 13, methylation-sensitive microRNA were identified in temporal lobe epilepsy including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and the differential methylation of long non-coding RNA documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain. PMID

  6. Minimal methylation classifier (MIMIC): A novel method for derivation and rapid diagnostic detection of disease-associated DNA methylation signatures.

    PubMed

    Schwalbe, E C; Hicks, D; Rafiee, G; Bashton, M; Gohlke, H; Enshaei, A; Potluri, S; Matthiesen, J; Mather, M; Taleongpong, P; Chaston, R; Silmon, A; Curtis, A; Lindsey, J C; Crosier, S; Smith, A J; Goschzik, T; Doz, F; Rutkowski, S; Lannering, B; Pietsch, T; Bailey, S; Williamson, D; Clifford, S C

    2017-10-18

    Rapid and reliable detection of disease-associated DNA methylation patterns has major potential to advance molecular diagnostics and underpin research investigations. We describe the development and validation of minimal methylation classifier (MIMIC), combining CpG signature design from genome-wide datasets, multiplex-PCR and detection by single-base extension and MALDI-TOF mass spectrometry, in a novel method to assess multi-locus DNA methylation profiles within routine clinically-applicable assays. We illustrate the application of MIMIC to successfully identify the methylation-dependent diagnostic molecular subgroups of medulloblastoma (the most common malignant childhood brain tumour), using scant/low-quality samples remaining from the most recently completed pan-European medulloblastoma clinical trial, refractory to analysis by conventional genome-wide DNA methylation analysis. Using this approach, we identify critical DNA methylation patterns from previously inaccessible cohorts, and reveal novel survival differences between the medulloblastoma disease subgroups with significant potential for clinical exploitation.

  7. Free Energy Gap and Statistical Thermodynamic Fidelity of DNA Codes

    DTIC Science & Technology

    2007-10-01

    reverse-complement unless otherwise stated. For strand x, let Nx denote its complement. A (perfect) Watson - Crick duplex is the joining of complement...is possible for complementary sequences to form a non-perfectly aligned duplex, we will call any x W Nx duplex a Watson - Crick (WC) duplex. Two...DATES COVERED (From - To) 4. TITLE AND SUBTITLE FREE ENERGY GAP AND STATISTICAL THERMODYNAMIC FIDELITY OF DNA CODES 5a. CONTRACT NUMBER FA8750-07

  8. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis.

    PubMed

    Buldyrev, S V; Goldberger, A L; Havlin, S; Mantegna, R N; Matsa, M E; Peng, C K; Simons, M; Stanley, H E

    1995-05-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to dtermine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10(-10). We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.

  9. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

    NASA Technical Reports Server (NTRS)

    Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to dtermine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10(-10). We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.

  10. Classifying Facial Actions

    PubMed Central

    Donato, Gianluca; Bartlett, Marian Stewart; Hager, Joseph C.; Ekman, Paul; Sejnowski, Terrence J.

    2010-01-01

    The Facial Action Coding System (FACS) [23] is an objective method for quantifying facial movement in terms of component actions. This system is widely used in behavioral investigations of emotion, cognitive processes, and social interaction. The coding is presently performed by highly trained human experts. This paper explores and compares techniques for automatically recognizing facial actions in sequences of images. These techniques include analysis of facial motion through estimation of optical flow; holistic spatial analysis, such as principal component analysis, independent component analysis, local feature analysis, and linear discriminant analysis; and methods based on the outputs of local filters, such as Gabor wavelet representations and local principal components. Performance of these systems is compared to naive and expert human subjects. Best performances were obtained using the Gabor wavelet representation and the independent component representation, both of which achieved 96 percent accuracy for classifying 12 facial actions of the upper and lower face. The results provide converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions. PMID:21188284

  11. An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

    PubMed

    Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

    2011-01-01

    cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.

  12. Genome defense against exogenous nucleic acids in eukaryotes by non-coding DNA occurs through CRISPR-like mechanisms in the cytosol and the bodyguard protection in the nucleus.

    PubMed

    Qiu, Guo-Hua

    2016-01-01

    In this review, the protective function of the abundant non-coding DNA in the eukaryotic genome is discussed from the perspective of genome defense against exogenous nucleic acids. Peripheral non-coding DNA has been proposed to act as a bodyguard that protects the genome and the central protein-coding sequences from ionizing radiation-induced DNA damage. In the proposed mechanism of protection, the radicals generated by water radiolysis in the cytosol and IR energy are absorbed, blocked and/or reduced by peripheral heterochromatin; then, the DNA damage sites in the heterochromatin are removed and expelled from the nucleus to the cytoplasm through nuclear pore complexes, most likely through the formation of extrachromosomal circular DNA. To strengthen this hypothesis, this review summarizes the experimental evidence supporting the protective function of non-coding DNA against exogenous nucleic acids. Based on these data, I hypothesize herein about the presence of an additional line of defense formed by small RNAs in the cytosol in addition to their bodyguard protection mechanism in the nucleus. Therefore, exogenous nucleic acids may be initially inactivated in the cytosol by small RNAs generated from non-coding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. Exogenous nucleic acids may enter the nucleus, where some are absorbed and/or blocked by heterochromatin and others integrate into chromosomes. The integrated fragments and the sites of DNA damage are removed by repetitive non-coding DNA elements in the heterochromatin and excluded from the nucleus. Therefore, the normal eukaryotic genome and the central protein-coding sequences are triply protected by non-coding DNA against invasion by exogenous nucleic acids. This review provides evidence supporting the protective role of non-coding DNA in genome defense. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Isolation and characterization of full-length cDNA clones coding for cholinesterase from fetal human tissues

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Prody, C.A.; Zevin-Sonkin, D.; Gnatt, A.

    1987-06-01

    To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase and Torpedo electric organ true acetylcholinesterase. Using these probes, the authors isolated several cDNA clones from lambdagt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. Inmore » RNA blots of poly(A)/sup +/ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments of the total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These finding demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species.« less

  14. Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor.

    PubMed

    Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P

    1988-02-01

    Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators.

  15. Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor.

    PubMed Central

    Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P

    1988-01-01

    Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators. Images PMID:3257578

  16. Free Energy Gap and Statistical Thermodynamic Fidelity of DNA Codes (Postprint)

    DTIC Science & Technology

    2007-01-01

    reverse-complement unless otherwise stated. For strand x, let Nx denote its complement. A (perfect) Watson - Crick duplex is the joining of complement...is possible for complementary sequences to form a non-perfectly aligned duplex, we will call any x W Nx duplex a Watson - Crick (WC) duplex. Two...DATES COVERED (From - To) 4. TITLE AND SUBTITLE FREE ENERGY GAP AND STATISTICAL THERMODYNAMIC FIDELITY OF DNA CODES 5a. CONTRACT NUMBER FA8750-07

  17. A DNA-based pattern classifier with in vitro learning and associative recall for genomic characterization and biosensing without explicit sequence knowledge.

    PubMed

    Lee, Ju Seok; Chen, Junghuei; Deaton, Russell; Kim, Jin-Woo

    2014-01-01

    Genetic material extracted from in situ microbial communities has high promise as an indicator of biological system status. However, the challenge is to access genomic information from all organisms at the population or community scale to monitor the biosystem's state. Hence, there is a need for a better diagnostic tool that provides a holistic view of a biosystem's genomic status. Here, we introduce an in vitro methodology for genomic pattern classification of biological samples that taps large amounts of genetic information from all genes present and uses that information to detect changes in genomic patterns and classify them. We developed a biosensing protocol, termed Biological Memory, that has in vitro computational capabilities to "learn" and "store" genomic sequence information directly from genomic samples without knowledge of their explicit sequences, and that discovers differences in vitro between previously unknown inputs and learned memory molecules. The Memory protocol was designed and optimized based upon (1) common in vitro recombinant DNA operations using 20-base random probes, including polymerization, nuclease digestion, and magnetic bead separation, to capture a snapshot of the genomic state of a biological sample as a DNA memory and (2) the thermal stability of DNA duplexes between new input and the memory to detect similarities and differences. For efficient read out, a microarray was used as an output method. When the microarray-based Memory protocol was implemented to test its capability and sensitivity using genomic DNA from two model bacterial strains, i.e., Escherichia coli K12 and Bacillus subtilis, results indicate that the Memory protocol can "learn" input DNA, "recall" similar DNA, differentiate between dissimilar DNA, and detect relatively small concentration differences in samples. This study demonstrated not only the in vitro information processing capabilities of DNA, but also its promise as a genomic pattern classifier that could

  18. Isolation and characterization of full-length cDNA clones coding for cholinesterase from fetal human tissues.

    PubMed Central

    Prody, C A; Zevin-Sonkin, D; Gnatt, A; Goldberg, O; Soreq, H

    1987-01-01

    To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase (BtChoEase; EC 3.1.1.8) and Torpedo electric organ "true" acetylcholinesterase (AcChoEase; EC 3.1.1.7). Using these probes, we isolated several cDNA clones from lambda gt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. In RNA blots of poly(A)+ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments of the total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These findings demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species. Images PMID:3035536

  19. Prediction of plant lncRNA by ensemble machine learning classifiers.

    PubMed

    Simopoulos, Caitlin M A; Weretilnyk, Elizabeth A; Golding, G Brian

    2018-05-02

    In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein. Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are

  20. Probability of coding of a DNA sequence: an algorithm to predict translated reading frames from their thermodynamic characteristics.

    PubMed Central

    Tramontano, A; Macchiato, M F

    1986-01-01

    An algorithm to determine the probability that a reading frame codifies for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of a translated reading frame. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or by the presence of translated reading frames on the complementary DNA strand. PMID:3753761

  1. Metabolically Generated Stable Isotope-Labeled Deoxynucleoside Code for Tracing DNA N6-Methyladenine in Human Cells.

    PubMed

    Liu, Baodong; Liu, Xiaoling; Lai, Weiyi; Wang, Hailin

    2017-06-06

    DNA N 6 -methyl-2'-deoxyadenosine (6mdA) is an epigenetic modification in both eukaryotes and bacteria. Here we exploited stable isotope-labeled deoxynucleoside [ 15 N 5 ]-2'-deoxyadenosine ([ 15 N 5 ]-dA) as an initiation tracer and for the first time developed a metabolically differential tracing code for monitoring DNA 6mdA in human cells. We demonstrate that the initiation tracer [ 15 N 5 ]-dA undergoes a specific and efficient adenine deamination reaction leading to the loss the exocyclic amine 15 N, and further utilizes the purine salvage pathway to generate mainly both [ 15 N 4 ]-dA and [ 15 N 4 ]-2'-deoxyguanosine ([ 15 N 4 ]-dG) in mammalian genomes. However, [ 15 N 5 ]-dA is largely retained in the genomes of mycoplasmas, which are often found in cultured cells and experimental animals. Consequently, the methylation of dA generates 6mdA with a consistent coding pattern, with a predominance of [ 15 N 4 ]-6mdA. Therefore, mammalian DNA 6mdA can be potentially discriminated from that generated by infecting mycoplasmas. Collectively, we show a promising approach for identification of authentic DNA 6mdA in human cells and determine if the human cells are contaminated with mycoplasmas.

  2. Optimal aggregation of binary classifiers for multiclass cancer diagnosis using gene expression profiles.

    PubMed

    Yukinawa, Naoto; Oba, Shigeyuki; Kato, Kikuya; Ishii, Shin

    2009-01-01

    Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. There have been many studies of aggregating binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, as well as some comparison studies between them. However, the studies found that the best coding depends on each situation. Therefore, a new problem, which we call the "optimal coding problem," has arisen: how can we determine which coding is the optimal one in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can be a consistent answer to the problem. We apply this method to various classification problems including a synthesized data set and some cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method can improve classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.

  3. A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

    PubMed

    Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong

    2012-01-01

    Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.

  4. DNA fingerprinting of Chinese melon provides evidentiary support of seed quality appraisal.

    PubMed

    Gao, Peng; Ma, Hongyan; Luan, Feishi; Song, Haibin

    2012-01-01

    Melon, Cucumis melo L. is an important vegetable crop worldwide. At present, there are phenomena of homonyms and synonyms present in the melon seed markets of China, which could cause variety authenticity issues influencing the process of melon breeding, production, marketing and other aspects. Molecular markers, especially microsatellites or simple sequence repeats (SSRs) are playing increasingly important roles for cultivar identification. The aim of this study was to construct a DNA fingerprinting database of major melon cultivars, which could provide a possibility for the establishment of a technical standard system for purity and authenticity identification of melon seeds. In this study, to develop the core set SSR markers, 470 polymorphic SSRs were selected as the candidate markers from 1219 SSRs using 20 representative melon varieties (lines). Eighteen SSR markers, evenly distributed across the genome and with the highest contents of polymorphism information (PIC) were identified as the core marker set for melon DNA fingerprinting analysis. Fingerprint codes for 471 melon varieties (lines) were established. There were 51 materials which were classified into17 groups based on sharing the same fingerprint code, while field traits survey results showed that these plants in the same group were synonyms because of the same or similar field characters. Furthermore, DNA fingerprinting quick response (QR) codes of 471 melon varieties (lines) were constructed. Due to its fast readability and large storage capacity, QR coding melon DNA fingerprinting is in favor of read convenience and commercial applications.

  5. DNA Fingerprinting of Chinese Melon Provides Evidentiary Support of Seed Quality Appraisal

    PubMed Central

    Gao, Peng; Ma, Hongyan; Luan, Feishi; Song, Haibin

    2012-01-01

    Melon, Cucumis melo L. is an important vegetable crop worldwide. At present, there are phenomena of homonyms and synonyms present in the melon seed markets of China, which could cause variety authenticity issues influencing the process of melon breeding, production, marketing and other aspects. Molecular markers, especially microsatellites or simple sequence repeats (SSRs) are playing increasingly important roles for cultivar identification. The aim of this study was to construct a DNA fingerprinting database of major melon cultivars, which could provide a possibility for the establishment of a technical standard system for purity and authenticity identification of melon seeds. In this study, to develop the core set SSR markers, 470 polymorphic SSRs were selected as the candidate markers from 1219 SSRs using 20 representative melon varieties (lines). Eighteen SSR markers, evenly distributed across the genome and with the highest contents of polymorphism information (PIC) were identified as the core marker set for melon DNA fingerprinting analysis. Fingerprint codes for 471 melon varieties (lines) were established. There were 51 materials which were classified into17 groups based on sharing the same fingerprint code, while field traits survey results showed that these plants in the same group were synonyms because of the same or similar field characters. Furthermore, DNA fingerprinting quick response (QR) codes of 471 melon varieties (lines) were constructed. Due to its fast readability and large storage capacity, QR coding melon DNA fingerprinting is in favor of read convenience and commercial applications. PMID:23285039

  6. DNA strand breaks induced by electrons simulated with Nanodosimetry Monte Carlo Simulation Code: NASIC.

    PubMed

    Li, Junli; Li, Chunyan; Qiu, Rui; Yan, Congchong; Xie, Wenzhang; Wu, Zhen; Zeng, Zhi; Tung, Chuanjong

    2015-09-01

    The method of Monte Carlo simulation is a powerful tool to investigate the details of radiation biological damage at the molecular level. In this paper, a Monte Carlo code called NASIC (Nanodosimetry Monte Carlo Simulation Code) was developed. It includes physical module, pre-chemical module, chemical module, geometric module and DNA damage module. The physical module can simulate physical tracks of low-energy electrons in the liquid water event-by-event. More than one set of inelastic cross sections were calculated by applying the dielectric function method of Emfietzoglou's optical-data treatments, with different optical data sets and dispersion models. In the pre-chemical module, the ionised and excited water molecules undergo dissociation processes. In the chemical module, the produced radiolytic chemical species diffuse and react. In the geometric module, an atomic model of 46 chromatin fibres in a spherical nucleus of human lymphocyte was established. In the DNA damage module, the direct damages induced by the energy depositions of the electrons and the indirect damages induced by the radiolytic chemical species were calculated. The parameters should be adjusted to make the simulation results be agreed with the experimental results. In this paper, the influence study of the inelastic cross sections and vibrational excitation reaction on the parameters and the DNA strand break yields were studied. Further work of NASIC is underway. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  7. Discrete Ramanujan transform for distinguishing the protein coding regions from other regions.

    PubMed

    Hua, Wei; Wang, Jiasong; Zhao, Jian

    2014-01-01

    Based on the study of Ramanujan sum and Ramanujan coefficient, this paper suggests the concepts of discrete Ramanujan transform and spectrum. Using Voss numerical representation, one maps a symbolic DNA strand as a numerical DNA sequence, and deduces the discrete Ramanujan spectrum of the numerical DNA sequence. It is well known that of discrete Fourier power spectrum of protein coding sequence has an important feature of 3-base periodicity, which is widely used for DNA sequence analysis by the technique of discrete Fourier transform. It is performed by testing the signal-to-noise ratio at frequency N/3 as a criterion for the analysis, where N is the length of the sequence. The results presented in this paper show that the property of 3-base periodicity can be only identified as a prominent spike of the discrete Ramanujan spectrum at period 3 for the protein coding regions. The signal-to-noise ratio for discrete Ramanujan spectrum is defined for numerical measurement. Therefore, the discrete Ramanujan spectrum and the signal-to-noise ratio of a DNA sequence can be used for distinguishing the protein coding regions from the noncoding regions. All the exon and intron sequences in whole chromosomes 1, 2, 3 and 4 of Caenorhabditis elegans have been tested and the histograms and tables from the computational results illustrate the reliability of our method. In addition, we have analyzed theoretically and gotten the conclusion that the algorithm for calculating discrete Ramanujan spectrum owns the lower computational complexity and higher computational accuracy. The computational experiments show that the technique by using discrete Ramanujan spectrum for classifying different DNA sequences is a fast and effective method. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. DNA Barcoding through Quaternary LDPC Codes

    PubMed Central

    Tapia, Elizabeth; Spetale, Flavio; Krsticevic, Flavia; Angelone, Laura; Bulacio, Pilar

    2015-01-01

    For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10−2 per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10−9 at the expense of a rate of read losses just in the order of 10−6. PMID:26492348

  9. DNA Barcoding through Quaternary LDPC Codes.

    PubMed

    Tapia, Elizabeth; Spetale, Flavio; Krsticevic, Flavia; Angelone, Laura; Bulacio, Pilar

    2015-01-01

    For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10(-2) per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10(-9) at the expense of a rate of read losses just in the order of 10(-6).

  10. DNA-based watermarks using the DNA-Crypt algorithm.

    PubMed

    Heider, Dominik; Barnekow, Angelika

    2007-05-29

    The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in case of DNA-Crypt and the least significant bit in case of the image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations which can occur infrequently may destroy the encrypted information, however an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 in Saccharomyces cerevisiae shows that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences similar to all steganographic algorithms.

  11. DNA-based watermarks using the DNA-Crypt algorithm

    PubMed Central

    Heider, Dominik; Barnekow, Angelika

    2007-01-01

    Background The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. Results The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in case of DNA-Crypt and the least significant bit in case of the image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations which can occur infrequently may destroy the encrypted information, however an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 in Saccharomyces cerevisiae shows that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. Conclusion The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences similar to all steganographic algorithms. PMID:17535434

  12. Mitochondrial DNA haplogroup phylogeny of the dog: Proposal for a cladistic nomenclature.

    PubMed

    Fregel, Rosa; Suárez, Nicolás M; Betancor, Eva; González, Ana M; Cabrera, Vicente M; Pestano, José

    2015-05-01

    Canis lupus familiaris mitochondrial DNA analysis has increased in recent years, not only for the purpose of deciphering dog domestication but also for forensic genetic studies or breed characterization. The resultant accumulation of data has increased the need for a normalized and phylogenetic-based nomenclature like those provided for human maternal lineages. Although a standardized classification has been proposed, haplotype names within clades have been assigned gradually without considering the evolutionary history of dog mtDNA. Moreover, this classification is based only on the D-loop region, proven to be insufficient for phylogenetic purposes due to its high number of recurrent mutations and the lack of relevant information present in the coding region. In this study, we design 1) a refined mtDNA cladistic nomenclature from a phylogenetic tree based on complete sequences, classifying dog maternal lineages into haplogroups defined by specific diagnostic mutations, and 2) a coding region SNP analysis that allows a more accurate classification into haplogroups when combined with D-loop sequencing, thus improving the phylogenetic information obtained in dog mitochondrial DNA studies. Copyright © 2015 Elsevier B.V. All rights reserved.

  13. The changing epitome of species identification – DNA barcoding

    PubMed Central

    Ajmal Ali, M.; Gyulai, Gábor; Hidvégi, Norbert; Kerti, Balázs; Al Hemaid, Fahad M.A.; Pandey, Arun K.; Lee, Joongku

    2014-01-01

    The discipline taxonomy (the science of naming and classifying organisms, the original bioinformatics and a basis for all biology) is fundamentally important in ensuring the quality of life of future human generation on the earth; yet over the past few decades, the teaching and research funding in taxonomy have declined because of its classical way of practice which lead the discipline many a times to a subject of opinion, and this ultimately gave birth to several problems and challenges, and therefore the taxonomist became an endangered race in the era of genomics. Now taxonomy suddenly became fashionable again due to revolutionary approaches in taxonomy called DNA barcoding (a novel technology to provide rapid, accurate, and automated species identifications using short orthologous DNA sequences). In DNA barcoding, complete data set can be obtained from a single specimen irrespective to morphological or life stage characters. The core idea of DNA barcoding is based on the fact that the highly conserved stretches of DNA, either coding or non coding regions, vary at very minor degree during the evolution within the species. Sequences suggested to be useful in DNA barcoding include cytoplasmic mitochondrial DNA (e.g. cox1) and chloroplast DNA (e.g. rbcL, trnL-F, matK, ndhF, and atpB rbcL), and nuclear DNA (ITS, and house keeping genes e.g. gapdh). The plant DNA barcoding is now transitioning the epitome of species identification; and thus, ultimately helping in the molecularization of taxonomy, a need of the hour. The ‘DNA barcodes’ show promise in providing a practical, standardized, species-level identification tool that can be used for biodiversity assessment, life history and ecological studies, forensic analysis, and many more. PMID:24955007

  14. Statistical properties of DNA sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.

  15. East Asian mtDNA haplogroup determination in Koreans: haplogroup-level coding region SNP analysis and subhaplogroup-level control region sequence analysis.

    PubMed

    Lee, Hwan Young; Yoo, Ji-Eun; Park, Myung Jin; Chung, Ukhee; Kim, Chong-Youl; Shin, Kyoung-Jin

    2006-11-01

    The present study analyzed 21 coding region SNP markers and one deletion motif for the determination of East Asian mitochondrial DNA (mtDNA) haplogroups by designing three multiplex systems which apply single base extension methods. Using two multiplex systems, all 593 Korean mtDNAs were allocated into 15 haplogroups: M, D, D4, D5, G, M7, M8, M9, M10, M11, R, R9, B, A, and N9. As the D4 haplotypes occurred most frequently in Koreans, the third multiplex system was used to further define D4 subhaplogroups: D4a, D4b, D4e, D4g, D4h, and D4j. This method allowed the complementation of coding region information with control region mutation motifs and the resultant findings also suggest reliable control region mutation motifs for the assignment of East Asian mtDNA haplogroups. These three multiplex systems produce good results in degraded samples as they contain small PCR products (101-154 bp) for single base extension reactions. SNP scoring was performed in 101 old skeletal remains using these three systems to prove their utility in degraded samples. The sequence analysis of mtDNA control region with high incidence of haplogroup-specific mutations and the selective scoring of highly informative coding region SNPs using the three multiplex systems are useful tools for most applications involving East Asian mtDNA haplogroup determination and haplogroup-directed stringent quality control.

  16. Quantitative Profiling of Peptides from RNAs classified as non-coding

    PubMed Central

    Prabakaran, Sudhakaran; Hemberg, Martin; Chauhan, Ruchi; Winter, Dominic; Tweedie-Cullen, Ry Y.; Dittrich, Christian; Hong, Elizabeth; Gunawardena, Jeremy; Steen, Hanno; Kreiman, Gabriel; Steen, Judith A.

    2014-01-01

    Only a small fraction of the mammalian genome codes for messenger RNAs destined to be translated into proteins, and it is generally assumed that a large portion of transcribed sequences - including introns and several classes of non-coding RNAs (ncRNAs) do not give rise to peptide products. A systematic examination of translation and physiological regulation of ncRNAs has not been conducted. Here, we use computational methods to identify the products of non-canonical translation in mouse neurons by analyzing unannotated transcripts in combination with proteomic data. This study supports the existence of non-canonical translation products from both intragenic and extragenic genomic regions, including peptides derived from anti-sense transcripts and introns. Moreover, the studied novel translation products exhibit temporal regulation similar to that of proteins known to be involved in neuronal activity processes. These observations highlight a potentially large and complex set of biologically regulated translational events from transcripts formerly thought to lack coding potential. PMID:25403355

  17. URF6, Last Unidentified Reading Frame of Human mtDNA, Codes for an NADH Dehydrogenase Subunit

    NASA Astrophysics Data System (ADS)

    Chomyn, Anne; Cleeter, Michael W. J.; Ragan, C. Ian; Riley, Marcia; Doolittle, Russell F.; Attardi, Giuseppe

    1986-10-01

    The polypeptide encoded in URF6, the last unassigned reading frame of human mitochondrial DNA, has been identified with antibodies to peptides predicted from the DNA sequence. Antibodies prepared against highly purified respiratory chain NADH dehydrogenase from beef heart or against the cytoplasmically synthesized 49-kilodalton iron-sulfur subunit isolated from this enzyme complex, when added to a deoxycholate or a Triton X-100 mitochondrial lysate of HeLa cells, specifically precipitated the URF6 product together with the six other URF products previously identified as subunits of NADH dehydrogenase. These results strongly point to the URF6 product as being another subunit of this enzyme complex. Thus, almost 60% of the protein coding capacity of mammalian mitochondrial DNA is utilized for the assembly of the first enzyme complex of the respiratory chain. The absence of such information in yeast mitochondrial DNA dramatizes the variability in gene content of different mitochondrial genomes.

  18. Basal jawed vertebrate phylogeny inferred from multiple nuclear DNA-coded genes

    PubMed Central

    Kikugawa, Kanae; Katoh, Kazutaka; Kuraku, Shigehiro; Sakurai, Hiroshi; Ishida, Osamu; Iwabe, Naoyuki; Miyata, Takashi

    2004-01-01

    Background Phylogenetic analyses of jawed vertebrates based on mitochondrial sequences often result in confusing inferences which are obviously inconsistent with generally accepted trees. In particular, in a hypothesis by Rasmussen and Arnason based on mitochondrial trees, cartilaginous fishes have a terminal position in a paraphyletic cluster of bony fishes. No previous analysis based on nuclear DNA-coded genes could significantly reject the mitochondrial trees of jawed vertebrates. Results We have cloned and sequenced seven nuclear DNA-coded genes from 13 vertebrate species. These sequences, together with sequences available from databases including 13 jawed vertebrates from eight major groups (cartilaginous fishes, bichir, chondrosteans, gar, bowfin, teleost fishes, lungfishes and tetrapods) and an outgroup (a cyclostome and a lancelet), have been subjected to phylogenetic analyses based on the maximum likelihood method. Conclusion Cartilaginous fishes have been inferred to be basal to other jawed vertebrates, which is consistent with the generally accepted view. The minimum log-likelihood difference between the maximum likelihood tree and trees not supporting the basal position of cartilaginous fishes is 18.3 ± 13.1. The hypothesis by Rasmussen and Arnason has been significantly rejected with the minimum log-likelihood difference of 123 ± 23.3. Our tree has also shown that living holosteans, comprising bowfin and gar, form a monophyletic group which is the sister group to teleost fishes. This is consistent with a formerly prevalent view of vertebrate classification, although inconsistent with both of the current morphology-based and mitochondrial sequence-based trees. Furthermore, the bichir has been shown to be the basal ray-finned fish. Tetrapods and lungfish have formed a monophyletic cluster in the tree inferred from the concatenated alignment, being consistent with the currently prevalent view. It also remains possible that tetrapods are more closely

  19. CDSbank: taxonomy-aware extraction, selection, renaming and formatting of protein-coding DNA or amino acid sequences.

    PubMed

    Hazes, Bart

    2014-02-28

    Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.

  20. A Framework for Identifying and Classifying Undergraduate Student Proof Errors

    ERIC Educational Resources Information Center

    Strickland, S.; Rand, B.

    2016-01-01

    This paper describes a framework for identifying, classifying, and coding student proofs, modified from existing proof-grading rubrics. The framework includes 20 common errors, as well as categories for interpreting the severity of the error. The coding scheme is intended for use in a classroom context, for providing effective student feedback. In…

  1. The Use and Effectiveness of Triple Multiplex System for Coding Region Single Nucleotide Polymorphism in Mitochondrial DNA Typing of Archaeologically Obtained Human Skeletons from Premodern Joseon Tombs of Korea

    PubMed Central

    Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon

    2015-01-01

    Previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the coupled use of analyses on coding region SNP markers and control region mutation motifs. In this study, we tried to see if the same triple multiplex analysis for coding regions SNPs could be also applicable to ancient samples from East Asia as the complementation for sequence analysis of mtDNA control region. By the study on Joseon skeleton samples, we know that mtDNA haplogroup determined by coding region SNP markers successfully falls within the same haplogroup that sequence analysis on control region can assign. Considering that ancient samples in previous studies make no small number of errors in control region mtDNA sequencing, coding region SNP analysis can be used as good complimentary to the conventional haplogroup determination, especially of archaeological human bone samples buried underground over long periods. PMID:26345190

  2. Benzofurazane as a new redox label for electrochemical detection of DNA: towards multipotential redox coding of DNA bases.

    PubMed

    Balintová, Jana; Plucnara, Medard; Vidláková, Pavlína; Pohl, Radek; Havran, Luděk; Fojta, Miroslav; Hocek, Michal

    2013-09-16

    Benzofurazane has been attached to nucleosides and dNTPs, either directly or through an acetylene linker, as a new redox label for electrochemical analysis of nucleotide sequences. Primer extension incorporation of the benzofurazane-modified dNTPs by polymerases has been developed for the construction of labeled oligonucleotide probes. In combination with nitrophenyl and aminophenyl labels, we have successfully developed a three-potential coding of DNA bases and have explored the relevant electrochemical potentials. The combination of benzofurazane and nitrophenyl reducible labels has proved to be excellent for ratiometric analysis of nucleotide sequences and is suitable for bioanalytical applications. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Isolation and characterization of a cDNA clone for the complete protein coding region of the delta subunit of the mouse acetylcholine receptor.

    PubMed Central

    LaPolla, R J; Mayne, K M; Davidson, N

    1984-01-01

    A mouse cDNA clone has been isolated that contains the complete coding region of a protein highly homologous to the delta subunit of the Torpedo acetylcholine receptor (AcChoR). The cDNA library was constructed in the vector lambda 10 from membrane-associated poly(A)+ RNA from BC3H-1 mouse cells. Surprisingly, the delta clone was selected by hybridization with cDNA encoding the gamma subunit of the Torpedo AcChoR. The nucleotide sequence of the mouse cDNA clone contains an open reading frame of 520 amino acids. This amino acid sequence exhibits 59% and 50% sequence homology to the Torpedo AcChoR delta and gamma subunits, respectively. However, the mouse nucleotide sequence has several stretches of high homology with the Torpedo gamma subunit cDNA, but not with delta. The mouse protein has the same general structural features as do the Torpedo subunits. It is encoded by a 3.3-kilobase mRNA. There is probably only one, but at most two, chromosomal genes coding for this or closely related sequences. Images PMID:6096870

  4. Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA

    PubMed Central

    Djebali, Sarah; Delaplace, Franck; Crollius, Hugues Roest

    2006-01-01

    Background Accurate and automatic gene identification in eukaryotic genomic DNA is more than ever of crucial importance to efficiently exploit the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise. This is illustrated in the EGASP project, where reference annotations against which all automatic methods are measured are generated by human annotators and experimentally verified. We hypothesized that replicating the accuracy of human annotators in an automatic method could be achieved by formalizing the rules and decisions that they use, in a mathematical formalism. Results We have developed Exogean, a flexible framework based on directed acyclic colored multigraphs (DACMs) that can represent biological objects (for example, mRNA, ESTs, protein alignments, exons) and relationships between them. Graphs are analyzed to process the information according to rules that replicate those used by human annotators. Simple individual starting objects given as input to Exogean are thus combined and synthesized into complex objects such as protein coding transcripts. Conclusion We show here, in the context of the EGASP project, that Exogean is currently the method that best reproduces protein coding gene annotations from human experts, in terms of identifying at least one exact coding sequence per gene. We discuss current limitations of the method and several avenues for improvement. PMID:16925841

  5. Reduced-median-network analysis of complete mitochondrial DNA coding-region sequences for the major African, Asian, and European haplogroups.

    PubMed

    Herrnstadt, Corinna; Elson, Joanna L; Fahy, Eoin; Preston, Gwen; Turnbull, Douglass M; Anderson, Christen; Ghosh, Soumitra S; Olefsky, Jerrold M; Beal, M Flint; Davis, Robert E; Howell, Neil

    2002-05-01

    The evolution of the human mitochondrial genome is characterized by the emergence of ethnically distinct lineages or haplogroups. Nine European, seven Asian (including Native American), and three African mitochondrial DNA (mtDNA) haplogroups have been identified previously on the basis of the presence or absence of a relatively small number of restriction-enzyme recognition sites or on the basis of nucleotide sequences of the D-loop region. We have used reduced-median-network approaches to analyze 560 complete European, Asian, and African mtDNA coding-region sequences from unrelated individuals to develop a more complete understanding of sequence diversity both within and between haplogroups. A total of 497 haplogroup-associated polymorphisms were identified, 323 (65%) of which were associated with one haplogroup and 174 (35%) of which were associated with two or more haplogroups. Approximately one-half of these polymorphisms are reported for the first time here. Our results confirm and substantially extend the phylogenetic relationships among mitochondrial genomes described elsewhere from the major human ethnic groups. Another important result is that there were numerous instances both of parallel mutations at the same site and of reversion (i.e., homoplasy). It is likely that homoplasy in the coding region will confound evolutionary analysis of small sequence sets. By a linkage-disequilibrium approach, additional evidence for the absence of human mtDNA recombination is presented here.

  6. Embedded feature ranking for ensemble MLP classifiers.

    PubMed

    Windeatt, Terry; Duangsoithong, Rakkrit; Smith, Raymond

    2011-06-01

    A feature ranking scheme for multilayer perceptron (MLP) ensembles is proposed, along with a stopping criterion based upon the out-of-bootstrap estimate. To solve multi-class problems feature ranking is combined with modified error-correcting output coding. Experimental results on benchmark data demonstrate the versatility of the MLP base classifier in removing irrelevant features.

  7. Lnc2Meth: a manually curated database of regulatory relationships between long non-coding RNAs and DNA methylation associated with human disease

    PubMed Central

    Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao

    2018-01-01

    Abstract Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNAs-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying the differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic values for each hit transcript calculated from the patients clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify the differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to complete the differentially methylated lncRNAs identification and visualization with local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. PMID:29069510

  8. CRITICA: coding region identification tool invoking comparative analysis

    NASA Technical Reports Server (NTRS)

    Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

    1999-01-01

    Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in/pub/critica) and on the World Wide Web (http:/(/)rdpwww.life.uiuc.edu).

  9. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-02-20

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  10. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-01-01

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  11. An algebraic hypothesis about the primeval genetic code architecture.

    PubMed

    Sánchez, Robersy; Grau, Ricardo

    2009-09-01

    A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairing G identical with C and A=U and the non-specific base pairing of the hypothetical ancestral base D used to define the sum and product operations are enough features to determine the coding constraints of the primeval and the modern genetic code, as well as, the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Besides, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the called period-3 property of the present coding DNA sequences could also exist in the ancient coding DNA sequences. The phylogenetic analyses achieved with metrics defined in the N-dimensional vector space (B(3))(N) of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.

  12. On the statistical assessment of classifiers using DNA microarray data

    PubMed Central

    Ancona, N; Maglietta, R; Piepoli, A; D'Addabbo, A; Cotugno, R; Savino, M; Liuni, S; Carella, M; Pesole, G; Perri, F

    2006-01-01

    Background In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia – Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependant on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate decreases as the training set

  13. Lnc2Meth: a manually curated database of regulatory relationships between long non-coding RNAs and DNA methylation associated with human disease.

    PubMed

    Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao; Ning, Shangwei; Jin, Lianhong; Li, Xia

    2018-01-04

    Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNAs-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying the differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic values for each hit transcript calculated from the patients clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify the differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to complete the differentially methylated lncRNAs identification and visualization with local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. In vivo induction of interferon gamma expression in grey horses with metastatic melanoma resulting from direct injection of plasmid DNA coding for equine interleukin 12.

    PubMed

    Müller, J-M V; Wissemann, J; Meli, M L; Dasen, G; Lutz, H; Heinzerling, L; Feige, K

    2011-11-01

    Whole blood pharmacokinetics of intratumourally injected naked plasmid DNA coding for equine Interleukin 12 (IL-12) was assessed as a means of in vivo gene transfer in the treatment of melanoma in grey horses. The expression of induced interferon gamma (IFN-g) was evaluated in order to determine the pharmacodynamic properties of in vivo gene transduction. Seven grey horses bearing melanoma were injected intratumourally with 250 µg naked plasmid DNA coding for IL-12. Peripheral blood and biopsies from the injection site were taken at 13 time points until day 14 post injection (p.i.). Samples were analysed using quantitative real-time PCR. Plasmid DNA was quantified in blood samples and mRNA expression for IFN-g in tissue samples. Plasmid DNA showed fast elimination kinetics with more than 99 % of the plasmid disappearing within 36 hours. IFN-g expression increased quickly after IL-12 plasmid injection, but varied between individual horses. Intratumoural injection of plasmid DNA is a feasible method for inducing transgene expression in vivo. Biological activity of the transgene IL-12 was confirmed by measuring expression of IFN-g.

  15. Long non-coding RNAs as novel expression signatures modulate DNA damage and repair in cadmium toxicology

    NASA Astrophysics Data System (ADS)

    Zhou, Zhiheng; Liu, Haibai; Wang, Caixia; Lu, Qian; Huang, Qinhai; Zheng, Chanjiao; Lei, Yixiong

    2015-10-01

    Increasing evidence suggests that long non-coding RNAs (lncRNAs) are involved in a variety of physiological and pathophysiological processes. Our study was to investigate whether lncRNAs as novel expression signatures are able to modulate DNA damage and repair in cadmium(Cd) toxicity. There were aberrant expression profiles of lncRNAs in 35th Cd-induced cells as compared to untreated 16HBE cells. siRNA-mediated knockdown of ENST00000414355 inhibited the growth of DNA-damaged cells and decreased the expressions of DNA-damage related genes (ATM, ATR and ATRIP), while increased the expressions of DNA-repair related genes (DDB1, DDB2, OGG1, ERCC1, MSH2, RAD50, XRCC1 and BARD1). Cadmium increased ENST00000414355 expression in the lung of Cd-exposed rats in a dose-dependent manner. A significant positive correlation was observed between blood ENST00000414355 expression and urinary/blood Cd concentrations, and there were significant correlations of lncRNA-ENST00000414355 expression with the expressions of target genes in the lung of Cd-exposed rats and the blood of Cd exposed workers. These results indicate that some lncRNAs are aberrantly expressed in Cd-treated 16HBE cells. lncRNA-ENST00000414355 may serve as a signature for DNA damage and repair related to the epigenetic mechanisms underlying the cadmium toxicity and become a novel biomarker of cadmium toxicity.

  16. Phylogenetic Network for European mtDNA

    PubMed Central

    Finnilä, Saara; Lehtonen, Mervi S.; Majamaa, Kari

    2001-01-01

    The sequence in the first hypervariable segment (HVS-I) of the control region has been used as a source of evolutionary information in most phylogenetic analyses of mtDNA. Population genetic inference would benefit from a better understanding of the variation in the mtDNA coding region, but, thus far, complete mtDNA sequences have been rare. We determined the nucleotide sequence in the coding region of mtDNA from 121 Finns, by conformation-sensitive gel electrophoresis and subsequent sequencing and by direct sequencing of the D loop. Furthermore, 71 sequences from our previous reports were included, so that the samples represented all the mtDNA haplogroups present in the Finnish population. We found a total of 297 variable sites in the coding region, which allowed the compilation of unambiguous phylogenetic networks. The D loop harbored 104 variable sites, and, in most cases, these could be localized within the coding-region networks, without discrepancies. Interestingly, many homoplasies were detected in the coding region. Nucleotide variation in the rRNA and tRNA genes was 6%, and that in the third nucleotide positions of structural genes amounted to 22% of that in the HVS-I. The complete networks enabled the relationships between the mtDNA haplogroups to be analyzed. Phylogenetic networks based on the entire coding-region sequence in mtDNA provide a rich source for further population genetic studies, and complete sequences make it easier to differentiate between disease-causing mutations and rare polymorphisms. PMID:11349229

  17. Complex interplay among DNA modification, noncoding RNA expression and protein-coding RNA expression in Salvia miltiorrhiza chloroplast genome.

    PubMed

    Chen, Haimei; Zhang, Jianhui; Yuan, George; Liu, Chang

    2014-01-01

    Salvia miltiorrhiza is one of the most widely used medicinal plants. As a first step to develop a chloroplast-based genetic engineering method for the over-production of active components from S. miltiorrhiza, we have analyzed the genome, transcriptome, and base modifications of the S. miltiorrhiza chloroplast. Total genomic DNA and RNA were extracted from fresh leaves and then subjected to strand-specific RNA-Seq and Single-Molecule Real-Time (SMRT) sequencing analyses. Mapping the RNA-Seq reads to the genome assembly allowed us to determine the relative expression levels of 80 protein-coding genes. In addition, we identified 19 polycistronic transcription units and 136 putative antisense and intergenic noncoding RNA (ncRNA) genes. Comparison of the abundance of protein-coding transcripts (cRNA) with and without overlapping antisense ncRNAs (asRNA) suggest that the presence of asRNA is associated with increased cRNA abundance (p<0.05). Using the SMRT Portal software (v1.3.2), 2687 potential DNA modification sites and two potential DNA modification motifs were predicted. The two motifs include a TATA box-like motif (CPGDMM1, "TATANNNATNA"), and an unknown motif (CPGDMM2 "WNYANTGAW"). Specifically, 35 of the 97 CPGDMM1 motifs (36.1%) and 91 of the 369 CPGDMM2 motifs (24.7%) were found to be significantly modified (p<0.01). Analysis of genes downstream of the CPGDMM1 motif revealed the significantly increased abundance of ncRNA genes that are less than 400 bp away from the significantly modified CPGDMM1motif (p<0.01). Taking together, the present study revealed a complex interplay among DNA modifications, ncRNA and cRNA expression in chloroplast genome.

  18. An open reading frame in intron seven of the sea urchin DNA-methyltransferase gene codes for a functional AP1 endonuclease.

    PubMed

    Cioffi, Anna Valentina; Ferrara, Diana; Cubellis, Maria Vittoria; Aniello, Francesco; Corrado, Marcella; Liguori, Francesca; Amoroso, Alessandro; Fucci, Laura; Branno, Margherita

    2002-08-01

    Analysis of the genome structure of the Paracentrotus lividus (sea urchin) DNA methyltransferase (DNA MTase) gene showed the presence of an open reading frame, named METEX, in intron 7 of the gene. METEX expression is developmentally regulated, showing no correlation with DNA MTase expression. In fact, DNA MTase transcripts are present at high concentrations in the early developmental stages, while METEX is expressed at late stages of development. Two METEX cDNA clones (Met1 and Met2) that are different in the 3' end have been isolated in a cDNA library screening. The putative translated protein from Met2 cDNA clone showed similarity with Escherichia coli endonuclease III on the basis of sequence and predictive three-dimensional structure. The protein, overexpressed in E. coli and purified, had functional properties similar to the endonuclease specific for apurinic/apyrimidinic (AP) sites on the basis of the lyase activity. Therefore the open reading frame, present in intron 7 of the P. lividus DNA MTase gene, codes for a functional AP endonuclease designated SuAP1.

  19. An overview of the structures of protein-DNA complexes

    PubMed Central

    Luscombe, Nicholas M; Austin, Susan E; Berman , Helen M; Thornton, Janet M

    2000-01-01

    On the basis of a structural analysis of 240 protein-DNA complexes contained in the Protein Data Bank (PDB), we have classified the DNA-binding proteins involved into eight different structural/functional groups, which are further classified into 54 structural families. Here we present this classification and review the functions, structures and binding interactions of these protein-DNA complexes. PMID:11104519

  20. Lichenase and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-15

    The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-.beta.-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.

  1. 3-base periodicity in coding DNA is affected by intercodon dinucleotides

    PubMed Central

    Sánchez, Joaquín

    2011-01-01

    All coding DNAs exhibit 3-base periodicity (TBP), which may be defined as the tendency of nucleotides and higher order n-tuples, e.g. trinucleotides (triplets), to be preferentially spaced by 3, 6, 9 etc, bases, and we have proposed an association between TBP and clustering of same-phase triplets. We here investigated if TBP was affected by intercodon dinucleotide tendencies and whether clustering of same-phase triplets was involved. Under constant protein sequence intercodon dinucleotide frequencies depend on the distribution of synonymous codons. So, possible effects were revealed by randomly exchanging synonymous codons without altering protein sequences to subsequently document changes in TBP via frequency distribution of distances (FDD) of DNA triplets. A tripartite positive correlation was found between intercodon dinucleotide frequencies, clustering of same-phase triplets and TBP. So, intercodon C|A (where “|” indicates the boundary between codons) was more frequent in native human DNA than in the codon-shuffled sequences; higher C|A frequency occurred along with more frequent clustering of C|AN triplets (where N jointly represents A, C, G and T) and with intense CAN TBP. The opposite was found for C|G, which was less frequent in native than in shuffled sequences; lower C|G frequency occurred together with reduced clustering of C|GN triplets and with less intense CGN TBP. We hence propose that intercodon dinucleotides affect TBP via same-phase triplet clustering. A possible biological relevance of our findings is briefly discussed. PMID:21814388

  2. Complex Interplay among DNA Modification, Noncoding RNA Expression and Protein-Coding RNA Expression in Salvia miltiorrhiza Chloroplast Genome

    PubMed Central

    Chen, Haimei; Zhang, Jianhui; Yuan, George; Liu, Chang

    2014-01-01

    Salvia miltiorrhiza is one of the most widely used medicinal plants. As a first step to develop a chloroplast-based genetic engineering method for the over-production of active components from S. miltiorrhiza, we have analyzed the genome, transcriptome, and base modifications of the S. miltiorrhiza chloroplast. Total genomic DNA and RNA were extracted from fresh leaves and then subjected to strand-specific RNA-Seq and Single-Molecule Real-Time (SMRT) sequencing analyses. Mapping the RNA-Seq reads to the genome assembly allowed us to determine the relative expression levels of 80 protein-coding genes. In addition, we identified 19 polycistronic transcription units and 136 putative antisense and intergenic noncoding RNA (ncRNA) genes. Comparison of the abundance of protein-coding transcripts (cRNA) with and without overlapping antisense ncRNAs (asRNA) suggest that the presence of asRNA is associated with increased cRNA abundance (p<0.05). Using the SMRT Portal software (v1.3.2), 2687 potential DNA modification sites and two potential DNA modification motifs were predicted. The two motifs include a TATA box–like motif (CPGDMM1, “TATANNNATNA”), and an unknown motif (CPGDMM2 “WNYANTGAW”). Specifically, 35 of the 97 CPGDMM1 motifs (36.1%) and 91 of the 369 CPGDMM2 motifs (24.7%) were found to be significantly modified (p<0.01). Analysis of genes downstream of the CPGDMM1 motif revealed the significantly increased abundance of ncRNA genes that are less than 400 bp away from the significantly modified CPGDMM1motif (p<0.01). Taking together, the present study revealed a complex interplay among DNA modifications, ncRNA and cRNA expression in chloroplast genome. PMID:24914614

  3. Gene and genon concept: coding versus regulation

    PubMed Central

    2007-01-01

    We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. Rather, it is assembled by RNA processing, including differential splicing, from various

  4. Formation and oral administration of alginate microspheres loaded with pDNA coding for lymphocystis disease virus (LCDV) to Japanese flounder.

    PubMed

    Tian, Ji-Yuan; Sun, Xiu-Qin; Chen, Xi-Guang

    2008-05-01

    Oral delivery of plasmid DNA (pDNA) is a desirable approach for fish immunization in intensive culture. However, its effectiveness is limited because of possible degradation of pDNA in the fish's digestive system. In this report, alginate microspheres loaded with pDNA coding for fish lymphocystis disease virus (LCDV) and green fluorescent protein were prepared with a modified oil containing water (W/O) emulsification method. Yield, loading percent and encapsulation efficiency of alginate microspheres were 90.5%, 1.8% and 92.7%, respectively. The alginate microspheres had diameters of less than 10 microm, and their shape was spherical. As compared to sodium alginate, a remarkable increase of DNA-phosphodiester and DNA-phosphomonoester bonds was observed for alginate microspheres loaded with pDNA by Fourier transform infrared (FTIR) spectroscopic analysis. Agarose gel electrophoresis showed a little supercoiled pDNA was transformed to open circular and linear pDNA during encapsulation. The cumulative release of pDNA in alginate microspheres was DNA expressed RNA and green fluorescent protein in tissues of fish 10-90 days after oral administration. An indirect enzyme-linked immunosorbent assay (ELISA) showed that sera were positive (OD >or=0.3) for anti-LCDV antibody from week 3 to week 16 for fish orally vaccinated with alginate microspheres loaded with pDNA, in comparison with fish orally vaccinated with naked pDNA. Our results display that alginate microspheres obtained by W/O emulsification are promising carriers for oral delivery of pDNA. This encapsulation technique has the potential for DNA vaccine delivery applications due to its ease of operation, low cost and significant immune effect.

  5. Coarse-grained simulation of DNA using LAMMPS : An implementation of the oxDNA model and its applications.

    PubMed

    Henrich, Oliver; Gutiérrez Fosado, Yair Augusto; Curk, Tine; Ouldridge, Thomas E

    2018-05-10

    During the last decade coarse-grained nucleotide models have emerged that allow us to study DNA and RNA on unprecedented time and length scales. Among them is oxDNA, a coarse-grained, sequence-specific model that captures the hybridisation transition of DNA and many structural properties of single- and double-stranded DNA. oxDNA was previously only available as standalone software, but has now been implemented into the popular LAMMPS molecular dynamics code. This article describes the new implementation and analyses its parallel performance. Practical applications are presented that focus on single-stranded DNA, an area of research which has been so far under-investigated. The LAMMPS implementation of oxDNA lowers the entry barrier for using the oxDNA model significantly, facilitates future code development and interfacing with existing LAMMPS functionality as well as other coarse-grained and atomistic DNA models.

  6. cncRNAs: Bi-functional RNAs with protein coding and non-coding functions

    PubMed Central

    Kumari, Pooja; Sampath, Karuna

    2015-01-01

    For many decades, the major function of mRNA was thought to be to provide protein-coding information embedded in the genome. The advent of high-throughput sequencing has led to the discovery of pervasive transcription of eukaryotic genomes and opened the world of RNA-mediated gene regulation. Many regulatory RNAs have been found to be incapable of protein coding and are hence termed as non-coding RNAs (ncRNAs). However, studies in recent years have shown that several previously annotated non-coding RNAs have the potential to encode proteins, and conversely, some coding RNAs have regulatory functions independent of the protein they encode. Such bi-functional RNAs, with both protein coding and non-coding functions, which we term as ‘cncRNAs’, have emerged as new players in cellular systems. Here, we describe the functions of some cncRNAs identified from bacteria to humans. Because the functions of many RNAs across genomes remains unclear, we propose that RNAs be classified as coding, non-coding or both only after careful analysis of their functions. PMID:26498036

  7. Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA

    PubMed Central

    Ma, Xiaoqi

    2015-01-01

    A novel method is proposed to establish the pancreatic cancer classifier. Firstly, the concept of quantum and fruit fly optimal algorithm (FOA) are introduced, respectively. Then FOA is improved by quantum coding and quantum operation, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of support vector machine (SVM) and the classifier is established by optimized SVM. In order to verify the effectiveness of the proposed method, SVM and other classification methods have been chosen as the comparing methods. The experimental results show that the proposed method can improve the classifier performance and cost less time. PMID:26543867

  8. Intricate and Cell Type-Specific Populations of Endogenous Circular DNA (eccDNA) in Caenorhabditis elegans and Homo sapiens.

    PubMed

    Shoura, Massa J; Gabdank, Idan; Hansen, Loren; Merker, Jason; Gotlib, Jason; Levene, Stephen D; Fire, Andrew Z

    2017-10-05

    Investigations aimed at defining the 3D configuration of eukaryotic chromosomes have consistently encountered an endogenous population of chromosome-derived circular genomic DNA, referred to as extrachromosomal circular DNA (eccDNA). While the production, distribution, and activities of eccDNAs remain understudied, eccDNA formation from specific regions of the linear genome has profound consequences on the regulatory and coding capabilities for these regions. Here, we define eccDNA distributions in Caenorhabditis elegans and in three human cell types, utilizing a set of DNA topology-dependent approaches for enrichment and characterization. The use of parallel biophysical, enzymatic, and informatic approaches provides a comprehensive profiling of eccDNA robust to isolation and analysis methodology. Results in human and nematode systems provide quantitative analysis of the eccDNA loci at both unique and repetitive regions. Our studies converge on and support a consistent picture, in which endogenous genomic DNA circles are present in normal physiological states, and in which the circles come from both coding and noncoding genomic regions. Prominent among the coding regions generating DNA circles are several genes known to produce a diversity of protein isoforms, with mucin proteins and titin as specific examples. Copyright © 2017 Shoura et al.

  9. Isolation of a DNA Probe for Lactobacillus curvatus

    PubMed Central

    Petrick, Hendrik A. R.; Ambrosio, Riccardo E.; Holzapfel, Wilhelm H.

    1988-01-01

    A genomic library of Lactobacillus curvatus DSM 20019 was constructed in bacteriophage λ gt11. A 1.2-kilobase DNA probe specific for L. curvatus was isolated from this library. When this probe was hybridized to DNA from Lactobacillus isolates from different sources classified by conventional techniques, differing degrees of hybridization were obtained. This could imply that these isolates may have been incorrectly classified. Images PMID:16347554

  10. HEXIM1 and NEAT1 Long Non-coding RNA Form a Multi-subunit Complex that Regulates DNA-Mediated Innate Immune Response.

    PubMed

    Morchikh, Mehdi; Cribier, Alexandra; Raffel, Raoul; Amraoui, Sonia; Cau, Julien; Severac, Dany; Dubois, Emeric; Schwartz, Olivier; Bennasser, Yamina; Benkirane, Monsef

    2017-08-03

    The DNA-mediated innate immune response underpins anti-microbial defenses and certain autoimmune diseases. Here we used immunoprecipitation, mass spectrometry, and RNA sequencing to identify a ribonuclear complex built around HEXIM1 and the long non-coding RNA NEAT1 that we dubbed the HEXIM1-DNA-PK-paraspeckle components-ribonucleoprotein complex (HDP-RNP). The HDP-RNP contains DNA-PK subunits (DNAPKc, Ku70, and Ku80) and paraspeckle proteins (SFPQ, NONO, PSPC1, RBM14, and MATRIN3). We show that binding of HEXIM1 to NEAT1 is required for its assembly. We further demonstrate that the HDP-RNP is required for the innate immune response to foreign DNA, through the cGAS-STING-IRF3 pathway. The HDP-RNP interacts with cGAS and its partner PQBP1, and their interaction is remodeled by foreign DNA. Remodeling leads to the release of paraspeckle proteins, recruitment of STING, and activation of DNAPKc and IRF3. Our study establishes the HDP-RNP as a key nuclear regulator of DNA-mediated activation of innate immune response through the cGAS-STING pathway. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Evaluation of the efficacy of twelve mitochondrial protein-coding genes as barcodes for mollusk DNA barcoding.

    PubMed

    Yu, Hong; Kong, Lingfeng; Li, Qi

    2016-01-01

    In this study, we evaluated the efficacy of 12 mitochondrial protein-coding genes from 238 mitochondrial genomes of 140 molluscan species as potential DNA barcodes for mollusks. Three barcoding methods (distance, monophyly and character-based methods) were used in species identification. The species recovery rates based on genetic distances for the 12 genes ranged from 70.83 to 83.33%. There were no significant differences in intra- or interspecific variability among the 12 genes. The monophyly and character-based methods provided higher resolution than the distance-based method in species delimitation. Especially in closely related taxa, the character-based method showed some advantages. The results suggested that besides the standard COI barcode, other 11 mitochondrial protein-coding genes could also be potentially used as a molecular diagnostic for molluscan species discrimination. Our results also showed that the combination of mitochondrial genes did not enhance the efficacy for species identification and a single mitochondrial gene would be fully competent.

  12. Statistical approaches to account for false-positive errors in environmental DNA samples.

    PubMed

    Lahoz-Monfort, José J; Guillera-Arroita, Gurutzeta; Tingley, Reid

    2016-05-01

    Environmental DNA (eDNA) sampling is prone to both false-positive and false-negative errors. We review statistical methods to account for such errors in the analysis of eDNA data and use simulations to compare the performance of different modelling approaches. Our simulations illustrate that even low false-positive rates can produce biased estimates of occupancy and detectability. We further show that removing or classifying single PCR detections in an ad hoc manner under the suspicion that such records represent false positives, as sometimes advocated in the eDNA literature, also results in biased estimation of occupancy, detectability and false-positive rates. We advocate alternative approaches to account for false-positive errors that rely on prior information, or the collection of ancillary detection data at a subset of sites using a sampling method that is not prone to false-positive errors. We illustrate the advantages of these approaches over ad hoc classifications of detections and provide practical advice and code for fitting these models in maximum likelihood and Bayesian frameworks. Given the severe bias induced by false-negative and false-positive errors, the methods presented here should be more routinely adopted in eDNA studies. © 2015 John Wiley & Sons Ltd.

  13. The histone codes for meiosis.

    PubMed

    Wang, Lina; Xu, Zhiliang; Khawar, Muhammad Babar; Liu, Chao; Li, Wei

    2017-09-01

    Meiosis is a specialized process that produces haploid gametes from diploid cells by a single round of DNA replication followed by two successive cell divisions. It contains many special events, such as programmed DNA double-strand break (DSB) formation, homologous recombination, crossover formation and resolution. These events are associated with dynamically regulated chromosomal structures, the dynamic transcriptional regulation and chromatin remodeling are mainly modulated by histone modifications, termed 'histone codes'. The purpose of this review is to summarize the histone codes that are required for meiosis during spermatogenesis and oogenesis, involving meiosis resumption, meiotic asymmetric division and other cellular processes. We not only systematically review the functional roles of histone codes in meiosis but also discuss future trends and perspectives in this field. © 2017 Society for Reproduction and Fertility.

  14. Is a Genome a Codeword of an Error-Correcting Code?

    PubMed Central

    Kleinschmidt, João H.; Silva-Filho, Márcio C.; Bim, Edson; Herai, Roberto H.; Yamagishi, Michel E. B.; Palazzo, Reginaldo

    2012-01-01

    Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction. PMID:22649495

  15. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA

    PubMed Central

    Wright, Imogen A.; Travers, Simon A.

    2014-01-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618

  16. Study characterizes long non-coding RNA’s response to DNA damage in colon cancer cells | Center for Cancer Research

    Cancer.gov

    Researchers led by Ashish Lal, Ph.D., Investigator in the Genetics Branch, have shown that when the DNA in human colon cancer cells is damaged, a long non-coding RNA (lncRNA) regulates the expression of genes that halt growth, which allows the cells to repair the damage and promote survival. Their findings suggest an important pro-survival function of a lncRNA in cancer

  17. Comparison of Geant4-DNA simulation of S-values with other Monte Carlo codes

    NASA Astrophysics Data System (ADS)

    André, T.; Morini, F.; Karamitros, M.; Delorme, R.; Le Loirec, C.; Campos, L.; Champion, C.; Groetz, J.-E.; Fromm, M.; Bordage, M.-C.; Perrot, Y.; Barberet, Ph.; Bernal, M. A.; Brown, J. M. C.; Deleuze, M. S.; Francis, Z.; Ivanchenko, V.; Mascialino, B.; Zacharatou, C.; Bardiès, M.; Incerti, S.

    2014-01-01

    Monte Carlo simulations of S-values have been carried out with the Geant4-DNA extension of the Geant4 toolkit. The S-values have been simulated for monoenergetic electrons with energies ranging from 0.1 keV up to 20 keV, in liquid water spheres (for four radii, chosen between 10 nm and 1 μm), and for electrons emitted by five isotopes of iodine (131, 132, 133, 134 and 135), in liquid water spheres of varying radius (from 15 μm up to 250 μm). The results have been compared to those obtained from other Monte Carlo codes and from other published data. The use of the Kolmogorov-Smirnov test has allowed confirming the statistical compatibility of all simulation results.

  18. A system for classifying wood-using industries and recording statistics for automatic data processing.

    Treesearch

    E.W. Fobes; R.W. Rowe

    1968-01-01

    A system for classifying wood-using industries and recording pertinent statistics for automatic data processing is described. Forms and coding instructions for recording data of primary processing plants are included.

  19. Local Laplacian Coding From Theoretical Analysis of Local Coding Schemes for Locally Linear Classification.

    PubMed

    Pang, Junbiao; Qin, Lei; Zhang, Chunjie; Zhang, Weigang; Huang, Qingming; Yin, Baocai

    2015-12-01

    Local coordinate coding (LCC) is a framework to approximate a Lipschitz smooth function by combining linear functions into a nonlinear one. For locally linear classification, LCC requires a coding scheme that heavily determines the nonlinear approximation ability, posing two main challenges: 1) the locality making faraway anchors have smaller influences on current data and 2) the flexibility balancing well between the reconstruction of current data and the locality. In this paper, we address the problem from the theoretical analysis of the simplest local coding schemes, i.e., local Gaussian coding and local student coding, and propose local Laplacian coding (LPC) to achieve the locality and the flexibility. We apply LPC into locally linear classifiers to solve diverse classification tasks. The comparable or exceeded performances of state-of-the-art methods demonstrate the effectiveness of the proposed method.

  20. Genes and Pathways Involved in Adult Onset Disorders Featuring Muscle Mitochondrial DNA Instability

    PubMed Central

    Ahmed, Naghia; Ronchi, Dario; Comi, Giacomo Pietro

    2015-01-01

    Replication and maintenance of mtDNA entirely relies on a set of proteins encoded by the nuclear genome, which include members of the core replicative machinery, proteins involved in the homeostasis of mitochondrial dNTPs pools or deputed to the control of mitochondrial dynamics and morphology. Mutations in their coding genes have been observed in familial and sporadic forms of pediatric and adult-onset clinical phenotypes featuring mtDNA instability. The list of defects involved in these disorders has recently expanded, including mutations in the exo-/endo-nuclease flap-processing proteins MGME1 and DNA2, supporting the notion that an enzymatic DNA repair system actively takes place in mitochondria. The results obtained in the last few years acknowledge the contribution of next-generation sequencing methods in the identification of new disease loci in small groups of patients and even single probands. Although heterogeneous, these genes can be conveniently classified according to the pathway to which they belong. The definition of the molecular and biochemical features of these pathways might be helpful for fundamental knowledge of these disorders, to accelerate genetic diagnosis of patients and the development of rational therapies. In this review, we discuss the molecular findings disclosed in adult patients with muscle pathology hallmarked by mtDNA instability. PMID:26251896

  1. Study characterizes long non-coding RNA’s response to DNA damage in colon cancer cells | Center for Cancer Research

    Cancer.gov

    Researchers led by Ashish Lal, Ph.D., Investigator in the Genetics Branch, have shown that when the DNA in human colon cancer cells is damaged, a long non-coding RNA (lncRNA) regulates the expression of genes that halt growth, which allows the cells to repair the damage and promote survival. Their findings suggest an important pro-survival function of a lncRNA in cancer cells.  Read more...

  2. Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy.

    PubMed

    Gibbons, Chris; Richards, Suzanne; Valderas, Jose Maria; Campbell, John

    2017-03-15

    Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor's activity for the purposes of quality assurance, safety, and continuing professional development. The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom. We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes and was lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as "respected," "professional," and "interpersonal" related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Scores did not vary between doctors who were rated as popular or innovative and those who were not rated at all (P>.05). Machine learning

  3. Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy

    PubMed Central

    2017-01-01

    Background Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor’s activity for the purposes of quality assurance, safety, and continuing professional development. Objective The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors’ professional performance in the United Kingdom. Methods We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians’ colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Results Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to “popular” (recall=.97), “innovator” (recall=.98), and “respected” (recall=.87) codes and was lower for the “interpersonal” (recall=.80) and “professional” (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as “respected,” “professional,” and “interpersonal” related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Scores did not vary between doctors who were rated as popular or

  4. Converting Panax ginseng DNA and chemical fingerprints into two-dimensional barcode.

    PubMed

    Cai, Yong; Li, Peng; Li, Xi-Wen; Zhao, Jing; Chen, Hai; Yang, Qing; Hu, Hao

    2017-07-01

    In this study, we investigated how to convert the Panax ginseng DNA sequence code and chemical fingerprints into a two-dimensional code. In order to improve the compression efficiency, GATC2Bytes and digital merger compression algorithms are proposed. HPLC chemical fingerprint data of 10 groups of P. ginseng from Northeast China and the internal transcribed spacer 2 (ITS2) sequence code as the DNA sequence code were ready for conversion. In order to convert such data into a two-dimensional code, the following six steps were performed: First, the chemical fingerprint characteristic data sets were obtained through the inflection filtering algorithm. Second, precompression processing of such data sets is undertaken. Third, precompression processing was undertaken with the P. ginseng DNA (ITS2) sequence codes. Fourth, the precompressed chemical fingerprint data and the DNA (ITS2) sequence code were combined in accordance with the set data format. Such combined data can be compressed by Zlib, an open source data compression algorithm. Finally, the compressed data generated a two-dimensional code called a quick response code (QR code). Through the abovementioned converting process, it can be found that the number of bytes needed for storing P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can be greatly reduced. After GTCA2Bytes algorithm processing, the ITS2 compression rate reaches 75% and the chemical fingerprint compression rate exceeds 99.65% via filtration and digital merger compression algorithm processing. Therefore, the overall compression ratio even exceeds 99.36%. The capacity of the formed QR code is around 0.5k, which can easily and successfully be read and identified by any smartphone. P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can form a QR code after data processing, and therefore the QR code can be a perfect carrier of the authenticity and quality of P. ginseng information. This study provides a theoretical

  5. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA.

    PubMed

    Wright, Imogen A; Travers, Simon A

    2014-07-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Brain cDNA clone for human cholinesterase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McTiernan, C.; Adkins, S.; Chatonnet, A.

    1987-10-01

    A cDNA library from human basal ganglia was screened with oligonucleotide probes corresponding to portions of the amino acid sequence of human serum cholinesterase. Five overlapping clones, representing 2.4 kilobases, were isolated. The sequenced cDNA contained 207 base pairs of coding sequence 5' to the amino terminus of the mature protein in which there were four ATG translation start sites in the same reading frame as the protein. Only the ATG coding for Met-(-28) lay within a favorable consensus sequence for functional initiators. There were 1722 base pairs of coding sequence corresponding to the protein found circulating in human serum.more » The amino acid sequence deduced from the cDNA exactly matched the 574 amino acid sequence of human serum cholinesterase, as previously determined by Edman degradation. Therefore, our clones represented cholinesterase rather than acetylcholinesterase. It was concluded that the amino acid sequences of cholinesterase from two different tissues, human brain and human serum, were identical. Hybridization of genomic DNA blots suggested that a single gene, or very few genes coded for cholinesterase.« less

  7. Decoding DNA labels by melting curve analysis using real-time PCR.

    PubMed

    Balog, József A; Fehér, Liliána Z; Puskás, László G

    2017-12-01

    Synthetic DNA has been used as an authentication code for a diverse number of applications. However, existing decoding approaches are based on either DNA sequencing or the determination of DNA length variations. Here, we present a simple alternative protocol for labeling different objects using a small number of short DNA sequences that differ in their melting points. Code amplification and decoding can be done in two steps using quantitative PCR (qPCR). To obtain a DNA barcode with high complexity, we defined 8 template groups, each having 4 different DNA templates, yielding 158 (>2.5 billion) combinations of different individual melting temperature (Tm) values and corresponding ID codes. The reproducibility and specificity of the decoding was confirmed by using the most complex template mixture, which had 32 different products in 8 groups with different Tm values. The industrial applicability of our protocol was also demonstrated by labeling a drone with an oil-based paint containing a predefined DNA code, which was then successfully decoded. The method presented here consists of a simple code system based on a small number of synthetic DNA sequences and a cost-effective, rapid decoding protocol using a few qPCR reactions, enabling a wide range of authentication applications.

  8. Chromosome specific repetitive DNA sequences

    DOEpatents

    Moyzis, Robert K.; Meyne, Julianne

    1991-01-01

    A method is provided for determining specific nucleotide sequences useful in forming a probe which can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family me This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).

  9. Making the Bend: DNA Tertiary Structure and Protein-DNA Interactions

    PubMed Central

    Harteis, Sabrina; Schneider, Sabine

    2014-01-01

    DNA structure functions as an overlapping code to the DNA sequence. Rapid progress in understanding the role of DNA structure in gene regulation, DNA damage recognition and genome stability has been made. The three dimensional structure of both proteins and DNA plays a crucial role for their specific interaction, and proteins can recognise the chemical signature of DNA sequence (“base readout”) as well as the intrinsic DNA structure (“shape recognition”). These recognition mechanisms do not exist in isolation but, depending on the individual interaction partners, are combined to various extents. Driving force for the interaction between protein and DNA remain the unique thermodynamics of each individual DNA-protein pair. In this review we focus on the structures and conformations adopted by DNA, both influenced by and influencing the specific interaction with the corresponding protein binding partner, as well as their underlying thermodynamics. PMID:25026169

  10. Genetic Code Analysis Toolkit: A novel tool to explore the coding properties of the genetic code and DNA sequences

    NASA Astrophysics Data System (ADS)

    Kraljić, K.; Strüngmann, L.; Fimmel, E.; Gumbel, M.

    2018-01-01

    The genetic code is degenerated and it is assumed that redundancy provides error detection and correction mechanisms in the translation process. However, the biological meaning of the code's structure is still under current research. This paper presents a Genetic Code Analysis Toolkit (GCAT) which provides workflows and algorithms for the analysis of the structure of nucleotide sequences. In particular, sets or sequences of codons can be transformed and tested for circularity, comma-freeness, dichotomic partitions and others. GCAT comes with a fertile editor custom-built to work with the genetic code and a batch mode for multi-sequence processing. With the ability to read FASTA files or load sequences from GenBank, the tool can be used for the mathematical and statistical analysis of existing sequence data. GCAT is Java-based and provides a plug-in concept for extensibility. Availability: Open source Homepage:http://www.gcat.bio/

  11. Scaling features of noncoding DNA

    NASA Technical Reports Server (NTRS)

    Stanley, H. E.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.

    1999-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene, and utilize this fact to build a Coding Sequence Finder Algorithm, which uses statistical ideas to locate the coding regions of an unknown DNA sequence. Finally, we describe briefly some recent work adapting to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function, and reporting that noncoding regions in eukaryotes display a larger redundancy than coding regions. Specifically, we consider the possibility that this result is solely a consequence of nucleotide concentration differences as first noted by Bonhoeffer and his collaborators. We find that cytosine-guanine (CG) concentration does have a strong "background" effect on redundancy. However, we find that for the purine-pyrimidine binary mapping rule, which is not affected by the difference in CG concentration, the Shannon redundancy for the set of analyzed sequences is larger for noncoding regions compared to coding regions.

  12. Computational fishing of new DNA methyltransferase inhibitors from natural products.

    PubMed

    Maldonado-Rojas, Wilson; Olivero-Verbel, Jesus; Marrero-Ponce, Yovani

    2015-07-01

    DNA methyltransferase inhibitors (DNMTis) have become an alternative for cancer therapies. However, only two DNMTis have been approved as anticancer drugs, although with some restrictions. Natural products (NPs) are a promising source of drugs. In order to find NPs with novel chemotypes as DNMTis, 47 compounds with known activity against these enzymes were used to build a LDA-based QSAR model for active/inactive molecules (93% accuracy) based on molecular descriptors. This classifier was employed to identify potential DNMTis on 800 NPs from NatProd Collection. 447 selected compounds were docked on two human DNA methyltransferase (DNMT) structures (PDB codes: 3SWR and 2QRV) using AutoDock Vina and Surflex-Dock, prioritizing according to their score values, contact patterns at 4 Å and molecular diversity. Six consensus NPs were identified as virtual hits against DNMTs, including 9,10-dihydro-12-hydroxygambogic, phloridzin, 2',4'-dihydroxychalcone 4'-glucoside, daunorubicin, pyrromycin and centaurein. This method is an innovative computational strategy for identifying DNMTis, useful in the identification of potent and selective anticancer drugs. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. MScanner: a classifier for retrieving Medline citations

    PubMed Central

    Poulter, Graham L; Rubin, Daniel L; Altman, Russ B; Seoighe, Cathal

    2008-01-01

    retrieving topics for which many features may indicate relevance. Its web interface simplifies the task of classifying Medline citations, compared to building a pre-filter and classifier specific to the topic. The data sets and open source code used to obtain the results in this paper are available on-line and as supplementary material, and the web interface may be accessed at . PMID:18284683

  14. DNA as information.

    PubMed

    Wills, Peter R

    2016-03-13

    This article reviews contributions to this theme issue covering the topic 'DNA as information' in relation to the structure of DNA, the measure of its information content, the role and meaning of information in biology and the origin of genetic coding as a transition from uninformed to meaningful computational processes in physical systems. © 2016 The Author(s).

  15. DNA cards: determinants of DNA yield and quality in collecting genetic samples for pharmacogenetic studies.

    PubMed

    Mas, Sergi; Crescenti, Anna; Gassó, Patricia; Vidal-Taboada, Jose M; Lafuente, Amalia

    2007-08-01

    As pharmacogenetic studies frequently require establishment of DNA banks containing large cohorts with multi-centric designs, inexpensive methods for collecting and storing high-quality DNA are needed. The aims of this study were two-fold: to compare the amount and quality of DNA obtained from two different DNA cards (IsoCode Cards or FTA Classic Cards, Whatman plc, Brentford, Middlesex, UK); and to evaluate the effects of time and storage temperature, as well as the influence of anticoagulant ethylenediaminetetraacetic acid on the DNA elution procedure. The samples were genotyped by several methods typically used in pharmacogenetic studies: multiplex PCR, PCR-restriction fragment length polymorphism, single nucleotide primer extension, and allelic discrimination assay. In addition, they were amplified by whole genome amplification to increase genomic DNA mass. Time, storage temperature and ethylenediaminetetraacetic acid had no significant effects on either DNA card. This study reveals the importance of drying blood spots prior to isolation to avoid haemoglobin interference. Moreover, our results demonstrate that re-isolation protocols could be applied to increase the amount of DNA recovered. The samples analysed were accurately genotyped with all the methods examined herein. In conclusion, our study shows that both DNA cards, IsoCode Cards and FTA Classic Cards, facilitate genetic and pharmacogenetic testing for routine clinical practice.

  16. The distribution of DNA damage is defined by region-specific susceptibility to DNA damage formation rather than repair differences.

    PubMed

    Strand, Janne M; Scheffler, Katja; Bjørås, Magnar; Eide, Lars

    2014-06-01

    The cellular genomes are continuously damaged by reactive oxygen species (ROS) from aerobic processes. The impact of DNA damage depends on the specific site as well as the cellular state. The steady-state level of DNA damage is the net result of continuous formation and subsequent repair, but it is unknown to what extent heterogeneous damage distribution is caused by variations in formation or repair of DNA damage. Here, we used a restriction enzyme/qPCR based method to analyze DNA damage in promoter and coding regions of four nuclear genes: the two house-keeping genes Gadph and Tbp, and the Ndufa9 and Ndufs2 genes encoding mitochondrial complex I subunits, as well as mt-Rnr1 encoded by mitochondrial DNA (mtDNA). The distribution of steady-state levels of damage varied in a site-specific manner. Oxidative stress induced damage in nDNA to a similar extent in promoter and coding regions, and more so in mtDNA. The subsequent removal of damage from nDNA was efficient and comparable with recovery times depending on the initial damage load, while repair of mtDNA was delayed with subsequently slower repair rate. The repair was furthermore found to be independent of transcription or the transcription-coupled repair factor CSB, but dependent on cellular ATP. Our results demonstrate that the capacity to repair DNA is sufficient to remove exogenously induced damage. Thus, we conclude that the heterogeneous steady-state level of DNA damage in promoters and coding regions is caused by site-specific DNA damage/modifications that take place under normal metabolism. Copyright © 2014 Elsevier B.V. All rights reserved.

  17. Context influences on TALE–DNA binding revealed by quantitative profiling

    PubMed Central

    Rogers, Julia M.; Barrera, Luis A.; Reyon, Deepak; Sander, Jeffry D.; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L.

    2015-01-01

    Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE–DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000–20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE–DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design. PMID:26067805

  18. Context influences on TALE-DNA binding revealed by quantitative profiling.

    PubMed

    Rogers, Julia M; Barrera, Luis A; Reyon, Deepak; Sander, Jeffry D; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L

    2015-06-11

    Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE-DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000-20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE-DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design.

  19. The PARTRAC code: Status and recent developments

    NASA Astrophysics Data System (ADS)

    Friedland, Werner; Kundrat, Pavel

    Biophysical modeling is of particular value for predictions of radiation effects due to manned space missions. PARTRAC is an established tool for Monte Carlo-based simulations of radiation track structures, damage induction in cellular DNA and its repair [1]. Dedicated modules describe interactions of ionizing particles with the traversed medium, the production and reactions of reactive species, and score DNA damage determined by overlapping track structures with multi-scale chromatin models. The DNA repair module describes the repair of DNA double-strand breaks (DSB) via the non-homologous end-joining pathway; the code explicitly simulates the spatial mobility of individual DNA ends in parallel with their processing by major repair enzymes [2]. To simulate the yields and kinetics of radiation-induced chromosome aberrations, the repair module has been extended by tracking the information on the chromosome origin of ligated fragments as well as the presence of centromeres [3]. PARTRAC calculations have been benchmarked against experimental data on various biological endpoints induced by photon and ion irradiation. The calculated DNA fragment distributions after photon and ion irradiation reproduce corresponding experimental data and their dose- and LET-dependence. However, in particular for high-LET radiation many short DNA fragments are predicted below the detection limits of the measurements, so that the experiments significantly underestimate DSB yields by high-LET radiation [4]. The DNA repair module correctly describes the LET-dependent repair kinetics after (60) Co gamma-rays and different N-ion radiation qualities [2]. First calculations on the induction of chromosome aberrations have overestimated the absolute yields of dicentrics, but correctly reproduced their relative dose-dependence and the difference between gamma- and alpha particle irradiation [3]. Recent developments of the PARTRAC code include a model of hetero- vs euchromatin structures to enable

  20. Is diabetes mellitus correctly registered and classified in primary care? A population-based study in Catalonia, Spain.

    PubMed

    Mata-Cases, Manel; Mauricio, Dídac; Real, Jordi; Bolíbar, Bonaventura; Franch-Nadal, Josep

    2016-11-01

    To assess the prevalence of miscoding, misclassification, misdiagnosis and under-registration of diabetes mellitus (DM) in primary health care in Catalonia (Spain), and to explore use of automated algorithms to identify them. In this cross-sectional, retrospective study using an anonymized electronic general practice database, data were collected from patients or users with a diabetes-related code or from patients with no DM or prediabetes code but treated with antidiabetic drugs (unregistered DM). Decision algorithms were designed to classify the true diagnosis of type 1 DM (T1DM), type 2 DM (T2DM), and undetermined DM (UDM), and to classify unregistered DM patients treated with antidiabetic drugs. Data were collected from a total of 376,278 subjects with a DM ICD-10 code, and from 8707 patients with no DM or prediabetes code but treated with antidiabetic drugs. After application of the algorithms, 13.9% of patients with T1DM were identified as misclassified, and were probably T2DM; 80.9% of patients with UDM were reclassified as T2DM, and 19.1% of them were misdiagnosed as DM when they probably had prediabetes. The overall prevalence of miscoding (multiple codes or UDM) was 2.2%. Finally, 55.2% of subjects with unregistered DM were classified as prediabetes, 35.7% as T2DM, 8.5% as UDM treated with insulin, and 0.6% as T1DM. The prevalence of inappropriate codification or classification and under-registration of DM is relevant in primary care. Implementation of algorithms could automatically flag cases that need review and would substantially decrease the risk of inappropriate registration or coding. Copyright © 2016 SEEN. Publicado por Elsevier España, S.L.U. All rights reserved.

  1. Promoter classifier: software package for promoter database analysis.

    PubMed

    Gershenzon, Naum I; Ioshikhes, Ilya P

    2005-01-01

    Promoter Classifier is a package of seven stand-alone Windows-based C++ programs allowing the following basic manipulations with a set of promoter sequences: (i) calculation of positional distributions of nucleotides averaged over all promoters of the dataset; (ii) calculation of the averaged occurrence frequencies of the transcription factor binding sites and their combinations; (iii) division of the dataset into subsets of sequences containing or lacking certain promoter elements or combinations; (iv) extraction of the promoter subsets containing or lacking CpG islands around the transcription start site; and (v) calculation of spatial distributions of the promoter DNA stacking energy and bending stiffness. All programs have a user-friendly interface and provide the results in a convenient graphical form. The Promoter Classifier package is an effective tool for various basic manipulations with eukaryotic promoter sequences that usually are necessary for analysis of large promoter datasets. The program Promoter Divider is described in more detail as a representative component of the package.

  2. Facile and High-Throughput Synthesis of Functional Microparticles with Quick Response Codes.

    PubMed

    Ramirez, Lisa Marie S; He, Muhan; Mailloux, Shay; George, Justin; Wang, Jun

    2016-06-01

    Encoded microparticles are high demand in multiplexed assays and labeling. However, the current methods for the synthesis and coding of microparticles either lack robustness and reliability, or possess limited coding capacity. Here, a massive coding of dissociated elements (MiCODE) technology based on innovation of a chemically reactive off-stoichimetry thiol-allyl photocurable polymer and standard lithography to produce a large number of quick response (QR) code microparticles is introduced. The coding process is performed by photobleaching the QR code patterns on microparticles when fluorophores are incorporated into the prepolymer formulation. The fabricated encoded microparticles can be released from a substrate without changing their features. Excess thiol functionality on the microparticle surface allows for grafting of amine groups and further DNA probes. A multiplexed assay is demonstrated using the DNA-grafted QR code microparticles. The MiCODE technology is further characterized by showing the incorporation of BODIPY-maleimide (BDP-M) and Nile Red fluorophores for coding and the use of microcontact printing for immobilizing DNA probes on microparticle surfaces. This versatile technology leverages mature lithography facilities for fabrication and thus is amenable to scale-up in the future, with potential applications in bioassays and in labeling consumer products. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Cloning and sequence analysis of a cDNA clone coding for the mouse GM2 activator protein.

    PubMed Central

    Bellachioma, G; Stirling, J L; Orlacchio, A; Beccari, T

    1993-01-01

    A cDNA (1.1 kb) containing the complete coding sequence for the mouse GM2 activator protein was isolated from a mouse macrophage library using a cDNA for the human protein as a probe. There was a single ATG located 12 bp from the 5' end of the cDNA clone followed by an open reading frame of 579 bp. Northern blot analysis of mouse macrophage RNA showed that there was a single band with a mobility corresponding to a size of 2.3 kb. We deduce from this that the mouse mRNA, in common with the mRNA for the human GM2 activator protein, has a long 3' untranslated sequence of approx. 1.7 kb. Alignment of the mouse and human deduced amino acid sequences showed 68% identity overall and 75% identity for the sequence on the C-terminal side of the first 31 residues, which in the human GM2 activator protein contains the signal peptide. Hydropathicity plots showed great similarity between the mouse and human sequences even in regions of low sequence similarity. There is a single N-glycosylation site in the mouse GM2 activator protein sequence (Asn151-Phe-Thr) which differs in its location from the single site reported in the human GM2 activator protein sequence (Asn63-Val-Thr). Images Figure 1 PMID:7689829

  4. BASiNET-BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification.

    PubMed

    Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins

    2018-06-05

    With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data in particular de novo sequencing was rapidly produced at relatively low costs. In this context, computational tools are increasingly important to assist in the identification of relevant information to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on the feature extraction from complex network measurements. The method initially transform the sequences and represents them as complex networks. Then it extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated in the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET have classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and non-biased by the organism. The proposed methodology is implemented in open source in R language and freely available for download at https://cran.r-project.org/package=BASiNET.

  5. Identification of DNA-Binding Proteins Using Structural, Electrostatic and Evolutionary Features

    PubMed Central

    Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir

    2009-01-01

    Summary DNA binding proteins (DBPs) often take part in various crucial processes of the cell's life cycle. Therefore, the identification and characterization of these proteins are of great importance. We present here a random forests classifier for identifying DBPs among proteins with known three-dimensional structures. First, clusters of evolutionarily conserved regions (patches) on the protein's surface are detected using the PatchFinder algorithm; previous studies showed that these regions are typically the proteins' functionally important regions. Next, we train a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein including its dipole moment. Using 10-fold cross validation on a dataset of 138 DNA-binding proteins and 110 proteins which do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of previously published methods. Furthermore, when we tested 5 different methods on 11 new DBPs which did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA. PMID:19233205

  6. DNA Dynamics.

    ERIC Educational Resources Information Center

    Warren, Michael D.

    1997-01-01

    Explains a method to enable students to understand DNA and protein synthesis using model-building and role-playing. Acquaints students with the triplet code and transcription. Includes copies of the charts used in this technique. (DDR)

  7. Classifying clinical notes with pain assessment using machine learning.

    PubMed

    Fodeh, Samah Jamal; Finch, Dezon; Bouayad, Lina; Luther, Stephen L; Ling, Han; Kerns, Robert D; Brandt, Cynthia

    2017-12-26

    Pain is a significant public health problem, affecting millions of people in the USA. Evidence has highlighted that patients with chronic pain often suffer from deficits in pain care quality (PCQ) including pain assessment, treatment, and reassessment. Currently, there is no intelligent and reliable approach to identify PCQ indicators inelectronic health records (EHR). Hereby, we used unstructured text narratives in the EHR to derive pain assessment in clinical notes for patients with chronic pain. Our dataset includes patients with documented pain intensity rating ratings > = 4 and initial musculoskeletal diagnoses (MSD) captured by (ICD-9-CM codes) in fiscal year 2011 and a minimal 1 year of follow-up (follow-up period is 3-yr maximum); with complete data on key demographic variables. A total of 92 patients with 1058 notes was used. First, we manually annotated qualifiers and descriptors of pain assessment using the annotation schema that we previously developed. Second, we developed a reliable classifier for indicators of pain assessment in clinical note. Based on our annotation schema, we found variations in documenting the subclasses of pain assessment. In positive notes, providers mostly documented assessment of pain site (67%) and intensity of pain (57%), followed by persistence (32%). In only 27% of positive notes, did providers document a presumed etiology for the pain complaint or diagnosis. Documentation of patients' reports of factors that aggravate pain was only present in 11% of positive notes. Random forest classifier achieved the best performance labeling clinical notes with pain assessment information, compared to other classifiers; 94, 95, 94, and 94% was observed in terms of accuracy, PPV, F1-score, and AUC, respectively. Despite the wide spectrum of research that utilizes machine learning in many clinical applications, none explored using these methods for pain assessment research. In addition, previous studies using large datasets to

  8. A World Health Organization field trial assessing a proposed ICD-11 framework for classifying patient safety events.

    PubMed

    Forster, Alan J; Bernard, Burnand; Drösler, Saskia E; Gurevich, Yana; Harrison, James; Januel, Jean-Marie; Romano, Patrick S; Southern, Danielle A; Sundararajan, Vijaya; Quan, Hude; Vanderloo, Saskia E; Pincus, Harold A; Ghali, William A

    2017-08-01

    To assess the utility of the proposed World Health Organization (WHO)'s International Classification of Disease (ICD) framework for classifying patient safety events. Independent classification of 45 clinical vignettes using a web-based platform. The WHO's multi-disciplinary Quality and Safety Topic Advisory Group. The framework consists of three concepts: harm, cause and mode. We defined a concept as 'classifiable' if more than half of the raters could assign an ICD-11 code for the case. We evaluated reasons why cases were nonclassifiable using a qualitative approach. Harm was classifiable in 31 of 45 cases (69%). Of these, only 20 could be classified according to cause and mode. Classifiable cases were those in which a clear cause and effect relationship existed (e.g. medication administration error). Nonclassifiable cases were those without clear causal attribution (e.g. pressure ulcer). Of the 14 cases in which harm was not evident (31%), only 5 could be classified according to cause and mode and represented potential adverse events. Overall, nine cases (20%) were nonclassifiable using the three-part patient safety framework and contained significant ambiguity in the relationship between healthcare outcome and putative cause. The proposed framework enabled classification of the majority of patient safety events. Cases in which potentially harmful events did not cause harm were not classifiable; additional code categories within the ICD-11 are one proposal to address this concern. Cases with ambiguity in cause and effect relationship between healthcare processes and outcomes remain difficult to classify. © The Author 2017. Published by Oxford University Press in association with the International Society for Quality in Health Care. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

  9. Investigation of a Sybr-Green-Based Method to Validate DNA Sequences for DNA Computing

    DTIC Science & Technology

    2005-05-01

    OF A SYBR-GREEN-BASED METHOD TO VALIDATE DNA SEQUENCES FOR DNA COMPUTING 6. AUTHOR(S) Wendy Pogozelski, Salvatore Priore, Matthew Bernard ...simulated annealing. Biochemistry, 35, 14077-14089. 15 Pogozelski, W.K., Bernard , M.P. and Macula, A. (2004) DNA code validation using...and Clark, B.F.C. (eds) In RNA Biochemistry and Biotechnology, NATO ASI Series, Kluwer Academic Publishers. Zucker, M. and Stiegler , P. (1981

  10. Using Abbreviated Injury Scale (AIS) codes to classify Computed Tomography (CT) features in the Marshall System.

    PubMed

    Lesko, Mehdi M; Woodford, Maralyn; White, Laura; O'Brien, Sarah J; Childs, Charmaine; Lecky, Fiona E

    2010-08-06

    The purpose of Abbreviated Injury Scale (AIS) is to code various types of Traumatic Brain Injuries (TBI) based on their anatomical location and severity. The Marshall CT Classification is used to identify those subgroups of brain injured patients at higher risk of deterioration or mortality. The purpose of this study is to determine whether and how AIS coding can be translated to the Marshall Classification Initially, a Marshall Class was allocated to each AIS code through cross-tabulation. This was agreed upon through several discussion meetings with experts from both fields (clinicians and AIS coders). Furthermore, in order to make this translation possible, some necessary assumptions with regards to coding and classification of mass lesions and brain swelling were essential which were all approved and made explicit. The proposed method involves two stages: firstly to determine all possible Marshall Classes which a given patient can attract based on allocated AIS codes; via cross-tabulation and secondly to assign one Marshall Class to each patient through an algorithm. This method can be easily programmed in computer softwares and it would enable future important TBI research programs using trauma registry data.

  11. Using Abbreviated Injury Scale (AIS) codes to classify Computed Tomography (CT) features in the Marshall System

    PubMed Central

    2010-01-01

    Background The purpose of Abbreviated Injury Scale (AIS) is to code various types of Traumatic Brain Injuries (TBI) based on their anatomical location and severity. The Marshall CT Classification is used to identify those subgroups of brain injured patients at higher risk of deterioration or mortality. The purpose of this study is to determine whether and how AIS coding can be translated to the Marshall Classification Methods Initially, a Marshall Class was allocated to each AIS code through cross-tabulation. This was agreed upon through several discussion meetings with experts from both fields (clinicians and AIS coders). Furthermore, in order to make this translation possible, some necessary assumptions with regards to coding and classification of mass lesions and brain swelling were essential which were all approved and made explicit. Results The proposed method involves two stages: firstly to determine all possible Marshall Classes which a given patient can attract based on allocated AIS codes; via cross-tabulation and secondly to assign one Marshall Class to each patient through an algorithm. Conclusion This method can be easily programmed in computer softwares and it would enable future important TBI research programs using trauma registry data. PMID:20691038

  12. Is “Junk” DNA Mostly Intron DNA?

    PubMed Central

    Wong, Gane Ka-Shu; Passey, Douglas A.; Huang, Ying-zong; Yang, Zhiyong; Yu, Jun

    2000-01-01

    Among higher eukaryotes, very little of the genome codes for protein. What is in the rest of the genome, or the “junk” DNA, that, in Homo sapiens, is estimated to be almost 97% of the genome? Is it possible that much of this “junk” is intron DNA? This is not a question that can be answered just by looking at the published data, even from the finished genomes. One cannot assume that there are no genes in a sequenced region, just because no genes were annotated. We introduce another approach to this problem, based on an analysis of the cDNA-to-genomic alignments, in all of the complete or nearly-complete genomes from the multicellular organisms. Our conclusion is that, in animals but not in plants, most of the “junk” is intron DNA. PMID:11076852

  13. Intelligent query by humming system based on score level fusion of multiple classifiers

    NASA Astrophysics Data System (ADS)

    Pyo Nam, Gi; Thu Trang Luong, Thi; Ha Nam, Hyun; Ryoung Park, Kang; Park, Sung-Joo

    2011-12-01

    Recently, the necessity for content-based music retrieval that can return results even if a user does not know information such as the title or singer has increased. Query-by-humming (QBH) systems have been introduced to address this need, as they allow the user to simply hum snatches of the tune to find the right song. Even though there have been many studies on QBH, few have combined multiple classifiers based on various fusion methods. Here we propose a new QBH system based on the score level fusion of multiple classifiers. This research is novel in the following three respects: three local classifiers [quantized binary (QB) code-based linear scaling (LS), pitch-based dynamic time warping (DTW), and LS] are employed; local maximum and minimum point-based LS and pitch distribution feature-based LS are used as global classifiers; and the combination of local and global classifiers based on the score level fusion by the PRODUCT rule is used to achieve enhanced matching accuracy. Experimental results with the 2006 MIREX QBSH and 2009 MIR-QBSH corpus databases show that the performance of the proposed method is better than that of single classifier and other fusion methods.

  14. New t-gap insertion-deletion-like metrics for DNA hybridization thermodynamic modeling.

    PubMed

    D'yachkov, Arkadii G; Macula, Anthony J; Pogozelski, Wendy K; Renz, Thomas E; Rykov, Vyacheslav V; Torney, David C

    2006-05-01

    We discuss the concept of t-gap block isomorphic subsequences and use it to describe new abstract string metrics that are similar to the Levenshtein insertion-deletion metric. Some of the metrics that we define can be used to model a thermodynamic distance function on single-stranded DNA sequences. Our model captures a key aspect of the nearest neighbor thermodynamic model for hybridized DNA duplexes. One version of our metric gives the maximum number of stacked pairs of hydrogen bonded nucleotide base pairs that can be present in any secondary structure in a hybridized DNA duplex without pseudoknots. Thermodynamic distance functions are important components in the construction of DNA codes, and DNA codes are important components in biomolecular computing, nanotechnology, and other biotechnical applications that employ DNA hybridization assays. We show how our new distances can be calculated by using a dynamic programming method, and we derive a Varshamov-Gilbert-like lower bound on the size of some of codes using these distance functions as constraints. We also discuss software implementation of our DNA code design methods.

  15. MD simulations of papillomavirus DNA-E2 protein complexes hints at a protein structural code for DNA deformation.

    PubMed

    Falconi, M; Oteri, F; Eliseo, T; Cicero, D O; Desideri, A

    2008-08-01

    The structural dynamics of the DNA binding domains of the human papillomavirus strain 16 and the bovine papillomavirus strain 1, complexed with their DNA targets, has been investigated by modeling, molecular dynamics simulations, and nuclear magnetic resonance analysis. The simulations underline different dynamical features of the protein scaffolds and a different mechanical interaction of the two proteins with DNA. The two protein structures, although very similar, show differences in the relative mobility of secondary structure elements. Protein structural analyses, principal component analysis, and geometrical and energetic DNA analyses indicate that the two transcription factors utilize a different strategy in DNA recognition and deformation. Results show that the protein indirect DNA readout is not only addressable to the DNA molecule flexibility but it is finely tuned by the mechanical and dynamical properties of the protein scaffold involved in the interaction.

  16. Classifying Chinese Questions Related to Health Care Posted by Consumers Via the Internet.

    PubMed

    Guo, Haihong; Na, Xu; Hou, Li; Li, Jiao

    2017-06-20

    In question answering (QA) system development, question classification is crucial for identifying information needs and improving the accuracy of returned answers. Although the questions are domain-specific, they are asked by non-professionals, making the question classification task more challenging. This study aimed to classify health care-related questions posted by the general public (Chinese speakers) on the Internet. A topic-based classification schema for health-related questions was built by manually annotating randomly selected questions. The Kappa statistic was used to measure the interrater reliability of multiple annotation results. Using the above corpus, we developed a machine-learning method to automatically classify these questions into one of the following six classes: Condition Management, Healthy Lifestyle, Diagnosis, Health Provider Choice, Treatment, and Epidemiology. The consumer health question schema was developed with a four-hierarchical-level of specificity, comprising 48 quaternary categories and 35 annotation rules. The 2000 sample questions were coded with 2000 major codes and 607 minor codes. Using natural language processing techniques, we expressed the Chinese questions as a set of lexical, grammatical, and semantic features. Furthermore, the effective features were selected to improve the question classification performance. From the 6-category classification results, we achieved an average precision of 91.41%, recall of 89.62%, and F 1 score of 90.24%. In this study, we developed an automatic method to classify questions related to Chinese health care posted by the general public. It enables Artificial Intelligence (AI) agents to understand Internet users' information needs on health care. ©Haihong Guo, Xu Na, Li Hou, Jiao Li. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 20.06.2017.

  17. Deciphering the Epigenetic Code: An Overview of DNA Methylation Analysis Methods

    PubMed Central

    Umer, Muhammad

    2013-01-01

    Abstract Significance: Methylation of cytosine in DNA is linked with gene regulation, and this has profound implications in development, normal biology, and disease conditions in many eukaryotic organisms. A wide range of methods and approaches exist for its identification, quantification, and mapping within the genome. While the earliest approaches were nonspecific and were at best useful for quantification of total methylated cytosines in the chunk of DNA, this field has seen considerable progress and development over the past decades. Recent Advances: Methods for DNA methylation analysis differ in their coverage and sensitivity, and the method of choice depends on the intended application and desired level of information. Potential results include global methyl cytosine content, degree of methylation at specific loci, or genome-wide methylation maps. Introduction of more advanced approaches to DNA methylation analysis, such as microarray platforms and massively parallel sequencing, has brought us closer to unveiling the whole methylome. Critical Issues: Sensitive quantification of DNA methylation from degraded and minute quantities of DNA and high-throughput DNA methylation mapping of single cells still remain a challenge. Future Directions: Developments in DNA sequencing technologies as well as the methods for identification and mapping of 5-hydroxymethylcytosine are expected to augment our current understanding of epigenomics. Here we present an overview of methodologies available for DNA methylation analysis with special focus on recent developments in genome-wide and high-throughput methods. While the application focus relates to cancer research, the methods are equally relevant to broader issues of epigenetics and redox science in this special forum. Antioxid. Redox Signal. 18, 1972–1986. PMID:23121567

  18. Transcription and DNA Damage: Holding Hands or Crossing Swords?

    PubMed

    D'Alessandro, Giuseppina; d'Adda di Fagagna, Fabrizio

    2017-10-27

    Transcription has classically been considered a potential threat to genome integrity. Collision between transcription and DNA replication machinery, and retention of DNA:RNA hybrids, may result in genome instability. On the other hand, it has been proposed that active genes repair faster and preferentially via homologous recombination. Moreover, while canonical transcription is inhibited in the proximity of DNA double-strand breaks, a growing body of evidence supports active non-canonical transcription at DNA damage sites. Small non-coding RNAs accumulate at DNA double-strand break sites in mammals and other organisms, and are involved in DNA damage signaling and repair. Furthermore, RNA binding proteins are recruited to DNA damage sites and participate in the DNA damage response. Here, we discuss the impact of transcription on genome stability, the role of RNA binding proteins at DNA damage sites, and the function of small non-coding RNAs generated upon damage in the signaling and repair of DNA lesions. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. Cloning and expression of cDNA coding for bouganin.

    PubMed

    den Hartog, Marcel T; Lubelli, Chiara; Boon, Louis; Heerkens, Sijmie; Ortiz Buijsse, Antonio P; de Boer, Mark; Stirpe, Fiorenzo

    2002-03-01

    Bouganin is a ribosome-inactivating protein that recently was isolated from Bougainvillea spectabilis Willd. In this work, the cloning and expression of the cDNA encoding for bouganin is described. From the cDNA, the amino-acid sequence was deduced, which correlated with the primary sequence data obtained by amino-acid sequencing on the native protein. Bouganin is synthesized as a pro-peptide consisting of 305 amino acids, the first 26 of which act as a leader signal while the 29 C-terminal amino acids are cleaved during processing of the molecule. The mature protein consists of 250 amino acids. Using the cDNA sequence encoding the mature protein of 250 amino acids, a recombinant protein was expressed, purified and characterized. The recombinant molecule had similar activity in a cell-free protein synthesis assay and had comparable toxicity on living cells as compared to the isolated native bouganin.

  20. Statistical and linguistic features of DNA sequences

    NASA Technical Reports Server (NTRS)

    Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Levy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.

  1. Permutation coding technique for image recognition systems.

    PubMed

    Kussul, Ernst M; Baidyk, Tatiana N; Wunsch, Donald C; Makeyev, Oleksandr; Martín, Anabel

    2006-11-01

    A feature extractor and neural classifier for image recognition systems are proposed. The proposed feature extractor is based on the concept of random local descriptors (RLDs). It is followed by the encoder that is based on the permutation coding technique that allows to take into account not only detected features but also the position of each feature on the image and to make the recognition process invariant to small displacements. The combination of RLDs and permutation coding permits us to obtain a sufficiently general description of the image to be recognized. The code generated by the encoder is used as an input data for the neural classifier. Different types of images were used to test the proposed image recognition system. It was tested in the handwritten digit recognition problem, the face recognition problem, and the microobject shape recognition problem. The results of testing are very promising. The error rate for the Modified National Institute of Standards and Technology (MNIST) database is 0.44% and for the Olivetti Research Laboratory (ORL) database it is 0.1%.

  2. Measuring diagnoses: ICD code accuracy.

    PubMed

    O'Malley, Kimberly J; Cook, Karon F; Price, Matt D; Wildes, Kimberly Raiford; Hurdle, John F; Ashton, Carol M

    2005-10-01

    To examine potential sources of errors at each step of the described inpatient International Classification of Diseases (ICD) coding process. The use of disease codes from the ICD has expanded from classifying morbidity and mortality information for statistical purposes to diverse sets of applications in research, health care policy, and health care finance. By describing a brief history of ICD coding, detailing the process for assigning codes, identifying where errors can be introduced into the process, and reviewing methods for examining code accuracy, we help code users more systematically evaluate code accuracy for their particular applications. We summarize the inpatient ICD diagnostic coding process from patient admission to diagnostic code assignment. We examine potential sources of errors at each step and offer code users a tool for systematically evaluating code accuracy. Main error sources along the "patient trajectory" include amount and quality of information at admission, communication among patients and providers, the clinician's knowledge and experience with the illness, and the clinician's attention to detail. Main error sources along the "paper trail" include variance in the electronic and written records, coder training and experience, facility quality-control efforts, and unintentional and intentional coder errors, such as misspecification, unbundling, and upcoding. By clearly specifying the code assignment process and heightening their awareness of potential error sources, code users can better evaluate the applicability and limitations of codes for their particular situations. ICD codes can then be used in the most appropriate ways.

  3. Highly sensitive and selective microRNA detection based on DNA-bio-bar-code and enzyme-assisted strand cycle exponential signal amplification.

    PubMed

    Dong, Haifeng; Meng, Xiangdan; Dai, Wenhao; Cao, Yu; Lu, Huiting; Zhou, Shufeng; Zhang, Xueji

    2015-04-21

    Herein, a highly sensitive and selective microRNA (miRNA) detection strategy using DNA-bio-bar-code amplification (BCA) and Nb·BbvCI nicking enzyme-assisted strand cycle for exponential signal amplification was designed. The DNA-BCA system contains a locked nucleic acid (LNA) modified DNA probe for improving hybridization efficiency, while a signal reported molecular beacon (MB) with an endonuclease recognition site was designed for strand cycle amplification. In the presence of target miRNA, the oligonucleotides functionalized magnetic nanoprobe (MNP-DNA) and gold nanoprobe (AuNP-DNA) with numerous reported probes (RP) can hybridize with target miRNA, respectively, to form a sandwich structure. After sandwich structures were separated from the solution by the magnetic field, the RP were released under high temperature to recognize the MB and cleaved the hairpin DNA to induce the dissociation of RP. The dissociated RP then triggered the next strand cycle to produce exponential fluorescent signal amplification for miRNA detection. Under optimized conditions, the exponential signal amplification system shows a good linear range of 6 orders of magnitude (from 0.3 pM to 3 aM) with limit of detection (LOD) down to 52.5 zM, while the sandwich structure renders the system with high selectivity. Meanwhile, the feasibility of the proposed strategy for cell miRNA detection was confirmed by analyzing miRNA-21 in HeLa lysates. Given the high-performance for miRNA analysis, the strategy has a promising application in biological detection and in clinical diagnosis.

  4. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    PubMed Central

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-01-01

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild Animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 106 pfu/mL and 1.56 × 109 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bps were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coded for the TAT-Ferritin fusion protein with two 6× His-tags in N and C-terminal. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library attained to the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105

  5. Construction of a full-length enriched cDNA library and preliminary analysis of expressed sequence tags from Bengal Tiger Panthera tigris tigris.

    PubMed

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-05-24

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild Animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 106 pfu/mL and 1.56 × 109 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bps were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coded for the TAT-Ferritin fusion protein with two 6× His-tags in N and C-terminal. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library attained to the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.

  6. GESPA: classifying nsSNPs to predict disease association.

    PubMed

    Khurana, Jay K; Reeder, Jay E; Shrimpton, Antony E; Thakar, Juilee

    2015-07-25

    Non-synonymous single nucleotide polymorphisms (nsSNPs) are the most common DNA sequence variation associated with disease in humans. Thus determining the clinical significance of each nsSNP is of great importance. Potential detrimental nsSNPs may be identified by genetic association studies or by functional analysis in the laboratory, both of which are expensive and time consuming. Existing computational methods lack accuracy and features to facilitate nsSNP classification for clinical use. We developed the GESPA (GEnomic Single nucleotide Polymorphism Analyzer) program to predict the pathogenicity and disease phenotype of nsSNPs. GESPA is a user-friendly software package for classifying disease association of nsSNPs. It allows flexibility in acceptable input formats and predicts the pathogenicity of a given nsSNP by assessing the conservation of amino acids in orthologs and paralogs and supplementing this information with data from medical literature. The development and testing of GESPA was performed using the humsavar, ClinVar and humvar datasets. Additionally, GESPA also predicts the disease phenotype associated with a nsSNP with high accuracy, a feature unavailable in existing software. GESPA's overall accuracy exceeds existing computational methods for predicting nsSNP pathogenicity. The usability of GESPA is enhanced by fast SQL-based cloud storage and retrieval of data. GESPA is a novel bioinformatics tool to determine the pathogenicity and phenotypes of nsSNPs. We anticipate that GESPA will become a useful clinical framework for predicting the disease association of nsSNPs. The program, executable jar file, source code, GPL 3.0 license, user guide, and test data with instructions are available at http://sourceforge.net/projects/gespa.

  7. Identification of DNA-binding proteins using structural, electrostatic and evolutionary features.

    PubMed

    Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir

    2009-04-10

    DNA-binding proteins (DBPs) participate in various crucial processes in the life-cycle of the cells, and the identification and characterization of these proteins is of great importance. We present here a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the surface of proteins were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.

  8. EnsembleGASVR: a novel ensemble method for classifying missense single nucleotide polymorphisms.

    PubMed

    Rapakoulia, Trisevgeni; Theofilatos, Konstantinos; Kleftogiannis, Dimitrios; Likothanasis, Spiros; Tsakalidis, Athanasios; Mavroudi, Seferina

    2014-08-15

    Single nucleotide polymorphisms (SNPs) are considered the most frequently occurring DNA sequence variations. Several computational methods have been proposed for the classification of missense SNPs to neutral and disease associated. However, existing computational approaches fail to select relevant features by choosing them arbitrarily without sufficient documentation. Moreover, they are limited to the problem of missing values, imbalance between the learning datasets and most of them do not support their predictions with confidence scores. To overcome these limitations, a novel ensemble computational methodology is proposed. EnsembleGASVR facilitates a two-step algorithm, which in its first step applies a novel evolutionary embedded algorithm to locate close to optimal Support Vector Regression models. In its second step, these models are combined to extract a universal predictor, which is less prone to overfitting issues, systematizes the rebalancing of the learning sets and uses an internal approach for solving the missing values problem without loss of information. Confidence scores support all the predictions and the model becomes tunable by modifying the classification thresholds. An extensive study was performed for collecting the most relevant features for the problem of classifying SNPs, and a superset of 88 features was constructed. Experimental results show that the proposed framework outperforms well-known algorithms in terms of classification performance in the examined datasets. Finally, the proposed algorithmic framework was able to uncover the significant role of certain features such as the solvent accessibility feature, and the top-scored predictions were further validated by linking them with disease phenotypes. Datasets and codes are freely available on the Web at http://prlab.ceid.upatras.gr/EnsembleGASVR/dataset-codes.zip. All the required information about the article is available through http

  9. A 12-year molecular survey of clinical herpes simplex virus type 2 isolates demonstrates the circulation of clade A and B strains in Germany.

    PubMed

    Schmidt-Chanasit, Jonas; Bialonski, Alexandra; Heinemann, Patrick; Ulrich, Rainer G; Günther, Stephan; Rabenau, Holger F; Doerr, Hans Wilhelm

    2010-07-01

    Recently two different herpes simplex virus type 2 (HSV-2) clades (A and B) were described on DNA sequence data of the glycoprotein E (gE), G (gG) and I (gI) genes. To type the circulating HSV-2 wild-type strains in Germany by a novel approach and to monitor potential changes in the molecular epidemiology between 1997 and 2008. A total of 64 clinical HSV-2 isolates were analyzed by a novel approach using the DNA sequences of the complete open reading frames of glycoprotein B (gB) and gG. Recombination analysis of the gB and gG gene sequences was performed to reveal intragenic recombinants. Based on the phylogenetic analysis of the gB coding DNA sequence 8 of 64 (12%) isolates were classified as clade A strains and 56 of 64 (88%) isolates were classified as clade B strains. Analysis of the gG coding DNA sequence classified 4 (6%) isolates as clade A strains and 60 (94%) isolates as clade B strains. In comparison, the 8 isolates classified as clade A strains using the gB sequence data were classified as clade B strains when using the gG coding DNA sequence, suggesting intergenic recombination events. Intragenic recombination events were not detected. The first molecular survey of clinical HSV-2 isolates from Germany demonstrated the circulation of clade A and B strains and of intergenic recombinants over a period of 12 years. Copyright (c) 2010 Elsevier B.V. All rights reserved.

  10. Characterization of Non-coding DNA Satellites Associated with Sweepoviruses (Genus Begomovirus, Geminiviridae) – Definition of a Distinct Class of Begomovirus-Associated Satellites

    PubMed Central

    Lozano, Gloria; Trenado, Helena P.; Fiallo-Olivé, Elvira; Chirinos, Dorys; Geraud-Pouey, Francis; Briddon, Rob W.; Navas-Castillo, Jesús

    2016-01-01

    Begomoviruses (family Geminiviridae) are whitefly-transmitted, plant-infecting single-stranded DNA viruses that cause crop losses throughout the warmer parts of the World. Sweepoviruses are a phylogenetically distinct group of begomoviruses that infect plants of the family Convolvulaceae, including sweet potato (Ipomoea batatas). Two classes of subviral molecules are often associated with begomoviruses, particularly in the Old World; the betasatellites and the alphasatellites. An analysis of sweet potato and Ipomoea indica samples from Spain and Merremia dissecta samples from Venezuela identified small non-coding subviral molecules in association with several distinct sweepoviruses. The sequences of 18 clones were obtained and found to be structurally similar to tomato leaf curl virus-satellite (ToLCV-sat, the first DNA satellite identified in association with a begomovirus), with a region with significant sequence identity to the conserved region of betasatellites, an A-rich sequence, a predicted stem–loop structure containing the nonanucleotide TAATATTAC, and a second predicted stem–loop. These sweepovirus-associated satellites join an increasing number of ToLCV-sat-like non-coding satellites identified recently. Although sharing some features with betasatellites, evidence is provided to suggest that the ToLCV-sat-like satellites are distinct from betasatellites and should be considered a separate class of satellites, for which the collective name deltasatellites is proposed. PMID:26925037

  11. A biological inspired fuzzy adaptive window median filter (FAWMF) for enhancing DNA signal processing.

    PubMed

    Ahmad, Muneer; Jung, Low Tan; Bhuiyan, Al-Amin

    2017-10-01

    Digital signal processing techniques commonly employ fixed length window filters to process the signal contents. DNA signals differ in characteristics from common digital signals since they carry nucleotides as contents. The nucleotides own genetic code context and fuzzy behaviors due to their special structure and order in DNA strand. Employing conventional fixed length window filters for DNA signal processing produce spectral leakage and hence results in signal noise. A biological context aware adaptive window filter is required to process the DNA signals. This paper introduces a biological inspired fuzzy adaptive window median filter (FAWMF) which computes the fuzzy membership strength of nucleotides in each slide of window and filters nucleotides based on median filtering with a combination of s-shaped and z-shaped filters. Since coding regions cause 3-base periodicity by an unbalanced nucleotides' distribution producing a relatively high bias for nucleotides' usage, such fundamental characteristic of nucleotides has been exploited in FAWMF to suppress the signal noise. Along with adaptive response of FAWMF, a strong correlation between median nucleotides and the Π shaped filter was observed which produced enhanced discrimination between coding and non-coding regions contrary to fixed length conventional window filters. The proposed FAWMF attains a significant enhancement in coding regions identification i.e. 40% to 125% as compared to other conventional window filters tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms. This study proves that conventional fixed length window filters applied to DNA signals do not achieve significant results since the nucleotides carry genetic code context. The proposed FAWMF algorithm is adaptive and outperforms significantly to process DNA signal contents. The algorithm applied to variety of DNA datasets produced noteworthy discrimination between coding and non-coding regions contrary

  12. Computer-based coding of free-text job descriptions to efficiently identify occupations in epidemiological studies

    PubMed Central

    Russ, Daniel E.; Ho, Kwan-Yuet; Colt, Joanne S.; Armenti, Karla R.; Baris, Dalsu; Chow, Wong-Ho; Davis, Faith; Johnson, Alison; Purdue, Mark P.; Karagas, Margaret R.; Schwartz, Kendra; Schwenn, Molly; Silverman, Debra T.; Johnson, Calvin A.; Friesen, Melissa C.

    2016-01-01

    Background Mapping job titles to standardized occupation classification (SOC) codes is an important step in identifying occupational risk factors in epidemiologic studies. Because manual coding is time-consuming and has moderate reliability, we developed an algorithm called SOCcer (Standardized Occupation Coding for Computer-assisted Epidemiologic Research) to assign SOC-2010 codes based on free-text job description components. Methods Job title and task-based classifiers were developed by comparing job descriptions to multiple sources linking job and task descriptions to SOC codes. An industry-based classifier was developed based on the SOC prevalence within an industry. These classifiers were used in a logistic model trained using 14,983 jobs with expert-assigned SOC codes to obtain empirical weights for an algorithm that scored each SOC/job description. We assigned the highest scoring SOC code to each job. SOCcer was validated in two occupational data sources by comparing SOC codes obtained from SOCcer to expert assigned SOC codes and lead exposure estimates obtained by linking SOC codes to a job-exposure matrix. Results For 11,991 case-control study jobs, SOCcer-assigned codes agreed with 44.5% and 76.3% of manually assigned codes at the 6- and 2-digit level, respectively. Agreement increased with the score, providing a mechanism to identify assignments needing review. Good agreement was observed between lead estimates based on SOCcer and manual SOC assignments (kappa: 0.6–0.8). Poorer performance was observed for inspection job descriptions, which included abbreviations and worksite-specific terminology. Conclusions Although some manual coding will remain necessary, using SOCcer may improve the efficiency of incorporating occupation into large-scale epidemiologic studies. PMID:27102331

  13. Measuring Diagnoses: ICD Code Accuracy

    PubMed Central

    O'Malley, Kimberly J; Cook, Karon F; Price, Matt D; Wildes, Kimberly Raiford; Hurdle, John F; Ashton, Carol M

    2005-01-01

    Objective To examine potential sources of errors at each step of the described inpatient International Classification of Diseases (ICD) coding process. Data Sources/Study Setting The use of disease codes from the ICD has expanded from classifying morbidity and mortality information for statistical purposes to diverse sets of applications in research, health care policy, and health care finance. By describing a brief history of ICD coding, detailing the process for assigning codes, identifying where errors can be introduced into the process, and reviewing methods for examining code accuracy, we help code users more systematically evaluate code accuracy for their particular applications. Study Design/Methods We summarize the inpatient ICD diagnostic coding process from patient admission to diagnostic code assignment. We examine potential sources of errors at each step and offer code users a tool for systematically evaluating code accuracy. Principle Findings Main error sources along the “patient trajectory” include amount and quality of information at admission, communication among patients and providers, the clinician's knowledge and experience with the illness, and the clinician's attention to detail. Main error sources along the “paper trail” include variance in the electronic and written records, coder training and experience, facility quality-control efforts, and unintentional and intentional coder errors, such as misspecification, unbundling, and upcoding. Conclusions By clearly specifying the code assignment process and heightening their awareness of potential error sources, code users can better evaluate the applicability and limitations of codes for their particular situations. ICD codes can then be used in the most appropriate ways. PMID:16178999

  14. Artificial Intelligence, DNA Mimicry, and Human Health.

    PubMed

    Stefano, George B; Kream, Richard M

    2017-08-14

    The molecular evolution of genomic DNA across diverse plant and animal phyla involved dynamic registrations of sequence modifications to maintain existential homeostasis to increasingly complex patterns of environmental stressors. As an essential corollary, driver effects of positive evolutionary pressure are hypothesized to effect concerted modifications of genomic DNA sequences to meet expanded platforms of regulatory controls for successful implementation of advanced physiological requirements. It is also clearly apparent that preservation of updated registries of advantageous modifications of genomic DNA sequences requires coordinate expansion of convergent cellular proofreading/error correction mechanisms that are encoded by reciprocally modified genomic DNA. Computational expansion of operationally defined DNA memory extends to coordinate modification of coding and previously under-emphasized noncoding regions that now appear to represent essential reservoirs of untapped genetic information amenable to evolutionary driven recruitment into the realm of biologically active domains. Additionally, expansion of DNA memory potential via chemical modification and activation of noncoding sequences is targeted to vertical augmentation and integration of an expanded cadre of transcriptional and epigenetic regulatory factors affecting linear coding of protein amino acid sequences within open reading frames.

  15. Process of labeling specific chromosomes using recombinant repetitive DNA

    DOEpatents

    Moyzis, R.K.; Meyne, J.

    1988-02-12

    Chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members and consensus sequences of the repetitive DNA families for the chromosome preferential sequences. The selected low homology regions are then hybridized with chromosomes to determine those low homology regions hybridized with a specific chromosome under normal stringency conditions.

  16. enDNA-Prot: identification of DNA-binding proteins by applying ensemble learning.

    PubMed

    Xu, Ruifeng; Zhou, Jiyun; Liu, Bin; Yao, Lin; He, Yulan; Zou, Quan; Wang, Xiaolong

    2014-01-01

    DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotide, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Up to now, many methods have been proposed, but most of them focus on only one classifier and cannot make full use of the large number of negative samples to improve predicting performance. This study proposed a predictor called enDNA-Prot for DNA-binding protein identification by employing the ensemble learning technique. Experiential results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot with performance improvement in the range of 3.97-9.52% in ACC and 0.08-0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, the performance of enDNA-Prot outperformed the three existing methods by 2.83-16.63% in terms of ACC and 0.02-0.16 in terms of MCC. It indicated that enDNA-Prot is an effective method for DNA-binding protein identification and expanding training dataset with negative samples can improve its performance. For the convenience of the vast majority of experimental scientists, we developed a user-friendly web-server for enDNA-Prot which is freely accessible to the public.

  17. Computer-based coding of free-text job descriptions to efficiently identify occupations in epidemiological studies.

    PubMed

    Russ, Daniel E; Ho, Kwan-Yuet; Colt, Joanne S; Armenti, Karla R; Baris, Dalsu; Chow, Wong-Ho; Davis, Faith; Johnson, Alison; Purdue, Mark P; Karagas, Margaret R; Schwartz, Kendra; Schwenn, Molly; Silverman, Debra T; Johnson, Calvin A; Friesen, Melissa C

    2016-06-01

    Mapping job titles to standardised occupation classification (SOC) codes is an important step in identifying occupational risk factors in epidemiological studies. Because manual coding is time-consuming and has moderate reliability, we developed an algorithm called SOCcer (Standardized Occupation Coding for Computer-assisted Epidemiologic Research) to assign SOC-2010 codes based on free-text job description components. Job title and task-based classifiers were developed by comparing job descriptions to multiple sources linking job and task descriptions to SOC codes. An industry-based classifier was developed based on the SOC prevalence within an industry. These classifiers were used in a logistic model trained using 14 983 jobs with expert-assigned SOC codes to obtain empirical weights for an algorithm that scored each SOC/job description. We assigned the highest scoring SOC code to each job. SOCcer was validated in 2 occupational data sources by comparing SOC codes obtained from SOCcer to expert assigned SOC codes and lead exposure estimates obtained by linking SOC codes to a job-exposure matrix. For 11 991 case-control study jobs, SOCcer-assigned codes agreed with 44.5% and 76.3% of manually assigned codes at the 6-digit and 2-digit level, respectively. Agreement increased with the score, providing a mechanism to identify assignments needing review. Good agreement was observed between lead estimates based on SOCcer and manual SOC assignments (κ 0.6-0.8). Poorer performance was observed for inspection job descriptions, which included abbreviations and worksite-specific terminology. Although some manual coding will remain necessary, using SOCcer may improve the efficiency of incorporating occupation into large-scale epidemiological studies. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  18. Pairwise Classifier Ensemble with Adaptive Sub-Classifiers for fMRI Pattern Analysis.

    PubMed

    Kim, Eunwoo; Park, HyunWook

    2017-02-01

    The multi-voxel pattern analysis technique is applied to fMRI data for classification of high-level brain functions using pattern information distributed over multiple voxels. In this paper, we propose a classifier ensemble for multiclass classification in fMRI analysis, exploiting the fact that specific neighboring voxels can contain spatial pattern information. The proposed method converts the multiclass classification to a pairwise classifier ensemble, and each pairwise classifier consists of multiple sub-classifiers using an adaptive feature set for each class-pair. Simulated and real fMRI data were used to verify the proposed method. Intra- and inter-subject analyses were performed to compare the proposed method with several well-known classifiers, including single and ensemble classifiers. The comparison results showed that the proposed method can be generally applied to multiclass classification in both simulations and real fMRI analyses.

  19. Improved detection of DNA-binding proteins via compression technology on PSSM information.

    PubMed

    Wang, Yubo; Ding, Yijie; Guo, Fei; Wei, Leyi; Tang, Jijun

    2017-01-01

    Since the importance of DNA-binding proteins in multiple biomolecular functions has been recognized, an increasing number of researchers are attempting to identify DNA-binding proteins. In recent years, the machine learning methods have become more and more compelling in the case of protein sequence data soaring, because of their favorable speed and accuracy. In this paper, we extract three features from the protein sequence, namely NMBAC (Normalized Moreau-Broto Autocorrelation), PSSM-DWT (Position-specific scoring matrix-Discrete Wavelet Transform), and PSSM-DCT (Position-specific scoring matrix-Discrete Cosine Transform). We also employ feature selection algorithm on these feature vectors. Then, these features are fed into the training SVM (support vector machine) model as classifier to predict DNA-binding proteins. Our method applys three datasets, namely PDB1075, PDB594 and PDB186, to evaluate the performance of our approach. The PDB1075 and PDB594 datasets are employed for Jackknife test and the PDB186 dataset is used for the independent test. Our method achieves the best accuracy in the Jacknife test, from 79.20% to 86.23% and 80.5% to 86.20% on PDB1075 and PDB594 datasets, respectively. In the independent test, the accuracy of our method comes to 76.3%. The performance of independent test also shows that our method has a certain ability to be effectively used for DNA-binding protein prediction. The data and source code are at https://doi.org/10.6084/m9.figshare.5104084.

  20. Palindromic repetitive DNA elements with coding potential in Methanocaldococcus jannaschii.

    PubMed

    Suyama, Mikita; Lathe, Warren C; Bork, Peer

    2005-10-10

    We have identified 141 novel palindromic repetitive elements in the genome of euryarchaeon Methanocaldococcus jannaschii. The total length of these elements is 14.3kb, which corresponds to 0.9% of the total genomic sequence and 6.3% of all extragenic regions. The elements can be divided into three groups (MJRE1-3) based on the sequence similarity. The low sequence identity within each of the groups suggests rather old origin of these elements in M. jannaschii. Three MJRE2 elements were located within the protein coding regions without disrupting the coding potential of the host genes, indicating that insertion of repeats might be a widespread mechanism to enhance sequence diversity in coding regions.

  1. Haben repetitive DNA-Sequenzen biologische Funktionen?

    NASA Astrophysics Data System (ADS)

    John, Maliyakal E.; Knöchel, Walter

    1983-05-01

    By DNA reassociation kinetics it is known that the eucaryotic genome consists of non-repetitive DNA, middle-repetitive DNA and highly repetitive DNA. Whereas the majority of protein-coding genes is located on non-repetitive DNA, repetitive DNA forms a constitutive part of eucaryotic DNA and its amount in most cases equals or even substantially exceeds that of non-repetitive DNA. During the past years a large body of data on repetitive DNA has accumulated and these have prompted speculations ranging from specific roles in the regulation of gene expression to that of a selfish entity with inconsequential functions. The following article summarizes recent findings on structural, transcriptional and evolutionary aspects and, although by no means being proven, some possible biological functions are discussed.

  2. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations

    PubMed Central

    Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions. PMID:26089862

  3. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.

    PubMed

    Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.

  4. Emerging criteria for the low-coherence cannot classify category.

    PubMed

    Speranza, Anna Maria; Nicolais, Giampaolo; Maggiora Vergano, Carola; Dazzi, Nino

    2017-12-01

    As suggested by Main et al., to respond to the need for an adaptation of the existing Adult Attachment Interview (AAI) coding system, especially regarding the application to nonnormative samples, this study presents additional criteria that characterize the low-coherence cannot classify (CC) category. Three AAIs were selected from a sample of parents of maltreated children. All transcripts indicated a very low coherence, with no evidence of contradictory insecure discourse strategies. Moreover, global category descriptors were identified, together with specific indices of discourse characteristics and features that highlight the breakdown in reasoning and discourse experienced by the speakers. The aim of the study is to illustrate new criteria to identify and rate a low-coherence CC profile toward the operationalization of this pervasively unintegrated state of mind. Through the definition of additional criteria for low-coherence CC category, our study helps the AAI and its coding system be more flexible and effective when dealing with clinical samples.

  5. Structural diversity of supercoiled DNA

    PubMed Central

    Irobalieva, Rossitza N.; Fogg, Jonathan M.; Catanese, Daniel J.; Sutthibutpong, Thana; Chen, Muyuan; Barker, Anna K.; Ludtke, Steven J.; Harris, Sarah A.; Schmid, Michael F.; Chiu, Wah; Zechiedrich, Lynn

    2015-01-01

    By regulating access to the genetic code, DNA supercoiling strongly affects DNA metabolism. Despite its importance, however, much about supercoiled DNA (positively supercoiled DNA, in particular) remains unknown. Here we use electron cryo-tomography together with biochemical analyses to investigate structures of individual purified DNA minicircle topoisomers with defined degrees of supercoiling. Our results reveal that each topoisomer, negative or positive, adopts a unique and surprisingly wide distribution of three-dimensional conformations. Moreover, we uncover striking differences in how the topoisomers handle torsional stress. As negative supercoiling increases, bases are increasingly exposed. Beyond a sharp supercoiling threshold, we also detect exposed bases in positively supercoiled DNA. Molecular dynamics simulations independently confirm the conformational heterogeneity and provide atomistic insight into the flexibility of supercoiled DNA. Our integrated approach reveals the three-dimensional structures of DNA that are essential for its function. PMID:26455586

  6. Structural diversity of supercoiled DNA

    NASA Astrophysics Data System (ADS)

    Irobalieva, Rossitza N.; Fogg, Jonathan M.; Catanese, Daniel J.; Sutthibutpong, Thana; Chen, Muyuan; Barker, Anna K.; Ludtke, Steven J.; Harris, Sarah A.; Schmid, Michael F.; Chiu, Wah; Zechiedrich, Lynn

    2015-10-01

    By regulating access to the genetic code, DNA supercoiling strongly affects DNA metabolism. Despite its importance, however, much about supercoiled DNA (positively supercoiled DNA, in particular) remains unknown. Here we use electron cryo-tomography together with biochemical analyses to investigate structures of individual purified DNA minicircle topoisomers with defined degrees of supercoiling. Our results reveal that each topoisomer, negative or positive, adopts a unique and surprisingly wide distribution of three-dimensional conformations. Moreover, we uncover striking differences in how the topoisomers handle torsional stress. As negative supercoiling increases, bases are increasingly exposed. Beyond a sharp supercoiling threshold, we also detect exposed bases in positively supercoiled DNA. Molecular dynamics simulations independently confirm the conformational heterogeneity and provide atomistic insight into the flexibility of supercoiled DNA. Our integrated approach reveals the three-dimensional structures of DNA that are essential for its function.

  7. Cancer-linked satellite 2 DNA hypomethylation does not regulate Sat2 non-coding RNA expression and is initiated by heat shock pathway activation.

    PubMed

    Tilman, Gaëlle; Arnoult, Nausica; Lenglez, Sandrine; Van Beneden, Amandine; Loriot, Axelle; De Smet, Charles; Decottignies, Anabelle

    2012-08-01

    Epigenetic dysfunctions, including DNA methylation alterations, play major roles in cancer initiation and progression. Although it is well established that gene promoter demethylation activates transcription, it remains unclear whether hypomethylation of repetitive heterochromatin similarly affects expression of non-coding RNA from these loci. Understanding how repetitive non-coding RNAs are transcriptionally regulated is important given that their established upregulation by the heat shock (HS) pathway suggests important functions in cellular response to stress, possibly by promoting heterochromatin reconstruction. We found that, although pericentromeric satellite 2 (Sat2) DNA hypomethylation is detected in a majority of cancer cell lines of various origins, DNA methylation loss does not constitutively hyperactivate Sat2 expression, and also does not facilitate Sat2 transcriptional induction upon heat shock. In melanoma tumor samples, our analysis revealed that the HS response, frequently upregulated in tumors, is probably the main determinant of Sat2 RNA expression in vivo. Next, we tested whether HS pathway hyperactivation may drive Sat2 demethylation. Strikingly, we found that both hyperthermia and hyperactivated RasV12 oncogene, another potent inducer of the HS pathway, reduced Sat2 methylation levels by up to 27% in human fibroblasts recovering from stress. Demethylation occurred locally on Sat2 repeats, resulting in a demethylation signature that was also detected in cancer cell lines with moderate genome-wide hypomethylation. We therefore propose that upregulation of Sat2 transcription in response to HS pathway hyperactivation during tumorigenesis may promote localized demethylation of the locus. This, in turn, may contribute to tumorigenesis, as demethylation of Sat2 was previously reported to favor chromosomal rearrangements.

  8. Surface shapes and surrounding environment analysis of single- and double-stranded DNA-binding proteins in protein-DNA interface.

    PubMed

    Wang, Wei; Liu, Juan; Sun, Lin

    2016-07-01

    Protein-DNA bindings are critical to many biological processes. However, the structural mechanisms underlying these interactions are not fully understood. Here, we analyzed the residues shape (peak, flat, or valley) and the surrounding environment of double-stranded DNA-binding proteins (DSBs) and single-stranded DNA-binding proteins (SSBs) in protein-DNA interfaces. In the results, we found that the interface shapes, hydrogen bonds, and the surrounding environment present significant differences between the two kinds of proteins. Built on the investigation results, we constructed a random forest (RF) classifier to distinguish DSBs and SSBs with satisfying performance. In conclusion, we present a novel methodology to characterize protein interfaces, which will deepen our understanding of the specificity of proteins binding to ssDNA (single-stranded DNA) or dsDNA (double-stranded DNA). Proteins 2016; 84:979-989. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  9. Just-in-time adaptive classifiers-part II: designing the classifier.

    PubMed

    Alippi, Cesare; Roveri, Manuel

    2008-12-01

    Aging effects, environmental changes, thermal drifts, and soft and hard faults affect physical systems by changing their nature and behavior over time. To cope with a process evolution adaptive solutions must be envisaged to track its dynamics; in this direction, adaptive classifiers are generally designed by assuming the stationary hypothesis for the process generating the data with very few results addressing nonstationary environments. This paper proposes a methodology based on k-nearest neighbor (NN) classifiers for designing adaptive classification systems able to react to changing conditions just-in-time (JIT), i.e., exactly when it is needed. k-NN classifiers have been selected for their computational-free training phase, the possibility to easily estimate the model complexity k and keep under control the computational complexity of the classifier through suitable data reduction mechanisms. A JIT classifier requires a temporal detection of a (possible) process deviation (aspect tackled in a companion paper) followed by an adaptive management of the knowledge base (KB) of the classifier to cope with the process change. The novelty of the proposed approach resides in the general framework supporting the real-time update of the KB of the classification system in response to novel information coming from the process both in stationary conditions (accuracy improvement) and in nonstationary ones (process tracking) and in providing a suitable estimate of k. It is shown that the classification system grants consistency once the change targets the process generating the data in a new stationary state, as it is the case in many real applications.

  10. Classifying short genomic fragments from novel lineages using composition and homology

    PubMed Central

    2011-01-01

    Background The assignment of taxonomic attributions to DNA fragments recovered directly from the environment is a vital step in metagenomic data analysis. Assignments can be made using rank-specific classifiers, which assign reads to taxonomic labels from a predetermined level such as named species or strain, or rank-flexible classifiers, which choose an appropriate taxonomic rank for each sequence in a data set. The choice of rank typically depends on the optimal model for a given sequence and on the breadth of taxonomic groups seen in a set of close-to-optimal models. Homology-based (e.g., LCA) and composition-based (e.g., PhyloPythia, TACOA) rank-flexible classifiers have been proposed, but there is at present no hybrid approach that utilizes both homology and composition. Results We first develop a hybrid, rank-specific classifier based on BLAST and Naïve Bayes (NB) that has comparable accuracy and a faster running time than the current best approach, PhymmBL. By substituting LCA for BLAST or allowing the inclusion of suboptimal NB models, we obtain a rank-flexible classifier. This hybrid classifier outperforms established rank-flexible approaches on simulated metagenomic fragments of length 200 bp to 1000 bp and is able to assign taxonomic attributions to a subset of sequences with few misclassifications. We then demonstrate the performance of different classifiers on an enhanced biological phosphorous removal metagenome, illustrating the advantages of rank-flexible classifiers when representative genomes are absent from the set of reference genomes. Application to a glacier ice metagenome demonstrates that similar taxonomic profiles are obtained across a set of classifiers which are increasingly conservative in their classification. Conclusions Our NB-based classification scheme is faster than the current best composition-based algorithm, Phymm, while providing equally accurate predictions. The rank-flexible variant of NB, which we term ε-NB, is

  11. The Purine Bias of Coding Sequences is Determined by Physicochemical Constraints on Proteins.

    PubMed

    Ponce de Leon, Miguel; de Miranda, Antonio Basilio; Alvarez-Valin, Fernando; Carels, Nicolas

    2014-01-01

    For this report, we analyzed protein secondary structures in relation to the statistics of three nucleotide codon positions. The purpose of this investigation was to find which properties of the ribosome, tRNA or protein level, could explain the purine bias (Rrr) as it is observed in coding DNA. We found that the Rrr pattern is the consequence of a regularity (the codon structure) resulting from physicochemical constraints on proteins and thermodynamic constraints on ribosomal machinery. The physicochemical constraints on proteins mainly come from the hydropathy and molecular weight (MW) of secondary structures as well as the energy cost of amino acid synthesis. These constraints appear through a network of statistical correlations, such as (i) the cost of amino acid synthesis, which is in favor of a higher level of guanine in the first codon position, (ii) the constructive contribution of hydropathy alternation in proteins, (iii) the spatial organization of secondary structure in proteins according to solvent accessibility, (iv) the spatial organization of secondary structure according to amino acid hydropathy, (v) the statistical correlation of MW with protein secondary structures and their overall hydropathy, (vi) the statistical correlation of thymine in the second codon position with hydropathy and the energy cost of amino acid synthesis, and (vii) the statistical correlation of adenine in the second codon position with amino acid complexity and the MW of secondary protein structures. Amino acid physicochemical properties and functional constraints on proteins constitute a code that is translated into a purine bias within the coding DNA via tRNAs. In that sense, the Rrr pattern within coding DNA is the effect of information transfer on nucleotide composition from protein to DNA by selection according to the codon positions. Thus, coding DNA structure and ribosomal machinery co-evolved to minimize the energy cost of protein coding given the functional

  12. Recognizing short coding sequences of prokaryotic genome using a novel iteratively adaptive sparse partial least squares algorithm

    PubMed Central

    2013-01-01

    Background Significant efforts have been made to address the problem of identifying short genes in prokaryotic genomes. However, most known methods are not effective in detecting short genes. Because of the limited information contained in short DNA sequences, it is very difficult to accurately distinguish between protein coding and non-coding sequences in prokaryotic genomes. We have developed a new Iteratively Adaptive Sparse Partial Least Squares (IASPLS) algorithm as the classifier to improve the accuracy of the identification process. Results For testing, we chose the short coding and non-coding sequences from seven prokaryotic organisms. We used seven feature sets (including GC content, Z-curve, etc.) of short genes. In comparison with GeneMarkS, Metagene, Orphelia, and Heuristic Approachs methods, our model achieved the best prediction performance in identification of short prokaryotic genes. Even when we focused on the very short length group ([60–100 nt)), our model provided sensitivity as high as 83.44% and specificity as high as 92.8%. These values are two or three times higher than three of the other methods while Metagene fails to recognize genes in this length range. The experiments also proved that the IASPLS can improve the identification accuracy in comparison with other widely used classifiers, i.e. Logistic, Random Forest (RF) and K nearest neighbors (KNN). The accuracy in using IASPLS was improved 5.90% or more in comparison with the other methods. In addition to the improvements in accuracy, IASPLS required ten times less computer time than using KNN or RF. Conclusions It is conclusive that our method is preferable for application as an automated method of short gene classification. Its linearity and easily optimized parameters make it practicable for predicting short genes of newly-sequenced or under-studied species. Reviewers This article was reviewed by Alexey Kondrashov, Rajeev Azad (nominated by Dr J.Peter Gogarten) and Yuriy Fofanov

  13. Avatar DNA Nanohybrid System in Chip-on-a-Phone

    NASA Astrophysics Data System (ADS)

    Park, Dae-Hwan; Han, Chang Jo; Shul, Yong-Gun; Choy, Jin-Ho

    2014-05-01

    Long admired for informational role and recognition function in multidisciplinary science, DNA nanohybrids have been emerging as ideal materials for molecular nanotechnology and genetic information code. Here, we designed an optical machine-readable DNA icon on microarray, Avatar DNA, for automatic identification and data capture such as Quick Response and ColorZip codes. Avatar icon is made of telepathic DNA-DNA hybrids inscribed on chips, which can be identified by camera of smartphone with application software. Information encoded in base-sequences can be accessed by connecting an off-line icon to an on-line web-server network to provide message, index, or URL from database library. Avatar DNA is then converged with nano-bio-info-cogno science: each building block stands for inorganic nanosheets, nucleotides, digits, and pixels. This convergence could address item-level identification that strengthens supply-chain security for drug counterfeits. It can, therefore, provide molecular-level vision through mobile network to coordinate and integrate data management channels for visual detection and recording.

  14. Avatar DNA Nanohybrid System in Chip-on-a-Phone

    PubMed Central

    Park, Dae-Hwan; Han, Chang Jo; Shul, Yong-Gun; Choy, Jin-Ho

    2014-01-01

    Long admired for informational role and recognition function in multidisciplinary science, DNA nanohybrids have been emerging as ideal materials for molecular nanotechnology and genetic information code. Here, we designed an optical machine-readable DNA icon on microarray, Avatar DNA, for automatic identification and data capture such as Quick Response and ColorZip codes. Avatar icon is made of telepathic DNA-DNA hybrids inscribed on chips, which can be identified by camera of smartphone with application software. Information encoded in base-sequences can be accessed by connecting an off-line icon to an on-line web-server network to provide message, index, or URL from database library. Avatar DNA is then converged with nano-bio-info-cogno science: each building block stands for inorganic nanosheets, nucleotides, digits, and pixels. This convergence could address item-level identification that strengthens supply-chain security for drug counterfeits. It can, therefore, provide molecular-level vision through mobile network to coordinate and integrate data management channels for visual detection and recording. PMID:24824876

  15. Supervised DNA Barcodes species classification: analysis, comparisons and results

    PubMed Central

    2014-01-01

    Background Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcode and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcode: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem assigns an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. Methods In this work the efficacy of supervised machine learning methods to classify species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with respect to ad-hoc and well-established DNA Barcode classification methods. Results A software that converts the DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow the execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes outperform on average the other considered classifiers, although they do not provide a human interpretable classification model. Rule-based methods have slightly inferior classification performances, but deliver the species specific positions and nucleotide assignments. On synthetic data the supervised machine learning methods obtain superior classification performances with respect to the traditional DNA Barcode

  16. Lectin cDNA and transgenic plants derived therefrom

    DOEpatents

    Raikhel, Natasha V.

    2000-10-03

    Transgenic plants containing cDNA encoding Gramineae lectin are described. The plants preferably contain cDNA coding for barley lectin and store the lectin in the leaves. The transgenic plants, particularly the leaves exhibit insecticidal and fungicidal properties.

  17. matK-QR classifier: a patterns based approach for plant species identification.

    PubMed

    More, Ravi Prabhakar; Mane, Rupali Chandrashekhar; Purohit, Hemant J

    2016-01-01

    DNA barcoding is widely used and most efficient approach that facilitates rapid and accurate identification of plant species based on the short standardized segment of the genome. The nucleotide sequences of maturaseK ( matK ) and ribulose-1, 5-bisphosphate carboxylase ( rbcL ) marker loci are commonly used in plant species identification. Here, we present a new and highly efficient approach for identifying a unique set of discriminating nucleotide patterns to generate a signature (i.e. regular expression) for plant species identification. In order to generate molecular signatures, we used matK and rbcL loci datasets, which encompass 125 plant species in 52 genera reported by the CBOL plant working group. Initially, we performed Multiple Sequence Alignment (MSA) of all species followed by Position Specific Scoring Matrix (PSSM) for both loci to achieve a percentage of discrimination among species. Further, we detected Discriminating Patterns (DP) at genus and species level using PSSM for the matK dataset. Combining DP and consecutive pattern distances, we generated molecular signatures for each species. Finally, we performed a comparative assessment of these signatures with the existing methods including BLASTn, Support Vector Machines (SVM), Jrip-RIPPER, J48 (C4.5 algorithm), and the Naïve Bayes (NB) methods against NCBI-GenBank matK dataset. Due to the higher discrimination success obtained with the matK as compared to the rbcL , we selected matK gene for signature generation. We generated signatures for 60 species based on identified discriminating patterns at genus and species level. Our comparative assessment results suggest that a total of 46 out of 60 species could be correctly identified using generated signatures, followed by BLASTn (34 species), SVM (18 species), C4.5 (7 species), NB (4 species) and RIPPER (3 species) methods As a final outcome of this study, we converted signatures into QR codes and developed a software matK -QR Classifier (http://www.neeri.res.in/matk_classifier

  18. Exploring the read-write genome: mobile DNA and mammalian adaptation.

    PubMed

    Shapiro, James A

    2017-02-01

    The read-write genome idea predicts that mobile DNA elements will act in evolution to generate adaptive changes in organismal DNA. This prediction was examined in the context of mammalian adaptations involving regulatory non-coding RNAs, viviparous reproduction, early embryonic and stem cell development, the nervous system, and innate immunity. The evidence shows that mobile elements have played specific and sometimes major roles in mammalian adaptive evolution by generating regulatory sites in the DNA and providing interaction motifs in non-coding RNA. Endogenous retroviruses and retrotransposons have been the predominant mobile elements in mammalian adaptive evolution, with the notable exception of bats, where DNA transposons are the major agents of RW genome inscriptions. A few examples of independent but convergent exaptation of mobile DNA elements for similar regulatory rewiring functions are noted.

  19. Physics behind the mechanical nucleosome positioning code

    NASA Astrophysics Data System (ADS)

    Zuiddam, Martijn; Everaers, Ralf; Schiessel, Helmut

    2017-11-01

    The positions along DNA molecules of nucleosomes, the most abundant DNA-protein complexes in cells, are influenced by the sequence-dependent DNA mechanics and geometry. This leads to the "nucleosome positioning code", a preference of nucleosomes for certain sequence motives. Here we introduce a simplified model of the nucleosome where a coarse-grained DNA molecule is frozen into an idealized superhelical shape. We calculate the exact sequence preferences of our nucleosome model and find it to reproduce qualitatively all the main features known to influence nucleosome positions. Moreover, using well-controlled approximations to this model allows us to come to a detailed understanding of the physics behind the sequence preferences of nucleosomes.

  20. Toward a More Responsive Consumable Materiel Supply Chain: Leveraging New Metrics to Identify and Classify Items of Concern

    DTIC Science & Technology

    2016-06-01

    managed by teams organized by the four- digit Federal Supply Classification (FSC) code, which classifies a part by type of materiel. When the consumable...Command [NAVSUP], 2015a). The first four digits of the NSN comprise the FSC code, which categorizes the item being ordered; in the present example it...Table 3, requisitions are divided into three priority bins—high (TP 1), medium (TP 2), 15 and low (TP 3). A mission-critical requirement almost

  1. Characterization and machine learning prediction of allele-specific DNA methylation.

    PubMed

    He, Jianlin; Sun, Ming-an; Wang, Zhong; Wang, Qianfei; Li, Qing; Xie, Hehuang

    2015-12-01

    A large collection of Single Nucleotide Polymorphisms (SNPs) has been identified in the human genome. Currently, the epigenetic influences of SNPs on their neighboring CpG sites remain elusive. A growing body of evidence suggests that locus-specific information, including genomic features and local epigenetic state, may play important roles in the epigenetic readout of SNPs. In this study, we made use of mouse methylomes with known SNPs to develop statistical models for the prediction of SNP associated allele-specific DNA methylation (ASM). ASM has been classified into parent-of-origin dependent ASM (P-ASM) and sequence-dependent ASM (S-ASM), which comprises scattered-S-ASM (sS-ASM) and clustered-S-ASM (cS-ASM). We found that P-ASM and cS-ASM CpG sites are both enriched in CpG rich regions, promoters and exons, while sS-ASM CpG sites are enriched in simple repeat and regions with high frequent SNP occurrence. Using Lasso-grouped Logistic Regression (LGLR), we selected 21 out of 282 genomic and methylation related features that are powerful in distinguishing cS-ASM CpG sites and trained the classifiers with machine learning techniques. Based on 5-fold cross-validation, the logistic regression classifier was found to be the best for cS-ASM prediction with an ACC of 0.77, an AUC of 0.84 and an MCC of 0.54. Lastly, we applied the logistic regression classifier on human brain methylome and predicted 608 genes associated with cS-ASM. Gene ontology term enrichment analysis indicated that these cS-ASM associated genes are significantly enriched in the category coding for transcripts with alternative splicing forms. In summary, this study provided an analytical procedure for cS-ASM prediction and shed new light on the understanding of different types of ASM events. Published by Elsevier Inc.

  2. Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes.

    PubMed

    Lin, Chin; Hsu, Chia-Jung; Lou, Yu-Sheng; Yeh, Shih-Jen; Lee, Chia-Cheng; Su, Sui-Lung; Chen, Hsiang-Cheng

    2017-11-06

    Automated disease code classification using free-text medical information is important for public health surveillance. However, traditional natural language processing (NLP) pipelines are limited, so we propose a method combining word embedding with a convolutional neural network (CNN). Our objective was to compare the performance of traditional pipelines (NLP plus supervised machine learning models) with that of word embedding combined with a CNN in conducting a classification task identifying International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis codes in discharge notes. We used 2 classification methods: (1) extracting from discharge notes some features (terms, n-gram phrases, and SNOMED CT categories) that we used to train a set of supervised machine learning models (support vector machine, random forests, and gradient boosting machine), and (2) building a feature matrix, by a pretrained word embedding model, that we used to train a CNN. We used these methods to identify the chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. We conducted the evaluation using 103,390 discharge notes covering patients hospitalized from June 1, 2015 to January 31, 2017 in the Tri-Service General Hospital in Taipei, Taiwan. We used the receiver operating characteristic curve as an evaluation measure, and calculated the area under the curve (AUC) and F-measure as the global measure of effectiveness. In 5-fold cross-validation tests, our method had a higher testing accuracy (mean AUC 0.9696; mean F-measure 0.9086) than traditional NLP-based approaches (mean AUC range 0.8183-0.9571; mean F-measure range 0.5050-0.8739). A real-world simulation that split the training sample and the testing sample by date verified this result (mean AUC 0.9645; mean F-measure 0.9003 using the proposed method). Further analysis showed that the convolutional layers of the CNN effectively identified a large number of keywords and automatically

  3. Categories of Code-Switching in Hispanic Communities: Untangling the Terminology. Sociolinguistic Working Paper Number 76.

    ERIC Educational Resources Information Center

    Baker, Opal Ruth

    Research on Spanish/English code switching is reviewed and the definitions and categories set up by the investigators are examined. Their methods of locating, limiting, and classifying true code switches, and the terms used and results obtained, are compared. It is found that in these studies, conversational (intra-discourse) code switching is…

  4. Mitochondrial DNA triplication and punctual mutations in patients with mitochondrial neuromuscular disorders

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mkaouar-Rebai, Emna, E-mail: emna.mkaouar@gmail.com; Felhi, Rahma; Tabebi, Mouna

    Mitochondrial diseases are a heterogeneous group of disorders caused by the impairment of the mitochondrial oxidative phosphorylation system which have been associated with various mutations of the mitochondrial DNA (mtDNA) and nuclear gene mutations. The clinical phenotypes are very diverse and the spectrum is still expanding. As brain and muscle are highly dependent on OXPHOS, consequently, neurological disorders and myopathy are common features of mtDNA mutations. Mutations in mtDNA can be classified into three categories: large-scale rearrangements, point mutations in tRNA or rRNA genes and point mutations in protein coding genes. In the present report, we screened mitochondrial genes ofmore » complex I, III, IV and V in 2 patients with mitochondrial neuromuscular disorders. The results showed the presence the pathogenic heteroplasmic m.9157G>A variation (A211T) in the MT-ATP6 gene in the first patient. We also reported the first case of triplication of 9 bp in the mitochondrial NC7 region in Africa and Tunisia, in association with the novel m.14924T>C in the MT-CYB gene in the second patient with mitochondrial neuromuscular disorder. - Highlights: • We reported 2 patients with mitochondrial neuromuscular disorders. • The heteroplasmic MT-ATP6 9157G>A variation was reported. • A triplication of 9 bp in the mitochondrial NC7 region was detected. • The m.14924T>C transition (S60P) in the MT-CYB gene was found.« less

  5. Applications of statistical physics and information theory to the analysis of DNA sequences

    NASA Astrophysics Data System (ADS)

    Grosse, Ivo

    2000-10-01

    DNA carries the genetic information of most living organisms, and the of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question if there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search for such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.

  6. Novel numerical and graphical representation of DNA sequences and proteins.

    PubMed

    Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D

    2006-12-01

    We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.

  7. Late night activity regarding stroke codes: LuNAR strokes.

    PubMed

    Tafreshi, Gilda; Raman, Rema; Ernstrom, Karin; Rapp, Karen; Meyer, Brett C

    2012-08-01

    There is diurnal variation for cardiac arrest and sudden cardiac death. Stroke may show a similar pattern. We assessed whether strokes presenting during a particular time of day or night are more likely of vascular etiology. To compare emergency department stroke codes arriving between 22:00 and 8:00 hours (LuNAR strokes) vs. others (n-LuNAR strokes). The purpose was to determine if late night strokes are more likely to be true strokes or warrant acute tissue plasminogen activator evaluations. We reviewed prospectively collected cases in the University of California, San Diego Stroke Team database gathered over a four-year period. Stroke codes at six emergency departments were classified based on arrival time. Those arriving between 22:00 and 8:00 hours were classified as LuNAR stroke codes, the remainder were classified as 'n-LuNAR'. Patients were further classified as intracerebral hemorrhage, acute ischemic stroke not receiving tissue plasminogen activator, acute ischemic stroke receiving tissue plasminogen activator, transient ischemic attack, and nonstroke. Categorical outcomes were compared using Fisher's Exact test. Continuous outcomes were compared using Wilcoxon's Rank-sum test. A total of 1607 patients were included in our study, of which, 299 (19%) were LuNAR code strokes. The overall median NIHSS was five, higher in the LuNAR group (n-LuNAR 5, LuNAR 7; P=0·022). There was no overall differences in patient diagnoses between LuNAR and n-LuNAR strokes (P=0·169) or diagnosis of acute ischemic stroke receiving tissue plasminogen activator (n-LuNAR 191 (14·6%), LuNAR 42 (14·0%); P=0·86). Mean arrival to computed tomography scan time was longer during LuNAR hours (n-LuNAR 54·9±76·3 min, LuNAR 62·5±87·7 min; P=0·027). There was no significant difference in 90-day mortality (n-LuNAR 15·0%, LuNAR 13·2%; P=0·45). Our stroke center experience showed no difference in diagnosis of acute ischemic stroke between day and night stroke codes. This

  8. Dynamic system classifier.

    PubMed

    Pumpe, Daniel; Greiner, Maksim; Müller, Ewald; Enßlin, Torsten A

    2016-07-01

    Stochastic differential equations describe well many physical, biological, and sociological systems, despite the simplification often made in their derivation. Here the usage of simple stochastic differential equations to characterize and classify complex dynamical systems is proposed within a Bayesian framework. To this end, we develop a dynamic system classifier (DSC). The DSC first abstracts training data of a system in terms of time-dependent coefficients of the descriptive stochastic differential equation. Thereby the DSC identifies unique correlation structures within the training data. For definiteness we restrict the presentation of the DSC to oscillation processes with a time-dependent frequency ω(t) and damping factor γ(t). Although real systems might be more complex, this simple oscillator captures many characteristic features. The ω and γ time lines represent the abstract system characterization and permit the construction of efficient signal classifiers. Numerical experiments show that such classifiers perform well even in the low signal-to-noise regime.

  9. Phylogeographic Differentiation of Mitochondrial DNA in Han Chinese

    PubMed Central

    Yao, Yong-Gang; Kong, Qing-Peng; Bandelt, Hans-Jürgen; Kivisild, Toomas; Zhang, Ya-Ping

    2002-01-01

    To characterize the mitochondrial DNA (mtDNA) variation in Han Chinese from several provinces of China, we have sequenced the two hypervariable segments of the control region and the segment spanning nucleotide positions 10171–10659 of the coding region, and we have identified a number of specific coding-region mutations by direct sequencing or restriction-fragment–length–polymorphism tests. This allows us to define new haplogroups (clades of the mtDNA phylogeny) and to dissect the Han mtDNA pool on a phylogenetic basis, which is a prerequisite for any fine-grained phylogeographic analysis, the interpretation of ancient mtDNA, or future complete mtDNA sequencing efforts. Some of the haplogroups under study differ considerably in frequencies across different provinces. The southernmost provinces show more pronounced contrasts in their regional Han mtDNA pools than the central and northern provinces. These and other features of the geographical distribution of the mtDNA haplogroups observed in the Han Chinese make an initial Paleolithic colonization from south to north plausible but would suggest subsequent migration events in China that mainly proceeded from north to south and east to west. Lumping together all regional Han mtDNA pools into one fictive general mtDNA pool or choosing one or two regional Han populations to represent all Han Chinese is inappropriate for prehistoric considerations as well as for forensic purposes or medical disease studies. PMID:11836649

  10. Linear and Nonlinear Statistical Characterization of DNA

    NASA Astrophysics Data System (ADS)

    Norio Oiwa, Nestor; Goldman, Carla; Glazier, James

    2002-03-01

    We find spatial order in the distribution of protein-coding (including RNAs) and control segments of GenBank genomic sequences, irrespective of ATCG content. This is achieved by correlations, histograms, fractal dimensions and singularity spectra. Estimates of these quantities in complete nuclear genome indicate that coding sequences are long-range correlated and their disposition are self-similar (multifractal) for eukaryotes. These characteristics are absent in prokaryotes, where there are few noncoding sequences, suggesting the `junk' DNA play a relevant role to the genome structure and function. Concerning the genetic message of ATCG sequences, we build a random walk (Levy flight), using DNA symmetry arguments, where we associate A, T, C and G as left, right, down and up steps, respectively. Nonlinear analysis of mitochondrial DNA walks reveal multifractal pattern based on palindromic sequences, which fold in hairpins and loops.

  11. Peculiarities of use of ECOC and AdaBoost based classifiers for thematic processing of hyperspectral data

    NASA Astrophysics Data System (ADS)

    Dementev, A. O.; Dmitriev, E. V.; Kozoderov, V. V.; Egorov, V. D.

    2017-10-01

    Hyperspectral imaging is up-to-date promising technology widely applied for the accurate thematic mapping. The presence of a large number of narrow survey channels allows us to use subtle differences in spectral characteristics of objects and to make a more detailed classification than in the case of using standard multispectral data. The difficulties encountered in the processing of hyperspectral images are usually associated with the redundancy of spectral information which leads to the problem of the curse of dimensionality. Methods currently used for recognizing objects on multispectral and hyperspectral images are usually based on standard base supervised classification algorithms of various complexity. Accuracy of these algorithms can be significantly different depending on considered classification tasks. In this paper we study the performance of ensemble classification methods for the problem of classification of the forest vegetation. Error correcting output codes and boosting are tested on artificial data and real hyperspectral images. It is demonstrates, that boosting gives more significant improvement when used with simple base classifiers. The accuracy in this case in comparable the error correcting output code (ECOC) classifier with Gaussian kernel SVM base algorithm. However the necessity of boosting ECOC with Gaussian kernel SVM is questionable. It is demonstrated, that selected ensemble classifiers allow us to recognize forest species with high enough accuracy which can be compared with ground-based forest inventory data.

  12. The logic of DNA replication in double-stranded DNA viruses: insights from global analysis of viral genomes

    PubMed Central

    Kazlauskas, Darius; Krupovic, Mart; Venclovas, Česlovas

    2016-01-01

    Abstract Genomic DNA replication is a complex process that involves multiple proteins. Cellular DNA replication systems are broadly classified into only two types, bacterial and archaeo-eukaryotic. In contrast, double-stranded (ds) DNA viruses feature a much broader diversity of DNA replication machineries. Viruses differ greatly in both completeness and composition of their sets of DNA replication proteins. In this study, we explored whether there are common patterns underlying this extreme diversity. We identified and analyzed all major functional groups of DNA replication proteins in all available proteomes of dsDNA viruses. Our results show that some proteins are common to viruses infecting all domains of life and likely represent components of the ancestral core set. These include B-family polymerases, SF3 helicases, archaeo-eukaryotic primases, clamps and clamp loaders of the archaeo-eukaryotic type, RNase H and ATP-dependent DNA ligases. We also discovered a clear correlation between genome size and self-sufficiency of viral DNA replication, the unanticipated dominance of replicative helicases and pervasive functional associations among certain groups of DNA replication proteins. Altogether, our results provide a comprehensive view on the diversity and evolution of replication systems in the DNA virome and uncover fundamental principles underlying the orchestration of viral DNA replication. PMID:27112572

  13. Dynamics of the DNA repair proteins WRN and BLM in the nucleoplasm and nucleoli.

    PubMed

    Bendtsen, Kristian Moss; Jensen, Martin Borch; May, Alfred; Rasmussen, Lene Juel; Trusina, Ala; Bohr, Vilhelm A; Jensen, Mogens H

    2014-11-01

    We have investigated the mobility of two EGFP-tagged DNA repair proteins, WRN and BLM. In particular, we focused on the dynamics in two locations, the nucleoli and the nucleoplasm. We found that both WRN and BLM use a "DNA-scanning" mechanism, with rapid binding-unbinding to DNA resulting in effective diffusion. In the nucleoplasm WRN and BLM have effective diffusion coefficients of 1.62 and 1.34 μm(2)/s, respectively. Likewise, the dynamics in the nucleoli are also best described by effective diffusion, but with diffusion coefficients a factor of ten lower than in the nucleoplasm. From this large reduction in diffusion coefficient we were able to classify WRN and BLM as DNA damage scanners. In addition to WRN and BLM we also classified other DNA damage proteins and found they all fall into one of two categories. Either they are scanners, similar to WRN and BLM, with very low diffusion coefficients, suggesting a scanning mechanism, or they are almost freely diffusing, suggesting that they interact with DNA only after initiation of a DNA damage response.

  14. Privacy-Preserving Classifier Learning

    NASA Astrophysics Data System (ADS)

    Brickell, Justin; Shmatikov, Vitaly

    We present an efficient protocol for the privacy-preserving, distributed learning of decision-tree classifiers. Our protocol allows a user to construct a classifier on a database held by a remote server without learning any additional information about the records held in the database. The server does not learn anything about the constructed classifier, not even the user’s choice of feature and class attributes.

  15. [DNA prints instead of plantar prints in neonatal identification].

    PubMed

    Rodríguez-Alarcón Gómez, J; Martińez de Pancorbo Gómez, M; Santillana Ferrer, L; Castro Espido, A; Melchor Maros, J C; Linares Uribe, M A; Fernández-Llebrez del Rey, L; Aranguren Dúo, G

    1996-06-22

    To check the possible usefulness in studying DNA in dried blood spots taken on filter paper blotters for newborn identification. It set out to establish: 1. The validity of the method for analysis; 2. The validity of all stored samples (such as those kept in clinical records); 3. Guarantee of non-intrusion in the genetic code; 4. Acceptable price and execution time. Forty (40) anonymous 13-year-old samples of 20 subjects (2 per subject) were studied. DNA was extracted using Chelex resin and the STR ("small tandem repeat") of microsatellite DNA was studies using the "polimerase chain reaction method" (PCR). Three non coding DNA loci (CSF1PO, TPOX and THO1) were analyzed by Multiplex amplification. It was possible to type 39 samples, making it possible to match the 20 cases (one by exclusion). The complete procedure yielded the results within 24 hours in all cases. The estimated final cost was found to be a fifth of that conventional maternity/paternity tests. The study carried out made matching possible in all 20 cases (directly in 19 cases). It was not necessary to study DNA coding areas. The validity of the method for analyzing samples stored for 13 years without any special care was also demonstrated. The technic was fast, producing the results within 24 hours, and at reasonable cost.

  16. Methodology for fast detection of false sharing in threaded scientific codes

    DOEpatents

    Chung, I-Hsin; Cong, Guojing; Murata, Hiroki; Negishi, Yasushi; Wen, Hui-Fang

    2014-11-25

    A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.

  17. Mycofier: a new machine learning-based classifier for fungal ITS sequences.

    PubMed

    Delgado-Serrano, Luisa; Restrepo, Silvia; Bustos, Jose Ricardo; Zambrano, Maria Mercedes; Anzola, Juan Manuel

    2016-08-11

    The taxonomic and phylogenetic classification based on sequence analysis of the ITS1 genomic region has become a crucial component of fungal ecology and diversity studies. Nowadays, there is no accurate alignment-free classification tool for fungal ITS1 sequences for large environmental surveys. This study describes the development of a machine learning-based classifier for the taxonomical assignment of fungal ITS1 sequences at the genus level. A fungal ITS1 sequence database was built using curated data. Training and test sets were generated from it. A Naïve Bayesian classifier was built using features from the primary sequence with an accuracy of 87 % in the classification at the genus level. The final model was based on a Naïve Bayes algorithm using ITS1 sequences from 510 fungal genera. This classifier, denoted as Mycofier, provides similar classification accuracy compared to BLASTN, but the database used for the classification contains curated data and the tool, independent of alignment, is more efficient and contributes to the field, given the lack of an accurate classification tool for large data from fungal ITS1 sequences. The software and source code for Mycofier are freely available at https://github.com/ldelgado-serrano/mycofier.git .

  18. Resurrection of DNA Function In Vivo from an Extinct Genome

    PubMed Central

    Pask, Andrew J.; Behringer, Richard R.; Renfree, Marilyn B.

    2008-01-01

    There is a burgeoning repository of information available from ancient DNA that can be used to understand how genomes have evolved and to determine the genetic features that defined a particular species. To assess the functional consequences of changes to a genome, a variety of methods are needed to examine extinct DNA function. We isolated a transcriptional enhancer element from the genome of an extinct marsupial, the Tasmanian tiger (Thylacinus cynocephalus or thylacine), obtained from 100 year-old ethanol-fixed tissues from museum collections. We then examined the function of the enhancer in vivo. Using a transgenic approach, it was possible to resurrect DNA function in transgenic mice. The results demonstrate that the thylacine Col2A1 enhancer directed chondrocyte-specific expression in this extinct mammalian species in the same way as its orthologue does in mice. While other studies have examined extinct coding DNA function in vitro, this is the first example of the restoration of extinct non-coding DNA and examination of its function in vivo. Our method using transgenesis can be used to explore the function of regulatory and protein-coding sequences obtained from any extinct species in an in vivo model system, providing important insights into gene evolution and diversity. PMID:18493600

  19. The mitochondrial genome of the pathogenic yeast Candida subhashii: GC-rich linear DNA with a protein covalently attached to the 5′ termini

    PubMed Central

    Fricova, Dominika; Valach, Matus; Farkas, Zoltan; Pfeiffer, Ilona; Kucsera, Judit; Tomaska, Lubomir; Nosek, Jozef

    2010-01-01

    As a part of our initiative aimed at a large-scale comparative analysis of fungal mitochondrial genomes, we determined the complete DNA sequence of the mitochondrial genome of the yeast Candida subhashii and found that it exhibits a number of peculiar features. First, the mitochondrial genome is represented by linear dsDNA molecules of uniform length (29 795 bp), with an unusually high content of guanine and cytosine residues (52.7 %). Second, the coding sequences lack introns; thus, the genome has a relatively compact organization. Third, the termini of the linear molecules consist of long inverted repeats and seem to contain a protein covalently bound to terminal nucleotides at the 5′ ends. This architecture resembles the telomeres in a number of linear viral and plasmid DNA genomes classified as invertrons, in which the terminal proteins serve as specific primers for the initiation of DNA synthesis. Finally, although the mitochondrial genome of C. subhashii contains essentially the same set of genes as other closely related pathogenic Candida species, we identified additional ORFs encoding two homologues of the family B protein-priming DNA polymerases and an unknown protein. The terminal structures and the genes for DNA polymerases are reminiscent of linear mitochondrial plasmids, indicating that this genome architecture might have emerged from fortuitous recombination between an ancestral, presumably circular, mitochondrial genome and an invertron-like element. PMID:20395267

  20. A Radiation Chemistry Code Based on the Green's Function of the Diffusion Equation

    NASA Technical Reports Server (NTRS)

    Plante, Ianik; Wu, Honglu

    2014-01-01

    Stochastic radiation track structure codes are of great interest for space radiation studies and hadron therapy in medicine. These codes are used for a many purposes, notably for microdosimetry and DNA damage studies. In the last two decades, they were also used with the Independent Reaction Times (IRT) method in the simulation of chemical reactions, to calculate the yield of various radiolytic species produced during the radiolysis of water and in chemical dosimeters. Recently, we have developed a Green's function based code to simulate reversible chemical reactions with an intermediate state, which yielded results in excellent agreement with those obtained by using the IRT method. This code was also used to simulate and the interaction of particles with membrane receptors. We are in the process of including this program for use with the Monte-Carlo track structure code Relativistic Ion Tracks (RITRACKS). This recent addition should greatly expand the capabilities of RITRACKS, notably to simulate DNA damage by both the direct and indirect effect.

  1. Signatures of DNA Methylation across Insects Suggest Reduced DNA Methylation Levels in Holometabola

    PubMed Central

    Provataris, Panagiotis; Meusemann, Karen; Niehuis, Oliver; Grath, Sonja; Misof, Bernhard

    2018-01-01

    Abstract It has been experimentally shown that DNA methylation is involved in the regulation of gene expression and the silencing of transposable element activity in eukaryotes. The variable levels of DNA methylation among different insect species indicate an evolutionarily flexible role of DNA methylation in insects, which due to a lack of comparative data is not yet well-substantiated. Here, we use computational methods to trace signatures of DNA methylation across insects by analyzing transcriptomic and genomic sequence data from all currently recognized insect orders. We conclude that: 1) a functional methylation system relying exclusively on DNA methyltransferase 1 is widespread across insects. 2) DNA methylation has potentially been lost or extremely reduced in species belonging to springtails (Collembola), flies and relatives (Diptera), and twisted-winged parasites (Strepsiptera). 3) Holometabolous insects display signs of reduced DNA methylation levels in protein-coding sequences compared with hemimetabolous insects. 4) Evolutionarily conserved insect genes associated with housekeeping functions tend to display signs of heavier DNA methylation in comparison to the genomic/transcriptomic background. With this comparative study, we provide the much needed basis for experimental and detailed comparative analyses required to gain a deeper understanding on the evolution and function of DNA methylation in insects. PMID:29697817

  2. Transcriptional mapping of the ribosomal RNA region of mouse L-cell mitochondrial DNA.

    PubMed Central

    Nagley, P; Clayton, D A

    1980-01-01

    The map positions in mouse mitochondrial DNA of the two ribosomal RNA genes and adjacent genes coding several small transcripts have been determined precisely by application of a procedure in which DNA-RNA hybrids have been subjected to digestion by S1 nuclease under conditions of varying severity. Digestion of the DNA-RNA hybrids with S1 nuclease yielded a series of species which were shown to contain ribosomal RNA molecules together with adjacent transcripts hybridized conjointly to a continuous segment of mitochondrial DNA. There is one small transcript about 60 bases long whose gene adjoins the sequences coding the 5'-end of the small ribosomal RNA (950 bases) and which lies approximately 200 nucleotides from the D-loop origin of heavy strand mitochondrial DNA synthesis. An 80-base transcript lies between the small and large ribosomal RNA genes, and genes for two further short transcript (each about 80 bases in length) abut the sequences coding the 3'-end of the large ribosomal RNA (approximately 1500 bases). The ability to isolate a discrete DNA-RNA hybrid species approximately 2700 base pairs in length containing all these transcripts suggests that there can be few nucleotides in this region of mouse mitochondrial DNA which are not represented as stable RNA species. Images PMID:6253898

  3. Complete Mitochondrial DNA Analysis of Eastern Eurasian Haplogroups Rarely Found in Populations of Northern Asia and Eastern Europe

    PubMed Central

    Derenko, Miroslava; Malyarchuk, Boris; Denisova, Galina; Perkova, Maria; Rogalla, Urszula; Grzybowski, Tomasz; Khusnutdinova, Elza; Dambueva, Irina; Zakharov, Ilia

    2012-01-01

    With the aim of uncovering all of the most basal variation in the northern Asian mitochondrial DNA (mtDNA) haplogroups, we have analyzed mtDNA control region and coding region sequence variation in 98 Altaian Kazakhs from southern Siberia and 149 Barghuts from Inner Mongolia, China. Both populations exhibit the prevalence of eastern Eurasian lineages accounting for 91.9% in Barghuts and 60.2% in Altaian Kazakhs. The strong affinity of Altaian Kazakhs and populations of northern and central Asia has been revealed, reflecting both influences of central Asian inhabitants and essential genetic interaction with the Altai region indigenous populations. Statistical analyses data demonstrate a close positioning of all Mongolic-speaking populations (Mongolians, Buryats, Khamnigans, Kalmyks as well as Barghuts studied here) and Turkic-speaking Sojots, thus suggesting their origin from a common maternal ancestral gene pool. In order to achieve a thorough coverage of DNA lineages revealed in the northern Asian matrilineal gene pool, we have completely sequenced the mtDNA of 55 samples representing haplogroups R11b, B4, B5, F2, M9, M10, M11, M13, N9a and R9c1, which were pinpointed from a massive collection (over 5000 individuals) of northern and eastern Asian, as well as European control region mtDNA sequences. Applying the newly updated mtDNA tree to the previously reported northern Asian and eastern Asian mtDNA data sets has resolved the status of the poorly classified mtDNA types and allowed us to obtain the coalescence age estimates of the nodes of interest using different calibrated rates. Our findings confirm our previous conclusion that northern Asian maternal gene pool consists of predominantly post-LGM components of eastern Asian ancestry, though some genetic lineages may have a pre-LGM/LGM origin. PMID:22363811

  4. Interatomic Coulombic Decay Effects in Theoretical DNA Recombination Systems Involving Protein Interaction Sites

    NASA Astrophysics Data System (ADS)

    Vargas, E. L.; Rivas, D. A.; Duot, A. C.; Hovey, R. T.; Andrianarijaona, V. M.

    2015-03-01

    DNA replication is the basis for all biological reproduction. A strand of DNA will ``unzip'' and bind with a complimentary strand, creating two identical strands. In this study, we are considering how this process is affected by Interatomic Coulombic Decay (ICD), specifically how ICD affects the individual coding proteins' ability to hold together. ICD mainly deals with how the electron returns to its original state after excitation and how this affects its immediate atomic environment, sometimes affecting the connectivity between interaction sites on proteins involved in the DNA coding process. Biological heredity is fundamentally controlled by DNA and its replication therefore it affects every living thing. The small nature of the proteins (within the range of nanometers) makes it a good candidate for research of this scale. Understanding how ICD affects DNA molecules can give us invaluable insight into the human genetic code and the processes behind cell mutations that can lead to cancer. Authors wish to give special thanks to Pacific Union College Student Senate in Angwin, California, for their financial support.

  5. PCR-free quantitative detection of genetically modified organism from raw materials. An electrochemiluminescence-based bio bar code method.

    PubMed

    Zhu, Debin; Tang, Yabing; Xing, Da; Chen, Wei R

    2008-05-15

    A bio bar code assay based on oligonucleotide-modified gold nanoparticles (Au-NPs) provides a PCR-free method for quantitative detection of nucleic acid targets. However, the current bio bar code assay requires lengthy experimental procedures including the preparation and release of bar code DNA probes from the target-nanoparticle complex and immobilization and hybridization of the probes for quantification. Herein, we report a novel PCR-free electrochemiluminescence (ECL)-based bio bar code assay for the quantitative detection of genetically modified organism (GMO) from raw materials. It consists of tris-(2,2'-bipyridyl) ruthenium (TBR)-labeled bar code DNA, nucleic acid hybridization using Au-NPs and biotin-labeled probes, and selective capture of the hybridization complex by streptavidin-coated paramagnetic beads. The detection of target DNA is realized by direct measurement of ECL emission of TBR. It can quantitatively detect target nucleic acids with high speed and sensitivity. This method can be used to quantitatively detect GMO fragments from real GMO products.

  6. Delimitation of essential genes of cassava latent virus DNA 2.

    PubMed Central

    Etessami, P; Callis, R; Ellwood, S; Stanley, J

    1988-01-01

    Insertion and deletion mutagenesis of both extended open reading frames (ORFs) of cassava latent virus DNA 2 destroys infectivity. Infectivity is restored by coinoculating constructs that contain single mutations within different ORFs. Although frequent intermolecular recombination produces dominant parental-type virus, mutants can be retained within the virus population indicating that they are competent for replication and suggesting that rescue can occur by complementation of trans acting gene products. By cloning specific fragments into DNA 1 coat protein deletion vectors we have delimited the DNA 2 coding regions and provide substantive evidence that both are essential for virus infection. Although a DNA 2 component is unique to whitefly-transmitted geminiviruses, the results demonstrate that neither coding region is involved solely in insect transmission. The requirement for a bipartite genome for whitefly-transmitted geminiviruses is discussed. Images PMID:3387209

  7. Nanoarchitectonics with Porphyrin Functionalized DNA

    PubMed Central

    2017-01-01

    Conspectus DNA is well-known as bearer of the genetic code. Since its structure elucidation nearly seven decades ago by Watson, Crick, Wilkins, and Franklin, much has been learned about its detailed structure, function, and genetic coding. The development of automated solid-phase synthesis, and with it the availability of synthetic DNA with any desired sequence in lengths of up to hundreds of bases in the best case, has contributed much to the advancement of the field of DNA research. In addition, classic organic synthesis has allowed introduction of a very large number of modifications in the DNA in a sequence specific manner, which have initially been targeted at altering the biological function of DNA. However, in recent years DNA has become a very attractive scaffold in supramolecular chemistry, where DNA is taken out of its biological role and serves as both stick and glue molecule to assemble novel functional structures with nanometer precision. The attachment of functionalities to DNA has led to the creation of supramolecular systems with applications in light harvesting, energy and electron transfer, sensing, and catalysis. Functional DNA is clearly having a significant impact in the field of bioinspired nanosystems. Of particular interest is the use of porphyrins in supramolecular chemistry and bionanotechnology, because they are excellent functional groups due to their electronic properties that can be tailored through chemical modifications of the aromatic core or through insertion of almost any metal of the periodic table into the central cavity. The porphyrins can be attached either to the nucleobase, to the phosphate group, or to the ribose moiety. Additionally, noncovalent templating through Watson–Crick base pairing forms an alternative and attractive approach. With this, the combination of two seemingly simple molecules gives rise to a highly complex system with unprecedented possibilities for modulation of function, and with it applications

  8. The logic of DNA replication in double-stranded DNA viruses: insights from global analysis of viral genomes.

    PubMed

    Kazlauskas, Darius; Krupovic, Mart; Venclovas, Česlovas

    2016-06-02

    Genomic DNA replication is a complex process that involves multiple proteins. Cellular DNA replication systems are broadly classified into only two types, bacterial and archaeo-eukaryotic. In contrast, double-stranded (ds) DNA viruses feature a much broader diversity of DNA replication machineries. Viruses differ greatly in both completeness and composition of their sets of DNA replication proteins. In this study, we explored whether there are common patterns underlying this extreme diversity. We identified and analyzed all major functional groups of DNA replication proteins in all available proteomes of dsDNA viruses. Our results show that some proteins are common to viruses infecting all domains of life and likely represent components of the ancestral core set. These include B-family polymerases, SF3 helicases, archaeo-eukaryotic primases, clamps and clamp loaders of the archaeo-eukaryotic type, RNase H and ATP-dependent DNA ligases. We also discovered a clear correlation between genome size and self-sufficiency of viral DNA replication, the unanticipated dominance of replicative helicases and pervasive functional associations among certain groups of DNA replication proteins. Altogether, our results provide a comprehensive view on the diversity and evolution of replication systems in the DNA virome and uncover fundamental principles underlying the orchestration of viral DNA replication. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Local classifier weighting by quadratic programming.

    PubMed

    Cevikalp, Hakan; Polikar, Robi

    2008-10-01

    It has been widely accepted that the classification accuracy can be improved by combining outputs of multiple classifiers. However, how to combine multiple classifiers with various (potentially conflicting) decisions is still an open problem. A rich collection of classifier combination procedures -- many of which are heuristic in nature -- have been developed for this goal. In this brief, we describe a dynamic approach to combine classifiers that have expertise in different regions of the input space. To this end, we use local classifier accuracy estimates to weight classifier outputs. Specifically, we estimate local recognition accuracies of classifiers near a query sample by utilizing its nearest neighbors, and then use these estimates to find the best weights of classifiers to label the query. The problem is formulated as a convex quadratic optimization problem, which returns optimal nonnegative classifier weights with respect to the chosen objective function, and the weights ensure that locally most accurate classifiers are weighted more heavily for labeling the query sample. Experimental results on several data sets indicate that the proposed weighting scheme outperforms other popular classifier combination schemes, particularly on problems with complex decision boundaries. Hence, the results indicate that local classification-accuracy-based combination techniques are well suited for decision making when the classifiers are trained by focusing on different regions of the input space.

  10. Development of code evaluation criteria for assessing predictive capability and performance

    NASA Technical Reports Server (NTRS)

    Lin, Shyi-Jang; Barson, S. L.; Sindir, M. M.; Prueger, G. H.

    1993-01-01

    Computational Fluid Dynamics (CFD), because of its unique ability to predict complex three-dimensional flows, is being applied with increasing frequency in the aerospace industry. Currently, no consistent code validation procedure is applied within the industry. Such a procedure is needed to increase confidence in CFD and reduce risk in the use of these codes as a design and analysis tool. This final contract report defines classifications for three levels of code validation, directly relating the use of CFD codes to the engineering design cycle. Evaluation criteria by which codes are measured and classified are recommended and discussed. Criteria for selecting experimental data against which CFD results can be compared are outlined. A four phase CFD code validation procedure is described in detail. Finally, the code validation procedure is demonstrated through application of the REACT CFD code to a series of cases culminating in a code to data comparison on the Space Shuttle Main Engine High Pressure Fuel Turbopump Impeller.

  11. Physical interactions between bacteriophage and Escherichia coli proteins required for initiation of lambda DNA replication.

    PubMed

    Liberek, K; Osipiuk, J; Zylicz, M; Ang, D; Skorko, J; Georgopoulos, C

    1990-02-25

    The process of initiation of lambda DNA replication requires the assembly of the proper nucleoprotein complex at the origin of replication, ori lambda. The complex is composed of both phage and host-coded proteins. The lambda O initiator protein binds specifically to ori lambda. The lambda P initiator protein binds to both lambda O and the host-coded dnaB helicase, giving rise to an ori lambda DNA.lambda O.lambda P.dnaB structure. The dnaK and dnaJ heat shock proteins have been shown capable of dissociating this complex. The thus freed dnaB helicase unwinds the duplex DNA template at the replication fork. In this report, through cross-linking, size chromatography, and protein affinity chromatography, we document some of the protein-protein interactions occurring at ori lambda. Our results show that the dnaK protein specifically interacts with both lambda O and lambda P, and that the dnaJ protein specifically interacts with the dnaB helicase.

  12. Classifying transcription factor targets and discovering relevant biological features

    PubMed Central

    Holloway, Dustin T; Kon, Mark; DeLisi, Charles

    2008-01-01

    Background An important goal in post-genomic research is discovering the network of interactions between transcription factors (TFs) and the genes they regulate. We have previously reported the development of a supervised-learning approach to TF target identification, and used it to predict targets of 104 transcription factors in yeast. We now include a new sequence conservation measure, expand our predictions to include 59 new TFs, introduce a web-server, and implement an improved ranking method to reveal the biological features contributing to regulation. The classifiers combine 8 genomic datasets covering a broad range of measurements including sequence conservation, sequence overrepresentation, gene expression, and DNA structural properties. Principal Findings (1) Application of the method yields an amplification of information about yeast regulators. The ratio of total targets to previously known targets is greater than 2 for 11 TFs, with several having larger gains: Ash1(4), Ino2(2.6), Yaf1(2.4), and Yap6(2.4). (2) Many predicted targets for TFs match well with the known biology of their regulators. As a case study we discuss the regulator Swi6, presenting evidence that it may be important in the DNA damage response, and that the previously uncharacterized gene YMR279C plays a role in DNA damage response and perhaps in cell-cycle progression. (3) A procedure based on recursive-feature-elimination is able to uncover from the large initial data sets those features that best distinguish targets for any TF, providing clues relevant to its biology. An analysis of Swi6 suggests a possible role in lipid metabolism, and more specifically in metabolism of ceramide, a bioactive lipid currently being investigated for anti-cancer properties. (4) An analysis of global network properties highlights the transcriptional network hubs; the factors which control the most genes and the genes which are bound by the largest set of regulators. Cell-cycle and growth related

  13. Visualization of yeast chromosomal DNA

    NASA Technical Reports Server (NTRS)

    Lubega, Seth

    1990-01-01

    The DNA molecule is the most significant life molecule since it codes the blue print for other structural and functional molecules of all living organisms. Agarose gel electrophoresis is now being widely used to separate DNA of virus, bacteria, and lower eukaryotes. The task was undertaken of reviewing the existing methods of DNA fractionation and microscopic visualization of individual chromosonal DNA molecules by gel electrophoresis as a basis for a proposed study to investigate the feasibility of separating DNA molecules in free fluids as an alternative to gel electrophoresis. Various techniques were studied. On the molecular level, agarose gel electrophoresis is being widely used to separate chromosomal DNA according to molecular weight. Carl and Olson separate and characterized the entire karyotype of a lab strain of Saccharomyces cerevisiae. Smith et al. and Schwartz and Koval independently reported the visualization of individual DNA molecules migrating through agarose gel matrix during electrophoresis. The techniques used by these researchers are being reviewed in the lab as a basis for the proposed studies.

  14. Quantum ensembles of quantum classifiers.

    PubMed

    Schuld, Maria; Petruccione, Francesco

    2018-02-09

    Quantum machine learning witnesses an increasing amount of quantum algorithms for data-driven decision making, a problem with potential applications ranging from automated image recognition to medical diagnosis. Many of those algorithms are implementations of quantum classifiers, or models for the classification of data inputs with a quantum computer. Following the success of collective decision making with ensembles in classical machine learning, this paper introduces the concept of quantum ensembles of quantum classifiers. Creating the ensemble corresponds to a state preparation routine, after which the quantum classifiers are evaluated in parallel and their combined decision is accessed by a single-qubit measurement. This framework naturally allows for exponentially large ensembles in which - similar to Bayesian learning - the individual classifiers do not have to be trained. As an example, we analyse an exponentially large quantum ensemble in which each classifier is weighed according to its performance in classifying the training data, leading to new results for quantum as well as classical machine learning.

  15. A combined Fuzzy and Naive Bayesian strategy can be used to assign event codes to injury narratives.

    PubMed

    Marucci-Wellman, H; Lehto, M; Corns, H

    2011-12-01

    Bayesian methods show promise for classifying injury narratives from large administrative datasets into cause groups. This study examined a combined approach where two Bayesian models (Fuzzy and Naïve) were used to either classify a narrative or select it for manual review. Injury narratives were extracted from claims filed with a worker's compensation insurance provider between January 2002 and December 2004. Narratives were separated into a training set (n=11,000) and prediction set (n=3,000). Expert coders assigned two-digit Bureau of Labor Statistics Occupational Injury and Illness Classification event codes to each narrative. Fuzzy and Naïve Bayesian models were developed using manually classified cases in the training set. Two semi-automatic machine coding strategies were evaluated. The first strategy assigned cases for manual review if the Fuzzy and Naïve models disagreed on the classification. The second strategy selected additional cases for manual review from the Agree dataset using prediction strength to reach a level of 50% computer coding and 50% manual coding. When agreement alone was used as the filtering strategy, the majority were coded by the computer (n=1,928, 64%) leaving 36% for manual review. The overall combined (human plus computer) sensitivity was 0.90 and positive predictive value (PPV) was >0.90 for 11 of 18 2-digit event categories. Implementing the 2nd strategy improved results with an overall sensitivity of 0.95 and PPV >0.90 for 17 of 18 categories. A combined Naïve-Fuzzy Bayesian approach can classify some narratives with high accuracy and identify others most beneficial for manual review, reducing the burden on human coders.

  16. The diabolo classifier

    PubMed

    Schwenk

    1998-11-15

    We present a new classification architecture based on autoassociative neural networks that are used to learn discriminant models of each class. The proposed architecture has several interesting properties with respect to other model-based classifiers like nearest-neighbors or radial basis functions: it has a low computational complexity and uses a compact distributed representation of the models. The classifier is also well suited for the incorporation of a priori knowledge by means of a problem-specific distance measure. In particular, we will show that tangent distance (Simard, Le Cun, & Denker, 1993) can be used to achieve transformation invariance during learning and recognition. We demonstrate the application of this classifier to optical character recognition, where it has achieved state-of-the-art results on several reference databases. Relations to other models, in particular those based on principal component analysis, are also discussed.

  17. Low-energy electron dose-point kernel simulations using new physics models implemented in Geant4-DNA

    NASA Astrophysics Data System (ADS)

    Bordes, Julien; Incerti, Sébastien; Lampe, Nathanael; Bardiès, Manuel; Bordage, Marie-Claude

    2017-05-01

    When low-energy electrons, such as Auger electrons, interact with liquid water, they induce highly localized ionizing energy depositions over ranges comparable to cell diameters. Monte Carlo track structure (MCTS) codes are suitable tools for performing dosimetry at this level. One of the main MCTS codes, Geant4-DNA, is equipped with only two sets of cross section models for low-energy electron interactions in liquid water (;option 2; and its improved version, ;option 4;). To provide Geant4-DNA users with new alternative physics models, a set of cross sections, extracted from CPA100 MCTS code, have been added to Geant4-DNA. This new version is hereafter referred to as ;Geant4-DNA-CPA100;. In this study, ;Geant4-DNA-CPA100; was used to calculate low-energy electron dose-point kernels (DPKs) between 1 keV and 200 keV. Such kernels represent the radial energy deposited by an isotropic point source, a parameter that is useful for dosimetry calculations in nuclear medicine. In order to assess the influence of different physics models on DPK calculations, DPKs were calculated using the existing Geant4-DNA models (;option 2; and ;option 4;), newly integrated CPA100 models, and the PENELOPE Monte Carlo code used in step-by-step mode for monoenergetic electrons. Additionally, a comparison was performed of two sets of DPKs that were simulated with ;Geant4-DNA-CPA100; - the first set using Geant4‧s default settings, and the second using CPA100‧s original code default settings. A maximum difference of 9.4% was found between the Geant4-DNA-CPA100 and PENELOPE DPKs. Between the two Geant4-DNA existing models, slight differences, between 1 keV and 10 keV were observed. It was highlighted that the DPKs simulated with the two Geant4-DNA's existing models were always broader than those generated with ;Geant4-DNA-CPA100;. The discrepancies observed between the DPKs generated using Geant4-DNA's existing models and ;Geant4-DNA-CPA100; were caused solely by their different cross

  18. Enhancing the Biological Relevance of Machine Learning Classifiers for Reverse Vaccinology.

    PubMed

    Heinson, Ashley I; Gunawardana, Yawwani; Moesker, Bastiaan; Hume, Carmen C Denman; Vataga, Elena; Hall, Yper; Stylianou, Elena; McShane, Helen; Williams, Ann; Niranjan, Mahesan; Woelk, Christopher H

    2017-02-01

    Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.

  19. Pacific Northwest (PNW) Hydrologic Landscape (HL) polygons and HL code

    EPA Pesticide Factsheets

    A five-letter hydrologic landscape code representing five indices of hydrologic form that are related to hydrologic function: climate, seasonality, aquifer permeability, terrain, and soil permeability. Each hydrologic assessment unit is classified by one of the 81 different five-letter codes representing these indices. Polygon features in this dataset were created by aggregating (dissolving boundaries between) adjacent, similarly-coded hydrologic assessment units. Climate Classes: V-Very wet, W-Wet, M-Moist, D-Dry, S-Semiarid, A-Arid. Seasonality Sub-Classes: w-Fall or winter, s-Spring. Aquifer Permeability Classes: H-High, L-Low. Terrain Classes: M-Mountain, T-Transitional, F-Flat. Soil Permeability Classes: H-High, L-Low.

  20. 28 CFR 701.14 - Classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 2 2013-07-01 2013-07-01 false Classified information. 701.14 Section... UNDER THE FREEDOM OF INFORMATION ACT § 701.14 Classified information. In processing a request for information that is classified or classifiable under Executive Order 12356 or any other Executive Order...

  1. Multiple tag labeling method for DNA sequencing

    DOEpatents

    Mathies, R.A.; Huang, X.C.; Quesada, M.A.

    1995-07-25

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.

  2. Multiple tag labeling method for DNA sequencing

    DOEpatents

    Mathies, Richard A.; Huang, Xiaohua C.; Quesada, Mark A.

    1995-01-01

    A DNA sequencing method described which uses single lane or channel electrophoresis. Sequencing fragments are separated in said lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radio-isotope labels.

  3. Child Injury Deaths: Comparing Prevention Information from Two Coding Systems

    PubMed Central

    Schnitzer, Patricia G.; Ewigman, Bernard G.

    2006-01-01

    Objectives The International Classification of Disease (ICD) external cause of injury E-codes do not sufficiently identify injury circumstances amenable to prevention. The researchers developed an alternative classification system (B-codes) that incorporates behavioral and environmental factors, for use in childhood injury research, and compare the two coding systems in this paper. Methods All fatal injuries among children less than age five that occurred between January 1, 1992, and December 31, 1994, were classified using both B-codes and E-codes. Results E-codes identified the most common causes of injury death: homicide (24%), fires (21%), motor vehicle incidents (21%), drowning (10%), and suffocation (9%). The B-codes further revealed that homicides (51%) resulted from the child being shaken or struck by another person; many fires deaths (42%) resulted from children playing with matches or lighters; drownings (46%) usually occurred in natural bodies of water; and most suffocation deaths (68%) occurred in unsafe sleeping arrangements. Conclusions B-codes identify additional information with specific relevance for prevention of childhood injuries. PMID:15944169

  4. Nanogravimetric and voltammetric DNA-hybridization biosensors for studies of DNA damage by common toxicants and pollutants.

    PubMed

    Nowicka, Anna M; Kowalczyk, Agata; Stojek, Zbigniew; Hepel, Maria

    2010-01-01

    Electrochemical and nanogravimetric DNA-hybridization biosensors have been developed for sensing single mismatches in the probe-target ssDNA sequences. The voltammetric transduction was achieved by coupling ferrocene moiety to streptavidin linked to biotinylated tDNA. The mass-related frequency transduction was implemented by immobilizing the sensory pDNA on a gold-coated quartz crystal piezoresonators oscillating in the 10MHz band. The high sensitivity of these sensors enabled us to study DNA damage caused by representative toxicants and environmental pollutants, including Cr(VI) species, common pesticides and herbicides. We have found that the sensor responds rapidly to any damage caused by Cr(VI) species, with more severe DNA damage observed for Cr(2)O(7)(2-) and for CrO(4)(2-) in the presence of H(2)O(2) as compared to CrO(4)(2-) alone. All herbicides and pesticides examined caused DNA damage or structural alterations leading to the double-helix unwinding. Among these compounds, paraoxon-ethyl and atrazine caused the fastest and most severe damage to DNA. The physico-chemical mechanism of damaging interactions between toxicants and DNA has been proposed. The methodology of testing voltammetric and nanogravimetric DNA-hybridization biosensors developed in this work can be employed as a simple protocol to obtain rapid comparative data concerning DNA damage caused by herbicide, pesticides and other toxic pollutants. The DNA-hybridization biosensor can, therefore, be utilized as a rapid screening device for classifying environmental pollutants and to evaluate DNA damage induced by these compounds.

  5. The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition

    NASA Astrophysics Data System (ADS)

    Štambuk, Nikola

    The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.

  6. Circular replication-associated protein encoding DNA viruses identified in the faecal matter of various animals in New Zealand.

    PubMed

    Steel, Olivia; Kraberger, Simona; Sikorski, Alyssa; Young, Laura M; Catchpole, Ryan J; Stevens, Aaron J; Ladley, Jenny J; Coray, Dorien S; Stainton, Daisy; Dayaram, Anisha; Julian, Laurel; van Bysterveldt, Katherine; Varsani, Arvind

    2016-09-01

    In recent years, innovations in molecular techniques and sequencing technologies have resulted in a rapid expansion in the number of known viral sequences, in particular those with circular replication-associated protein (Rep)-encoding single-stranded (CRESS) DNA genomes. CRESS DNA viruses are present in the virome of many ecosystems and are known to infect a wide range of organisms. A large number of the recently identified CRESS DNA viruses cannot be classified into any known viral families, indicating that the current view of CRESS DNA viral sequence space is greatly underestimated. Animal faecal matter has proven to be a particularly useful source for sampling CRESS DNA viruses in an ecosystem, as it is cost-effective and non-invasive. In this study a viral metagenomic approach was used to explore the diversity of CRESS DNA viruses present in the faeces of domesticated and wild animals in New Zealand. Thirty-eight complete CRESS DNA viral genomes and two circular molecules (that may be defective molecules or single components of multicomponent genomes) were identified from forty-nine individual animal faecal samples. Based on shared genome organisations and sequence similarities, eighteen of the isolates were classified as gemycircularviruses and twelve isolates were classified as smacoviruses. The remaining eight isolates lack significant sequence similarity with any members of known CRESS DNA virus groups. This research adds significantly to our knowledge of CRESS DNA viral diversity in New Zealand, emphasising the prevalence of CRESS DNA viruses in nature, and reinforcing the suggestion that a large proportion of CRESS DNA viruses are yet to be identified. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. Do plant cell walls have a code?

    PubMed

    Tavares, Eveline Q P; Buckeridge, Marcos S

    2015-12-01

    A code is a set of rules that establish correspondence between two worlds, signs (consisting of encrypted information) and meaning (of the decrypted message). A third element, the adaptor, connects both worlds, assigning meaning to a code. We propose that a Glycomic Code exists in plant cell walls where signs are represented by monosaccharides and phenylpropanoids and meaning is cell wall architecture with its highly complex association of polymers. Cell wall biosynthetic mechanisms, structure, architecture and properties are addressed according to Code Biology perspective, focusing on how they oppose to cell wall deconstruction. Cell wall hydrolysis is mainly focused as a mechanism of decryption of the Glycomic Code. Evidence for encoded information in cell wall polymers fine structure is highlighted and the implications of the existence of the Glycomic Code are discussed. Aspects related to fine structure are responsible for polysaccharide packing and polymer-polymer interactions, affecting the final cell wall architecture. The question whether polymers assembly within a wall display similar properties as other biological macromolecules (i.e. proteins, DNA, histones) is addressed, i.e. do they display a code? Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  8. DNATCO: assignment of DNA conformers at dnatco.org.

    PubMed

    Černý, Jiří; Božíková, Paulína; Schneider, Bohdan

    2016-07-08

    The web service DNATCO (dnatco.org) classifies local conformations of DNA molecules beyond their traditional sorting to A, B and Z DNA forms. DNATCO provides an interface to robust algorithms assigning conformation classes called NTC: to dinucleotides extracted from DNA-containing structures uploaded in PDB format version 3.1 or above. The assigned dinucleotide NTC: classes are further grouped into DNA structural alphabet NTA: , to the best of our knowledge the first DNA structural alphabet. The results are presented at two levels: in the form of user friendly visualization and analysis of the assignment, and in the form of a downloadable, more detailed table for further analysis offline. The website is free and open to all users and there is no login requirement. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Leber Hereditary Optic Neuropathy: Exemplar of an mtDNA Disease.

    PubMed

    Wallace, Douglas C; Lott, Marie T

    2017-01-01

    The report in 1988 that Leber Hereditary Optic Neuropathy (LHON) was the product of mitochondrial DNA (mtDNA) mutations provided the first demonstration of the clinical relevance of inherited mtDNA variation. From LHON studies, the medical importance was demonstrated for the mtDNA showing its coding for the most important energy genes, its maternal inheritance, its high mutation rate, its presence in hundreds to thousands of copies per cell, its quantitatively segregation of biallelic genotypes during both mitosis and meiosis, its preferential effect on the most energetic tissues including the eye and brain, its wide range of functional polymorphisms that predispose to common diseases, and its accumulation of mutations within somatic tissues providing the aging clock. These features of mtDNA genetics, in combination with the genetics of the 1-2000 nuclear DNA (nDNA) coded mitochondrial genes, is not only explaining the genetics of LHON but also providing a model for understanding the complexity of many common diseases. With the maturation of LHON biology and genetics, novel animal models for complex disease have been developed and new therapeutic targets and strategies envisioned, both pharmacological and genetic. Multiple somatic gene therapy approaches are being developed for LHON which are applicable to other mtDNA diseases. Moreover, the unique cytoplasmic genetics of the mtDNA has permitted the first successful human germline gene therapy via spindle nDNA transfer from mtDNA mutant oocytes to enucleated normal mtDNA oocytes. Such LHON lessons are actively being applied to common ophthalmological diseases like glaucoma and neurological diseases like Parkinsonism.

  10. A Perspective on DNA Microarrays in Pathology Research and Practice

    PubMed Central

    Pollack, Jonathan R.

    2007-01-01

    DNA microarray technology matured in the mid-1990s, and the past decade has witnessed a tremendous growth in its application. DNA microarrays have provided powerful tools for pathology researchers seeking to describe, classify, and understand human disease. There has also been great expectation that the technology would advance the practice of pathology. This review highlights some of the key contributions of DNA microarrays to experimental pathology, focusing in the area of cancer research. Also discussed are some of the current challenges in translating utility to clinical practice. PMID:17600117

  11. Classifying Motion.

    ERIC Educational Resources Information Center

    Duzen, Carl; And Others

    1992-01-01

    Presents a series of activities that utilizes a leveling device to classify constant and accelerated motion. Applies this classification system to uniform circular motion and motion produced by gravitational force. (MDH)

  12. Genetic coding and gene expression - new Quadruplet genetic coding model

    NASA Astrophysics Data System (ADS)

    Shankar Singh, Rama

    2012-07-01

    Successful demonstration of human genome project has opened the door not only for developing personalized medicine and cure for genetic diseases, but it may also answer the complex and difficult question of the origin of life. It may lead to making 21st century, a century of Biological Sciences as well. Based on the central dogma of Biology, genetic codons in conjunction with tRNA play a key role in translating the RNA bases forming sequence of amino acids leading to a synthesized protein. This is the most critical step in synthesizing the right protein needed for personalized medicine and curing genetic diseases. So far, only triplet codons involving three bases of RNA, transcribed from DNA bases, have been used. Since this approach has several inconsistencies and limitations, even the promise of personalized medicine has not been realized. The new Quadruplet genetic coding model proposed and developed here involves all four RNA bases which in conjunction with tRNA will synthesize the right protein. The transcription and translation process used will be the same, but the Quadruplet codons will help overcome most of the inconsistencies and limitations of the triplet codes. Details of this new Quadruplet genetic coding model and its subsequent potential applications including relevance to the origin of life will be presented.

  13. Selfish DNA in protein-coding genes of Rickettsia.

    PubMed

    Ogata, H; Audic, S; Barbe, V; Artiguenave, F; Fournier, P E; Raoult, D; Claverie, J M

    2000-10-13

    Rickettsia conorii, the aetiological agent of Mediterranean spotted fever, is an intracellular bacterium transmitted by ticks. Preliminary analyses of the nearly complete genome sequence of R. conorii have revealed 44 occurrences of a previously undescribed palindromic repeat (150 base pairs long) throughout the genome. Unexpectedly, this repeat was found inserted in-frame within 19 different R. conorii open reading frames likely to encode functional proteins. We found the same repeat in proteins of other Rickettsia species. The finding of a mobile element inserted in many unrelated genes suggests the potential role of selfish DNA in the creation of new protein sequences.

  14. An alternative method for cDNA cloning from surrogate eukaryotic cells transfected with the corresponding genomic DNA.

    PubMed

    Hu, Lin-Yong; Cui, Chen-Chen; Song, Yu-Jie; Wang, Xiang-Guo; Jin, Ya-Ping; Wang, Ai-Hua; Zhang, Yong

    2012-07-01

    cDNA is widely used in gene function elucidation and/or transgenics research but often suitable tissues or cells from which to isolate mRNA for reverse transcription are unavailable. Here, an alternative method for cDNA cloning is described and tested by cloning the cDNA of human LALBA (human alpha-lactalbumin) from genomic DNA. First, genomic DNA containing all of the coding exons was cloned from human peripheral blood and inserted into a eukaryotic expression vector. Next, by delivering the plasmids into either 293T or fibroblast cells, surrogate cells were constructed. Finally, the total RNA was extracted from the surrogate cells and cDNA was obtained by RT-PCR. The human LALBA cDNA that was obtained was compared with the corresponding mRNA published in GenBank. The comparison showed that the two sequences were identical. The novel method for cDNA cloning from surrogate eukaryotic cells described here uses well-established techniques that are feasible and simple to use. We anticipate that this alternative method will have widespread applications.

  15. Nuclear and chloroplast DNA differentiation in Andean potatoes.

    PubMed

    Sukhotu, Thitaporn; Kamijima, Osamu; Hosaka, Kazuyoshi

    2004-02-01

    Over 3500 accessions of Andean landraces have been known in potato, classified into 7 cultivated species ranging from 2x to 5x (Hawkes 1990). Chloroplast DNA (ctDNA), distinguished into T, W, C, S, and A types, showed extensive overlaps in their frequencies among cultivated species and between cultivated and putative ancestral wild species. In this study, 76 accessions of cultivated and 19 accessions of wild species were evaluated for ctDNA types and examined by ctDNA high-resolution markers (ctDNA microsatellites and H3 marker) and nuclear DNA restriction fragment length polymorphisms (RFLPs). ctDNA high-resolution markers identified 25 different ctDNA haplotypes. The S- and A-type ctDNAs were discriminated as unique haplotypes from 12 haplotypes having C-type ctDNA and T-type ctDNA from 10 haplotypes having W-type ctDNA. Differences among ctDNA types were strongly correlated with those of ctDNA high-resolution markers (r = 0.822). Differentiation between W-type ctDNA and C-, S-, and A-type ctDNAs was supported by nDNA RFLPs in most species except for those of recent or immediate hybrid origin. However, differentiation among C-, S-, and A-type ctDNAs was not clearly supported by nDNA RFLPs, suggesting that frequent genetic exchange occurred among them and (or) they shared the same gene pool owing to common ancestry.

  16. SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data.

    PubMed

    Pirooznia, Mehdi; Deng, Youping

    2006-12-12

    Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used a sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1-BRCA2 samples with RBF kernel of SVM. We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.

  17. LncRNApred: Classification of Long Non-Coding RNAs and Protein-Coding Transcripts by the Ensemble Algorithm with a New Hybrid Feature.

    PubMed

    Pian, Cong; Zhang, Guangle; Chen, Zhi; Chen, Yuanyuan; Zhang, Jin; Yang, Tao; Zhang, Liangyun

    2016-01-01

    As a novel class of noncoding RNAs, long noncoding RNAs (lncRNAs) have been verified to be associated with various diseases. As large scale transcripts are generated every year, it is significant to accurately and quickly identify lncRNAs from thousands of assembled transcripts. To accurately discover new lncRNAs, we develop a classification tool of random forest (RF) named LncRNApred based on a new hybrid feature. This hybrid feature set includes three new proposed features, which are MaxORF, RMaxORF and SNR. LncRNApred is effective for classifying lncRNAs and protein coding transcripts accurately and quickly. Moreover,our RF model only requests the training using data on human coding and non-coding transcripts. Other species can also be predicted by using LncRNApred. The result shows that our method is more effective compared with the Coding Potential Calculate (CPC). The web server of LncRNApred is available for free at http://mm20132014.wicp.net:57203/LncRNApred/home.jsp.

  18. Development and evaluation of a Naïve Bayesian model for coding causation of workers' compensation claims.

    PubMed

    Bertke, S J; Meyers, A R; Wurzelbacher, S J; Bell, J; Lampl, M L; Robins, D

    2012-12-01

    Tracking and trending rates of injuries and illnesses classified as musculoskeletal disorders caused by ergonomic risk factors such as overexertion and repetitive motion (MSDs) and slips, trips, or falls (STFs) in different industry sectors is of high interest to many researchers. Unfortunately, identifying the cause of injuries and illnesses in large datasets such as workers' compensation systems often requires reading and coding the free form accident text narrative for potentially millions of records. To alleviate the need for manual coding, this paper describes and evaluates a computer auto-coding algorithm that demonstrated the ability to code millions of claims quickly and accurately by learning from a set of previously manually coded claims. The auto-coding program was able to code claims as a musculoskeletal disorders, STF or other with approximately 90% accuracy. The program developed and discussed in this paper provides an accurate and efficient method for identifying the causation of workers' compensation claims as a STF or MSD in a large database based on the unstructured text narrative and resulting injury diagnoses. The program coded thousands of claims in minutes. The method described in this paper can be used by researchers and practitioners to relieve the manual burden of reading and identifying the causation of claims as a STF or MSD. Furthermore, the method can be easily generalized to code/classify other unstructured text narratives. Published by Elsevier Ltd.

  19. Alterations in sperm DNA methylation, non-coding RNA expression, and histone retention mediate vinclozolin-induced epigenetic transgenerational inheritance of disease.

    PubMed

    Ben Maamar, Millissia; Sadler-Riggleman, Ingrid; Beck, Daniel; McBirney, Margaux; Nilsson, Eric; Klukovich, Rachel; Xie, Yeming; Tang, Chong; Yan, Wei; Skinner, Michael K

    2018-04-01

    Epigenetic transgenerational inheritance of disease and phenotypic variation can be induced by several toxicants, such as vinclozolin. This phenomenon can involve DNA methylation, non-coding RNA (ncRNA) and histone retention, and/or modification in the germline (e.g. sperm). These different epigenetic marks are called epimutations and can transmit in part the transgenerational phenotypes. This study was designed to investigate the vinclozolin-induced concurrent alterations of a number of different epigenetic factors, including DNA methylation, ncRNA, and histone retention in rat sperm. Gestating females (F0 generation) were exposed transiently to vinclozolin during fetal gonadal development. The directly exposed F1 generation fetus, the directly exposed germline within the fetus that will generate the F2 generation, and the transgenerational F3 generation sperm were studied. DNA methylation and ncRNA were altered in each generation rat sperm with the direct exposure F1 and F2 generations being distinct from the F3 generation epimutations. Interestingly, an increased number of differential histone retention sites were found in the F3 generation vinclozolin sperm, but not in the F1 or F2 generations. All three different epimutation types were affected in the vinclozolin lineage transgenerational sperm (F3 generation). The direct exposure generations (F1 and F2) epigenetic alterations were distinct from the transgenerational sperm epimutations. The genomic features and gene pathways associated with the epimutations were investigated to help elucidate the integration of these different epigenetic processes. Our results show that the three different types of epimutations are involved and integrated in the mediation of the epigenetic transgenerational inheritance phenomenon.

  20. Numerical classification of coding sequences

    NASA Technical Reports Server (NTRS)

    Collins, D. W.; Liu, C. C.; Jukes, T. H.

    1992-01-01

    DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.

  1. Altruistic functions for selfish DNA.

    PubMed

    Faulkner, Geoffrey J; Carninci, Piero

    2009-09-15

    Mammalian genomes are comprised of 30-50% transposed elements (TEs). The vast majority of these TEs are truncated and mutated fragments of retrotransposons that are no longer capable of transposition. Although initially regarded as important factors in the evolution of gene regulatory networks, TEs are now commonly perceived as neutrally evolving and non-functional genomic elements. In a major development, recent works have strongly contradicted this "selfish DNA" or "junk DNA" dogma by demonstrating that TEs use a host of novel promoters to generate RNA on a massive scale across most eukaryotic cells. This transcription frequently functions to control the expression of protein-coding genes via alternative promoters, cis regulatory non protein-coding RNAs and the formation of double stranded short RNAs. If considered in sum, these findings challenge the designation of TEs as selfish and neutrally evolving genomic elements. Here, we will expand upon these themes and discuss challenges in establishing novel TE functions in vivo.

  2. Classifying publications from the clinical and translational science award program along the translational research spectrum: a machine learning approach.

    PubMed

    Surkis, Alisa; Hogle, Janice A; DiazGranados, Deborah; Hunt, Joe D; Mazmanian, Paul E; Connors, Emily; Westaby, Kate; Whipple, Elizabeth C; Adamus, Trisha; Mueller, Meridith; Aphinyanaphongs, Yindalon

    2016-08-05

    Translational research is a key area of focus of the National Institutes of Health (NIH), as demonstrated by the substantial investment in the Clinical and Translational Science Award (CTSA) program. The goal of the CTSA program is to accelerate the translation of discoveries from the bench to the bedside and into communities. Different classification systems have been used to capture the spectrum of basic to clinical to population health research, with substantial differences in the number of categories and their definitions. Evaluation of the effectiveness of the CTSA program and of translational research in general is hampered by the lack of rigor in these definitions and their application. This study adds rigor to the classification process by creating a checklist to evaluate publications across the translational spectrum and operationalizes these classifications by building machine learning-based text classifiers to categorize these publications. Based on collaboratively developed definitions, we created a detailed checklist for categories along the translational spectrum from T0 to T4. We applied the checklist to CTSA-linked publications to construct a set of coded publications for use in training machine learning-based text classifiers to classify publications within these categories. The training sets combined T1/T2 and T3/T4 categories due to low frequency of these publication types compared to the frequency of T0 publications. We then compared classifier performance across different algorithms and feature sets and applied the classifiers to all publications in PubMed indexed to CTSA grants. To validate the algorithm, we manually classified the articles with the top 100 scores from each classifier. The definitions and checklist facilitated classification and resulted in good inter-rater reliability for coding publications for the training set. Very good performance was achieved for the classifiers as represented by the area under the receiver operating

  3. Noncoding RNAs in DNA Repair and Genome Integrity

    PubMed Central

    Wan, Guohui; Liu, Yunhua; Han, Cecil; Zhang, Xinna

    2014-01-01

    Abstract Significance: The well-studied sequences in the human genome are those of protein-coding genes, which account for only 1%–2% of the total genome. However, with the advent of high-throughput transcriptome sequencing technology, we now know that about 90% of our genome is extensively transcribed and that the vast majority of them are transcribed into noncoding RNAs (ncRNAs). It is of great interest and importance to decipher the functions of these ncRNAs in humans. Recent Advances: In the last decade, it has become apparent that ncRNAs play a crucial role in regulating gene expression in normal development, in stress responses to internal and environmental stimuli, and in human diseases. Critical Issues: In addition to those constitutively expressed structural RNA, such as ribosomal and transfer RNAs, regulatory ncRNAs can be classified as microRNAs (miRNAs), Piwi-interacting RNAs (piRNAs), small interfering RNAs (siRNAs), small nucleolar RNAs (snoRNAs), and long noncoding RNAs (lncRNAs). However, little is known about the biological features and functional roles of these ncRNAs in DNA repair and genome instability, although a number of miRNAs and lncRNAs are regulated in the DNA damage response. Future Directions: A major goal of modern biology is to identify and characterize the full profile of ncRNAs with regard to normal physiological functions and roles in human disorders. Clinically relevant ncRNAs will also be evaluated and targeted in therapeutic applications. Antioxid. Redox Signal. 20, 655–677. PMID:23879367

  4. The cDNA-derived amino acid sequence of hemoglobin II from Lucina pectinata.

    PubMed

    Torres-Mercado, Elineth; Renta, Jessicca Y; Rodríguez, Yolanda; López-Garriga, Juan; Cadilla, Carmen L

    2003-11-01

    Hemoglobin II from the clam Lucina pectinata is an oxygen-reactive protein with a unique structural organization in the heme pocket involving residues Gln65 (E7), Tyr30 (B10), Phe44 (CD1), and Phe69 (E11). We employed the reverse transcriptase-polymerase chain reaction (RT-PCR) and methods to synthesize various cDNA(HbII). An initial 300-bp cDNA clone was amplified from total RNA by RT-PCR using degenerate oligonucleotides. Gene-specific primers derived from the HbII-partial cDNA sequence were used to obtain the 5' and 3' ends of the cDNA by RACE. The length of the HbII cDNA, estimated from overlapping clones, was approximately 2114 bases. Northern blot analysis revealed that the mRNA size of HbII agrees with the estimated size using cDNA data. The coding region of the full-length HbII cDNA codes for 151 amino acids. The calculated molecular weight of HbII, including the heme group and acetylated N-terminal residue, is 17,654.07 Da.

  5. Transcription initiation complex structures elucidate DNA opening.

    PubMed

    Plaschka, C; Hantsche, M; Dienemann, C; Burzinski, C; Plitzko, J; Cramer, P

    2016-05-19

    Transcription of eukaryotic protein-coding genes begins with assembly of the RNA polymerase (Pol) II initiation complex and promoter DNA opening. Here we report cryo-electron microscopy (cryo-EM) structures of yeast initiation complexes containing closed and open DNA at resolutions of 8.8 Å and 3.6 Å, respectively. DNA is positioned and retained over the Pol II cleft by a network of interactions between the TATA-box-binding protein TBP and transcription factors TFIIA, TFIIB, TFIIE, and TFIIF. DNA opening occurs around the tip of the Pol II clamp and the TFIIE 'extended winged helix' domain, and can occur in the absence of TFIIH. Loading of the DNA template strand into the active centre may be facilitated by movements of obstructing protein elements triggered by allosteric binding of the TFIIE 'E-ribbon' domain. The results suggest a unified model for transcription initiation with a key event, the trapping of open promoter DNA by extended protein-protein and protein-DNA contacts.

  6. A new family of polymerases related to superfamily A DNA polymerases and T7-like DNA-dependent RNA polymerases.

    PubMed

    Iyer, Lakshminarayan M; Abhiman, Saraswathi; Aravind, L

    2008-10-04

    Using sequence profile methods and structural comparisons we characterize a previously unknown family of nucleic acid polymerases in a group of mobile elements from genomes of diverse bacteria, an algal plastid and certain DNA viruses, including the recently reported Sputnik virus. Using contextual information from domain architectures and gene-neighborhoods we present evidence that they are likely to possess both primase and DNA polymerase activity, comparable to the previously reported prim-pol proteins. These newly identified polymerases help in defining the minimal functional core of superfamily A DNA polymerases and related RNA polymerases. Thus, they provide a framework to understand the emergence of both DNA and RNA polymerization activity in this class of enzymes. They also provide evidence that enigmatic DNA viruses, such as Sputnik, might have emerged from mobile elements coding these polymerases.

  7. Probabilistic classifiers with high-dimensional data

    PubMed Central

    Kim, Kyung In; Simon, Richard

    2011-01-01

    For medical classification problems, it is often desirable to have a probability associated with each class. Probabilistic classifiers have received relatively little attention for small n large p classification problems despite of their importance in medical decision making. In this paper, we introduce 2 criteria for assessment of probabilistic classifiers: well-calibratedness and refinement and develop corresponding evaluation measures. We evaluated several published high-dimensional probabilistic classifiers and developed 2 extensions of the Bayesian compound covariate classifier. Based on simulation studies and analysis of gene expression microarray data, we found that proper probabilistic classification is more difficult than deterministic classification. It is important to ensure that a probabilistic classifier is well calibrated or at least not “anticonservative” using the methods developed here. We provide this evaluation for several probabilistic classifiers and also evaluate their refinement as a function of sample size under weak and strong signal conditions. We also present a cross-validation method for evaluating the calibration and refinement of any probabilistic classifier on any data set. PMID:21087946

  8. Non-coding RNAs in lung cancer

    PubMed Central

    Ricciuti, Biagio; Mecca, Carmen; Crinò, Lucio; Baglivo, Sara; Cenci, Matteo; Metro, Giulio

    2014-01-01

    The discovery that protein-coding genes represent less than 2% of all human genome, and the evidence that more than 90% of it is actively transcribed, changed the classical point of view of the central dogma of molecular biology, which was always based on the assumption that RNA functions mainly as an intermediate bridge between DNA sequences and protein synthesis machinery. Accumulating data indicates that non-coding RNAs are involved in different physiological processes, providing for the maintenance of cellular homeostasis. They are important regulators of gene expression, cellular differentiation, proliferation, migration, apoptosis, and stem cell maintenance. Alterations and disruptions of their expression or activity have increasingly been associated with pathological changes of cancer cells, this evidence and the prospect of using these molecules as diagnostic markers and therapeutic targets, make currently non-coding RNAs among the most relevant molecules in cancer research. In this paper we will provide an overview of non-coding RNA function and disruption in lung cancer biology, also focusing on their potential as diagnostic, prognostic and predictive biomarkers. PMID:25593996

  9. Extraction of High Quality DNA from Seized Moroccan Cannabis Resin (Hashish)

    PubMed Central

    El Alaoui, Moulay Abdelaziz; Melloul, Marouane; Alaoui Amine, Sanaâ; Stambouli, Hamid; El Bouri, Aziz; Soulaymani, Abdelmajid; El Fahime, Elmostafa

    2013-01-01

    The extraction and purification of nucleic acids is the first step in most molecular biology analysis techniques. The objective of this work is to obtain highly purified nucleic acids derived from Cannabis sativa resin seizure in order to conduct a DNA typing method for the individualization of cannabis resin samples. To obtain highly purified nucleic acids from cannabis resin (Hashish) free from contaminants that cause inhibition of PCR reaction, we have tested two protocols: the CTAB protocol of Wagner and a CTAB protocol described by Somma (2004) adapted for difficult matrix. We obtained high quality genomic DNA from 8 cannabis resin seizures using the adapted protocol. DNA extracted by the Wagner CTAB protocol failed to give polymerase chain reaction (PCR) amplification of tetrahydrocannabinolic acid (THCA) synthase coding gene. However, the extracted DNA by the second protocol permits amplification of THCA synthase coding gene using different sets of primers as assessed by PCR. We describe here for the first time the possibility of DNA extraction from (Hashish) resin derived from Cannabis sativa. This allows the use of DNA molecular tests under special forensic circumstances. PMID:24124454

  10. 14 CFR 1216.317 - Classified information.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 14 Aeronautics and Space 5 2010-01-01 2010-01-01 false Classified information. 1216.317 Section 1216.317 Aeronautics and Space NATIONAL AERONAUTICS AND SPACE ADMINISTRATION ENVIRONMENTAL QUALITY... Classified information. Environmental assessments and impact statements which contain classified information...

  11. 14 CFR 1216.317 - Classified information.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 14 Aeronautics and Space 5 2011-01-01 2010-01-01 true Classified information. 1216.317 Section 1216.317 Aeronautics and Space NATIONAL AERONAUTICS AND SPACE ADMINISTRATION ENVIRONMENTAL QUALITY... Classified information. Environmental assessments and impact statements which contain classified information...

  12. 14 CFR 1216.317 - Classified information.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 14 Aeronautics and Space 5 2012-01-01 2012-01-01 false Classified information. 1216.317 Section 1216.317 Aeronautics and Space NATIONAL AERONAUTICS AND SPACE ADMINISTRATION ENVIRONMENTAL QUALITY... Classified information. Environmental assessments and impact statements which contain classified information...

  13. MitoAge: a database for comparative analysis of mitochondrial DNA, with a special focus on animal longevity.

    PubMed

    Toren, Dmitri; Barzilay, Thomer; Tacutu, Robi; Lehmann, Gilad; Muradian, Khachik K; Fraifeld, Vadim E

    2016-01-04

    Mitochondria are the only organelles in the animal cells that have their own genome. Due to a key role in energy production, generation of damaging factors (ROS, heat), and apoptosis, mitochondria and mtDNA in particular have long been considered one of the major players in the mechanisms of aging, longevity and age-related diseases. The rapidly increasing number of species with fully sequenced mtDNA, together with accumulated data on longevity records, provides a new fascinating basis for comparative analysis of the links between mtDNA features and animal longevity. To facilitate such analyses and to support the scientific community in carrying these out, we developed the MitoAge database containing calculated mtDNA compositional features of the entire mitochondrial genome, mtDNA coding (tRNA, rRNA, protein-coding genes) and non-coding (D-loop) regions, and codon usage/amino acids frequency for each protein-coding gene. MitoAge includes 922 species with fully sequenced mtDNA and maximum lifespan records. The database is available through the MitoAge website (www.mitoage.org or www.mitoage.info), which provides the necessary tools for searching, browsing, comparing and downloading the data sets of interest for selected taxonomic groups across the Kingdom Animalia. The MitoAge website assists in statistical analysis of different features of the mtDNA and their correlative links to longevity. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. Species classifier choice is a key consideration when analysing low-complexity food microbiome data.

    PubMed

    Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D

    2018-03-20

    The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there was no significant differences between the compositional results generated by the different sequencers (p = 0.693, R 2  = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R 2  = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample. Again, the outputs from functional profiling analysis using

  15. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    PubMed

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cervisivae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.

  16. Classifying features in CT imagery: accuracy for some single- and multiple-species classifiers

    Treesearch

    Daniel L. Schmoldt; Jing He; A. Lynn Abbott

    1998-01-01

    Our current approach to automatically label features in CT images of hardwood logs classifies each pixel of an image individually. These feature classifiers use a back-propagation artificial neural network (ANN) and feature vectors that include a small, local neighborhood of pixels and the distance of the target pixel to the center of the log. Initially, this type of...

  17. Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

    NASA Astrophysics Data System (ADS)

    Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

    2017-07-01

    DNA sequence can be defined as a succession of letters, representing the order of nucleotides within DNA, using a permutation of four DNA base codes including adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of the sequences is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and currently become highly developed, advanced and highly throughput sequencing technologies. So far, DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing could produce any ambiguous and not clear enough sequencing results that make them quite difficult to be determined whether these codes are A, T, G, or C. To solve these problems, in this study we can introduce other representation of DNA codes namely Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probability of A, T, G, C bases that could appear in Q and PA + PT + PG + PC = 1. Furthermore, using Quaternion representations we are able to construct the improved scoring matrix for global sequence alignment processes, by applying a dot product method. Moreover, this scoring matrix produces better and higher quality of the match and mismatch score between two DNA base codes. In implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave, to analyze our target sequence which contains some ambiguous sequence data. The subject sequences are the DNA sequences of Streptococcus pneumoniae families obtained from the Genebank, meanwhile the target DNA sequence are received from our collaborator database. As the results we found the Quaternion representations improve the quality of the sequence alignment score and we can conclude that DNA sequence target has maximum similarity with Streptococcus pneumoniae.

  18. Replication and Transcription of Eukaryotic DNA in Esherichia coli

    PubMed Central

    Morrow, John F.; Cohen, Stanley N.; Chang, Annie C. Y.; Boyer, Herbert W.; Goodman, Howard M.; Helling, Robert B.

    1974-01-01

    Fragments of amplified Xenopus laevis DNA, coding for 18S and 28S ribosomal RNA and generated by EcoRI restriction endonuclease, have been linked in vitro to the bacterial plasmid pSC101; and the recombinant molecular species have been introduced into E. coli by transformation. These recombinant plasmids, containing both eukaryotic and prokaryotic DNA, replicate stably in E. coli. RNA isolated from E. coli minicells harboring the plasmids hybridizes to amplified X. laevis rDNA. Images PMID:4600264

  19. Representation mutations from standard genetic codes

    NASA Astrophysics Data System (ADS)

    Aisah, I.; Suyudi, M.; Carnia, E.; Suhendi; Supriatna, A. K.

    2018-03-01

    Graph is widely used in everyday life especially to describe model problem and describe it concretely and clearly. In addition graph is also used to facilitate solve various kinds of problems that are difficult to be solved by calculation. In Biology, graph can be used to describe the process of protein synthesis in DNA. Protein has an important role for DNA (deoxyribonucleic acid) or RNA (ribonucleic acid). Proteins are composed of amino acids. In this study, amino acids are related to genetics, especially the genetic code. The genetic code is also known as the triplet or codon code which is a three-letter arrangement of DNA nitrogen base. The bases are adenine (A), thymine (T), guanine (G) and cytosine (C). While on RNA thymine (T) is replaced with Urasil (U). The set of all Nitrogen bases in RNA is denoted by N = {C U, A, G}. This codon works at the time of protein synthesis inside the cell. This codon also encodes the stop signal as a sign of the stop of protein synthesis process. This paper will examine the process of protein synthesis through mathematical studies and present it in three-dimensional space or graph. The study begins by analysing the set of all codons denoted by NNN such that to obtain geometric representations. At this stage there is a matching between the sets of all nitrogen bases N with Z 2 × Z 2; C=(\\overline{0},\\overline{0}),{{U}}=(\\overline{0},\\overline{1}),{{A}}=(\\overline{1},\\overline{0}),{{G}}=(\\overline{1},\\overline{1}). By matching the algebraic structure will be obtained such as group, group Klein-4,Quotien group etc. With the help of Geogebra software, the set of all codons denoted by NNN can be presented in a three-dimensional space as a multicube NNN and also can be represented as a graph, so that can easily see relationship between the codon.

  20. 46 CFR 503.59 - Safeguarding classified information.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... Information Security Program § 503.59 Safeguarding classified information. (a) All classified information... security; (2) Takes appropriate steps to protect classified information from unauthorized disclosure or... security check; (2) To protect the classified information in accordance with the provisions of Executive...

  1. Cloning and sequence analysis of Hemonchus contortus HC58cDNA.

    PubMed

    Muleke, Charles I; Ruofeng, Yan; Lixin, Xu; Xinwen, Bo; Xiangrui, Li

    2007-06-01

    The complete coding sequence of Hemonchus contortus HC58cDNA was generated by rapid amplification of cDNA ends and polymerase chain reaction using primers based on the 5' and 3' ends of the parasite mRNA, accession no. AF305964. The HC58cDNA gene was 851 bp long, with open reading frame of 717 bp, precursors to 239 amino acids coding for approximately 27 kDa protein. Analysis of amino acid sequence revealed conserved residues of cysteine, histidine, asparagine, occluding loop pattern, hemoglobinase motif and glutamine of the oxyanion hole characteristic of cathepsin B like proteases (CBL). Comparison of the predicted amino acid sequences showed the protein shared 33.5-58.7% identity to cathepsin B homologues in the papain clan CA family (family C1). Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the CBL, suggesting that HC58cDNA was a member of the papain family.

  2. Development and evaluation of a Naïve Bayesian model for coding causation of workers’ compensation claims☆

    PubMed Central

    Bertke, S. J.; Meyers, A. R.; Wurzelbacher, S. J.; Bell, J.; Lampl, M. L.; Robins, D.

    2015-01-01

    Introduction Tracking and trending rates of injuries and illnesses classified as musculoskeletal disorders caused by ergonomic risk factors such as overexertion and repetitive motion (MSDs) and slips, trips, or falls (STFs) in different industry sectors is of high interest to many researchers. Unfortunately, identifying the cause of injuries and illnesses in large datasets such as workers’ compensation systems often requires reading and coding the free form accident text narrative for potentially millions of records. Method To alleviate the need for manual coding, this paper describes and evaluates a computer auto-coding algorithm that demonstrated the ability to code millions of claims quickly and accurately by learning from a set of previously manually coded claims. Conclusions The auto-coding program was able to code claims as a musculoskeletal disorders, STF or other with approximately 90% accuracy. Impact on industry The program developed and discussed in this paper provides an accurate and efficient method for identifying the causation of workers’ compensation claims as a STF or MSD in a large database based on the unstructured text narrative and resulting injury diagnoses. The program coded thousands of claims in minutes. The method described in this paper can be used by researchers and practitioners to relieve the manual burden of reading and identifying the causation of claims as a STF or MSD. Furthermore, the method can be easily generalized to code/classify other unstructured text narratives. PMID:23206504

  3. LCC: Light Curves Classifier

    NASA Astrophysics Data System (ADS)

    Vo, Martin

    2017-08-01

    Light Curves Classifier uses data mining and machine learning to obtain and classify desired objects. This task can be accomplished by attributes of light curves or any time series, including shapes, histograms, or variograms, or by other available information about the inspected objects, such as color indices, temperatures, and abundances. After specifying features which describe the objects to be searched, the software trains on a given training sample, and can then be used for unsupervised clustering for visualizing the natural separation of the sample. The package can be also used for automatic tuning parameters of used methods (for example, number of hidden neurons or binning ratio). Trained classifiers can be used for filtering outputs from astronomical databases or data stored locally. The Light Curve Classifier can also be used for simple downloading of light curves and all available information of queried stars. It natively can connect to OgleII, OgleIII, ASAS, CoRoT, Kepler, Catalina and MACHO, and new connectors or descriptors can be implemented. In addition to direct usage of the package and command line UI, the program can be used through a web interface. Users can create jobs for ”training” methods on given objects, querying databases and filtering outputs by trained filters. Preimplemented descriptors, classifier and connectors can be picked by simple clicks and their parameters can be tuned by giving ranges of these values. All combinations are then calculated and the best one is used for creating the filter. Natural separation of the data can be visualized by unsupervised clustering.

  4. 15 CFR 4.8 - Classified Information.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 15 Commerce and Foreign Trade 1 2010-01-01 2010-01-01 false Classified Information. 4.8 Section 4... INFORMATION Freedom of Information Act § 4.8 Classified Information. In processing a request for information..., the information shall be reviewed to determine whether it should remain classified. Ordinarily the...

  5. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 32 National Defense 4 2010-07-01 2010-07-01 true Classified actions. 651.13 Section 651.13... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process § 651.13 Classified actions. (a) For proposed actions and NEPA analyses involving classified information...

  6. Lectin cDNA and transgenic plants derived therefrom

    DOEpatents

    Raikhel, N.V.

    1994-01-04

    Transgenic plants containing cDNA encoding Gramineae lectin are described. The plants preferably contain cDNA coding for barley lectin and store the lectin in the leaves. The transgenic plants, particularly the leaves exhibit insecticidal and fungicidal properties. GOVERNMENT RIGHTS This application was funded under Department of Energy Contract DE-AC02-76ER01338. The U.S. Government has certain rights under this application and any patent issuing thereon. .

  7. Lectin cDNA and transgenic plants derived therefrom

    DOEpatents

    Raikhel, Natasha V.

    1994-01-04

    Transgenic plants containing cDNA encoding Gramineae lectin are described. The plants preferably contain cDNA coding for barley lectin and store the lectin in the leaves. The transgenic plants, particularly the leaves exhibit insecticidal and fungicidal properties. GOVERNMENT RIGHTS This application was funded under Department of Energy Contract DE-AC02-76ER01338. The U.S. Government has certain rights under this application and any patent issuing thereon.

  8. Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains.

    PubMed

    Bulashevska, Alla; Eils, Roland

    2006-06-14

    The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location for a given protein when only the amino acid sequence of the protein is known. Although many efforts have been made to predict subcellular location from sequence information only, there is the need for further research to improve the accuracy of prediction. A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six various datasets; among them are Gram-negative bacteria dataset, data for discriminating outer membrane proteins and apoptosis proteins dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the accuracy of the prediction of some classes with few sequences in training and is therefore useful for datasets with imbalanced distribution of classes. This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and competitive with the previously reported approaches in terms of prediction accuracies as empirical results indicate. The code for the software is available upon request.

  9. HPV-relatedness definitions for classifying HPV-related oropharyngeal cancer patient do impact on TNM classification and patients' survival.

    PubMed

    Taberna, Miren; Mena, Marisa; Tous, Sara; Pavón, Miquel Angel; Oliva, Marc; León, Xavier; Garcia, Jacinto; Guix, Marta; Hijano, Rafael; Bonfill, Teresa; Aguilà, Antón; Alemany, Laia; Mesía, Ricard

    2018-01-01

    Given the different nature and better outcomes of oropharyngeal carcinoma (OPC) associated with human papillomavirus (HPV) infection, a novel clinical stage classification for HPV-related OPC has been accepted for the 8th edition AJCC TNM (ICON-S model). However, it is still unclear the HPV-relatedness definition with best diagnostic accuracy and prognostic value. The aim of this study was to compare different staging system models proposed for HPV-related OPC patients: 7th edition AJCC TNM, RPA stage with non-anatomic factors (Princess Margaret), RPA with N categories for nasopharyngeal cancer (MD-Anderson) and AHR-new (ICON-S), according to different HPV-relatedness definitions: HPV-DNA detection plus an additional positive marker (p16INK4a or HPV-mRNA), p16INK4a positivity alone or the combination of HPV-DNA/p16INK4a positivity as diagnostic tests. A total of 788 consecutive OPC cases diagnosed from 1991 to 2013 were considered eligible for the analysis. Of these samples, 66 (8.4%) were positive for HPV-DNA and (p16INK4a or HPV-mRNA), 83 (10.5%) were p16INK4a positive and 58 (7.4%) were double positive for HPV-DNA/p16INK4a. ICON-S model was the staging system, which performed better in our series when using at least two biomarkers to define HPV-causality. When the same analysis was performed considering only p16INK4a-positivity, RPA stage with non-anatomic factors (Princess Margaret) has the best classification based on AIC criteria. HPV-relatedness definition for classifying HPV-related OPC patient do impact on TNM classification and patients' survival. Further studies assessing HPV-relatedness definitions are warranted to better classify HPV-related OPC patients in the era of de-escalation clinical trials.

  10. HPV-relatedness definitions for classifying HPV-related oropharyngeal cancer patient do impact on TNM classification and patients’ survival

    PubMed Central

    Mena, Marisa; Tous, Sara; Pavón, Miquel Angel; Oliva, Marc; León, Xavier; Garcia, Jacinto; Guix, Marta; Hijano, Rafael; Bonfill, Teresa; Aguilà, Antón; Alemany, Laia; Mesía, Ricard

    2018-01-01

    Background Given the different nature and better outcomes of oropharyngeal carcinoma (OPC) associated with human papillomavirus (HPV) infection, a novel clinical stage classification for HPV-related OPC has been accepted for the 8th edition AJCC TNM (ICON-S model). However, it is still unclear the HPV-relatedness definition with best diagnostic accuracy and prognostic value. Material and methods The aim of this study was to compare different staging system models proposed for HPV-related OPC patients: 7th edition AJCC TNM, RPA stage with non-anatomic factors (Princess Margaret), RPA with N categories for nasopharyngeal cancer (MD-Anderson) and AHR-new (ICON-S), according to different HPV-relatedness definitions: HPV-DNA detection plus an additional positive marker (p16INK4a or HPV-mRNA), p16INK4a positivity alone or the combination of HPV-DNA/p16INK4a positivity as diagnostic tests. Results A total of 788 consecutive OPC cases diagnosed from 1991 to 2013 were considered eligible for the analysis. Of these samples, 66 (8.4%) were positive for HPV-DNA and (p16INK4a or HPV-mRNA), 83 (10.5%) were p16INK4a positive and 58 (7.4%) were double positive for HPV-DNA/p16INK4a. ICON-S model was the staging system, which performed better in our series when using at least two biomarkers to define HPV-causality. When the same analysis was performed considering only p16INK4a-positivity, RPA stage with non-anatomic factors (Princess Margaret) has the best classification based on AIC criteria. Conclusion HPV-relatedness definition for classifying HPV-related OPC patient do impact on TNM classification and patients’ survival. Further studies assessing HPV-relatedness definitions are warranted to better classify HPV-related OPC patients in the era of de-escalation clinical trials. PMID:29664911

  11. Classifying Microorganisms.

    ERIC Educational Resources Information Center

    Baker, William P.; Leyva, Kathryn J.; Lang, Michael; Goodmanis, Ben

    2002-01-01

    Focuses on an activity in which students sample air at school and generate ideas about how to classify the microorganisms they observe. The results are used to compare air quality among schools via the Internet. Supports the development of scientific inquiry and technology skills. (DDR)

  12. Getting it Right: How DNA Polymerases Select the Right Nucleotide.

    PubMed

    Ludmann, Samra; Marx, Andreas

    2016-01-01

    All living organisms are defined by their genetic code encrypted in their DNA. DNA polymerases are the enzymes that are responsible for all DNA syntheses occurring in nature. For DNA replication, repair and recombination these enzymes have to read the parental DNA and recognize the complementary nucleotide out of a pool of four structurally similar deoxynucleotide triphosphates (dNTPs) for a given template. The selection of the nucleotide is in accordance with the Watson-Crick rule. In this process the accuracy of DNA synthesis is crucial for the maintenance of the genome stability. However, to spur evolution a certain degree of freedom must be allowed. This brief review highlights the mechanistic basis for selecting the right nucleotide by DNA polymerases.

  13. DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations.

    PubMed

    Yuan, Yuchen; Shi, Yi; Li, Changyang; Kim, Jinman; Cai, Weidong; Han, Zeguang; Feng, David Dagan

    2016-12-23

    With the developments of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However in existing SMCC methods, issues like high data sparsity, small volume of sample size, and the application of simple linear classifiers, are major obstacles in improving the classification performance. To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier, that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute in improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and

  14. Multicategory Composite Least Squares Classifiers

    PubMed Central

    Park, Seo Young; Liu, Yufeng; Liu, Dacheng; Scholl, Paul

    2010-01-01

    Classification is a very useful statistical tool for information extraction. In particular, multicategory classification is commonly seen in various applications. Although binary classification problems are heavily studied, extensions to the multicategory case are much less so. In view of the increased complexity and volume of modern statistical problems, it is desirable to have multicategory classifiers that are able to handle problems with high dimensions and with a large number of classes. Moreover, it is necessary to have sound theoretical properties for the multicategory classifiers. In the literature, there exist several different versions of simultaneous multicategory Support Vector Machines (SVMs). However, the computation of the SVM can be difficult for large scale problems, especially for problems with large number of classes. Furthermore, the SVM cannot produce class probability estimation directly. In this article, we propose a novel efficient multicategory composite least squares classifier (CLS classifier), which utilizes a new composite squared loss function. The proposed CLS classifier has several important merits: efficient computation for problems with large number of classes, asymptotic consistency, ability to handle high dimensional data, and simple conditional class probability estimation. Our simulated and real examples demonstrate competitive performance of the proposed approach. PMID:21218128

  15. A laboratory information management system for DNA barcoding workflows.

    PubMed

    Vu, Thuy Duong; Eberhardt, Ursula; Szöke, Szániszló; Groenewald, Marizeth; Robert, Vincent

    2012-07-01

    This paper presents a laboratory information management system for DNA sequences (LIMS) created and based on the needs of a DNA barcoding project at the CBS-KNAW Fungal Biodiversity Centre (Utrecht, the Netherlands). DNA barcoding is a global initiative for species identification through simple DNA sequence markers. We aim at generating barcode data for all strains (or specimens) included in the collection (currently ca. 80 k). The LIMS has been developed to better manage large amounts of sequence data and to keep track of the whole experimental procedure. The system has allowed us to classify strains more efficiently as the quality of sequence data has improved, and as a result, up-to-date taxonomic names have been given to strains and more accurate correlation analyses have been carried out.

  16. A Binary-Encounter-Bethe Approach to Simulate DNA Damage by the Direct Effect

    NASA Technical Reports Server (NTRS)

    Plante, Ianik; Cucinotta, Francis A.

    2013-01-01

    The DNA damage is of crucial importance in the understanding of the effects of ionizing radiation. The main mechanisms of DNA damage are by the direct effect of radiation (e.g. direct ionization) and by indirect effect (e.g. damage by.OH radicals created by the radiolysis of water). Despite years of research in this area, many questions on the formation of DNA damage remains. To refine existing DNA damage models, an approach based on the Binary-Encounter-Bethe (BEB) model was developed[1]. This model calculates differential cross sections for ionization of the molecular orbitals of the DNA bases, sugars and phosphates using the electron binding energy, the mean kinetic energy and the occupancy number of the orbital. This cross section has an analytic form which is quite convenient to use and allows the sampling of the energy loss occurring during an ionization event. To simulate the radiation track structure, the code RITRACKS developed at the NASA Johnson Space Center is used[2]. This code calculates all the energy deposition events and the formation of the radiolytic species by the ion and the secondary electrons as well. We have also developed a technique to use the integrated BEB cross section for the bases, sugar and phosphates in the radiation transport code RITRACKS. These techniques should allow the simulation of DNA damage by ionizing radiation, and understanding of the formation of double-strand breaks caused by clustered damage in different conditions.

  17. Screening for Protein-DNA Interactions by Automatable DNA-Protein Interaction ELISA

    PubMed Central

    Schüssler, Axel; Kolukisaoglu, H. Üner; Koch, Grit; Wallmeroth, Niklas; Hecker, Andreas; Thurow, Kerstin; Zell, Andreas; Harter, Klaus; Wanke, Dierk

    2013-01-01

    DNA-binding proteins (DBPs), such as transcription factors, constitute about 10% of the protein-coding genes in eukaryotic genomes and play pivotal roles in the regulation of chromatin structure and gene expression by binding to short stretches of DNA. Despite their number and importance, only for a minor portion of DBPs the binding sequence had been disclosed. Methods that allow the de novo identification of DNA-binding motifs of known DBPs, such as protein binding microarray technology or SELEX, are not yet suited for high-throughput and automation. To close this gap, we report an automatable DNA-protein-interaction (DPI)-ELISA screen of an optimized double-stranded DNA (dsDNA) probe library that allows the high-throughput identification of hexanucleotide DNA-binding motifs. In contrast to other methods, this DPI-ELISA screen can be performed manually or with standard laboratory automation. Furthermore, output evaluation does not require extensive computational analysis to derive a binding consensus. We could show that the DPI-ELISA screen disclosed the full spectrum of binding preferences for a given DBP. As an example, AtWRKY11 was used to demonstrate that the automated DPI-ELISA screen revealed the entire range of in vitro binding preferences. In addition, protein extracts of AtbZIP63 and the DNA-binding domain of AtWRKY33 were analyzed, which led to a refinement of their known DNA-binding consensi. Finally, we performed a DPI-ELISA screen to disclose the DNA-binding consensus of a yet uncharacterized putative DBP, AtTIFY1. A palindromic TGATCA-consensus was uncovered and we could show that the GATC-core is compulsory for AtTIFY1 binding. This specific interaction between AtTIFY1 and its DNA-binding motif was confirmed by in vivo plant one-hybrid assays in protoplasts. Thus, the value and applicability of the DPI-ELISA screen for de novo binding site identification of DBPs, also under automatized conditions, is a promising approach for a deeper understanding

  18. Analytical performance of a bronchial genomic classifier.

    PubMed

    Hu, Zhanzhi; Whitney, Duncan; Anderson, Jessica R; Cao, Manqiu; Ho, Christine; Choi, Yoonha; Huang, Jing; Frink, Robert; Smith, Kate Porta; Monroe, Robert; Kennedy, Giulia C; Walsh, P Sean

    2016-02-26

    The current standard practice of lung lesion diagnosis often leads to inconclusive results, requiring additional diagnostic follow up procedures that are invasive and often unnecessary due to the high benign rate in such lesions (Chest 143:e78S-e92, 2013). The Percepta bronchial genomic classifier was developed and clinically validated to provide more accurate classification of lung nodules and lesions that are inconclusive by bronchoscopy, using bronchial brushing specimens (N Engl J Med 373:243-51, 2015, BMC Med Genomics 8:18, 2015). The analytical performance of the Percepta test is reported here. Analytical performance studies were designed to characterize the stability of RNA in bronchial brushing specimens during collection and shipment; analytical sensitivity defined as input RNA mass; analytical specificity (i.e. potentially interfering substances) as tested on blood and genomic DNA; and assay performance studies including intra-run, inter-run, and inter-laboratory reproducibility. RNA content within bronchial brushing specimens preserved in RNAprotect is stable for up to 20 days at 4 °C with no changes in RNA yield or integrity. Analytical sensitivity studies demonstrated tolerance to variation in RNA input (157 ng to 243 ng). Analytical specificity studies utilizing cancer positive and cancer negative samples mixed with either blood (up to 10 % input mass) or genomic DNA (up to 10 % input mass) demonstrated no assay interference. The test is reproducible from RNA extraction through to Percepta test result, including variation across operators, runs, reagent lots, and laboratories (standard deviation of 0.26 for scores on > 6 unit scale). Analytical sensitivity, analytical specificity and robustness of the Percepta test were successfully verified, supporting its suitability for clinical use.

  19. A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network.

    PubMed

    Fiannaca, Antonino; La Rosa, Massimo; Rizzo, Riccardo; Urso, Alfonso

    2015-07-01

    In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed. In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, which also consider short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database". The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification is performed task with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method can reach the accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches the accuracy of 20.9%. Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Parents' Experiences and Perceptions when Classifying their Children with Cerebral Palsy: Recommendations for Service Providers.

    PubMed

    Scime, Natalie V; Bartlett, Doreen J; Brunton, Laura K; Palisano, Robert J

    2017-08-01

    This study investigated the experiences and perceptions of parents of children with cerebral palsy (CP) when classifying their children using the Gross Motor Function Classification System (GMFCS), the Manual Ability Classification System (MACS), and the Communication Function Classification System (CFCS). The second aim was to collate parents' recommendations for service providers on how to interact and communicate with families. A purposive sample of seven parents participating in the On Track study was recruited. Semi-structured interviews were conducted orally and were audiotaped, transcribed, and coded openly. A descriptive interpretive approach within a pragmatic perspective was used during analysis. Seven themes encompassing parents' experiences and perspectives reflect a process of increased understanding when classifying their children, with perceptions of utility evident throughout this process. Six recommendations for service providers emerged, including making the child a priority and being a dependable resource. Knowledge of parents' experiences when using the GMFCS, MACS, and CFCS can provide useful insight for service providers collaborating with parents to classify function in children with CP. Using the recommendations from these parents can facilitate family-provider collaboration for goal setting and intervention planning.

  1. Effect of bar-code technology on the safety of medication administration.

    PubMed

    Poon, Eric G; Keohane, Carol A; Yoon, Catherine S; Ditmore, Matthew; Bane, Anne; Levtzion-Korach, Osnat; Moniz, Thomas; Rothschild, Jeffrey M; Kachalia, Allen B; Hayes, Judy; Churchill, William W; Lipsitz, Stuart; Whittemore, Anthony D; Bates, David W; Gandhi, Tejal K

    2010-05-06

    Serious medication errors are common in hospitals and often occur during order transcription or administration of medication. To help prevent such errors, technology has been developed to verify medications by incorporating bar-code verification technology within an electronic medication-administration system (bar-code eMAR). We conducted a before-and-after, quasi-experimental study in an academic medical center that was implementing the bar-code eMAR. We assessed rates of errors in order transcription and medication administration on units before and after implementation of the bar-code eMAR. Errors that involved early or late administration of medications were classified as timing errors and all others as nontiming errors. Two clinicians reviewed the errors to determine their potential to harm patients and classified those that could be harmful as potential adverse drug events. We observed 14,041 medication administrations and reviewed 3082 order transcriptions. Observers noted 776 nontiming errors in medication administration on units that did not use the bar-code eMAR (an 11.5% error rate) versus 495 such errors on units that did use it (a 6.8% error rate)--a 41.4% relative reduction in errors (P<0.001). The rate of potential adverse drug events (other than those associated with timing errors) fell from 3.1% without the use of the bar-code eMAR to 1.6% with its use, representing a 50.8% relative reduction (P<0.001). The rate of timing errors in medication administration fell by 27.3% (P<0.001), but the rate of potential adverse drug events associated with timing errors did not change significantly. Transcription errors occurred at a rate of 6.1% on units that did not use the bar-code eMAR but were completely eliminated on units that did use it. Use of the bar-code eMAR substantially reduced the rate of errors in order transcription and in medication administration as well as potential adverse drug events, although it did not eliminate such errors. Our data show

  2. 32 CFR 775.5 - Classified actions.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Air Act (42 U.S.C. 7609 et seq.). (b) It should be noted that a classified EA/EIS serves the same “informed decisionmaking” purpose as does a published unclassified EA/EIS. Even though the classified EA/EIS... be considered by the decisionmaker for the proposed action. The content of a classified EA/EIS (or...

  3. Geomagnetic Storm Impact On GPS Code Positioning

    NASA Astrophysics Data System (ADS)

    Uray, Fırat; Varlık, Abdullah; Kalaycı, İbrahim; Öǧütcü, Sermet

    2017-04-01

    This paper deals with the geomagnetic storm impact on GPS code processing with using GIPSY/OASIS research software. 12 IGS stations in mid-latitude were chosen to conduct the experiment. These IGS stations were classified as non-cross correlation receiver reporting P1 and P2 (NONCC-P1P2), non-cross correlation receiver reporting C1 and P2 (NONCC-C1P2) and cross-correlation (CC-C1P2) receiver. In order to keep the code processing consistency between the classified receivers, only P2 code observations from the GPS satellites were processed. Four extreme geomagnetic storms October 2003, day of the year (DOY), 29, 30 Halloween Storm, November 2003, DOY 20, November 2004, DOY 08 and four geomagnetic quiet days in 2005 (DOY 92, 98, 99, 100) were chosen for this study. 24-hour rinex data of the IGS stations were processed epoch-by-epoch basis. In this way, receiver clock and Earth Centered Earth Fixed (ECEF) Cartesian Coordinates were solved for a per-epoch basis for each day. IGS combined broadcast ephemeris file (brdc) were used to partly compensate the ionospheric effect on the P2 code observations. There is no tropospheric model was used for the processing. Jet Propulsion Laboratory Application Technology Satellites (JPL ATS) computed coordinates of the stations were taken as true coordinates. The differences of the computed ECEF coordinates and assumed true coordinates were resolved to topocentric coordinates (north, east, up). Root mean square (RMS) errors for each component were calculated for each day. The results show that two-dimensional and vertical accuracy decreases significantly during the geomagnetic storm days comparing with the geomagnetic quiet days. It is observed that vertical accuracy is much more affected than the horizontal accuracy by geomagnetic storm. Up to 50 meters error in vertical component has been observed in geomagnetic storm day. It is also observed that performance of Klobuchar ionospheric correction parameters during geomagnetic storm

  4. A fuzzy classifier system for process control

    NASA Technical Reports Server (NTRS)

    Karr, C. L.; Phillips, J. C.

    1994-01-01

    A fuzzy classifier system that discovers rules for controlling a mathematical model of a pH titration system was developed by researchers at the U.S. Bureau of Mines (USBM). Fuzzy classifier systems successfully combine the strengths of learning classifier systems and fuzzy logic controllers. Learning classifier systems resemble familiar production rule-based systems, but they represent their IF-THEN rules by strings of characters rather than in the traditional linguistic terms. Fuzzy logic is a tool that allows for the incorporation of abstract concepts into rule based-systems, thereby allowing the rules to resemble the familiar 'rules-of-thumb' commonly used by humans when solving difficult process control and reasoning problems. Like learning classifier systems, fuzzy classifier systems employ a genetic algorithm to explore and sample new rules for manipulating the problem environment. Like fuzzy logic controllers, fuzzy classifier systems encapsulate knowledge in the form of production rules. The results presented in this paper demonstrate the ability of fuzzy classifier systems to generate a fuzzy logic-based process control system.

  5. Identification of column edges of DNA fragments by using K-means clustering and mean algorithm on lane histograms of DNA agarose gel electrophoresis images

    NASA Astrophysics Data System (ADS)

    Turan, Muhammed K.; Sehirli, Eftal; Elen, Abdullah; Karas, Ismail R.

    2015-07-01

    Gel electrophoresis (GE) is one of the most used method to separate DNA, RNA, protein molecules according to size, weight and quantity parameters in many areas such as genetics, molecular biology, biochemistry, microbiology. The main way to separate each molecule is to find borders of each molecule fragment. This paper presents a software application that show columns edges of DNA fragments in 3 steps. In the first step the application obtains lane histograms of agarose gel electrophoresis images by doing projection based on x-axis. In the second step, it utilizes k-means clustering algorithm to classify point values of lane histogram such as left side values, right side values and undesired values. In the third step, column edges of DNA fragments is shown by using mean algorithm and mathematical processes to separate DNA fragments from the background in a fully automated way. In addition to this, the application presents locations of DNA fragments and how many DNA fragments exist on images captured by a scientific camera.

  6. Orthopedics coding and funding.

    PubMed

    Baron, S; Duclos, C; Thoreux, P

    2014-02-01

    The French tarification à l'activité (T2A) prospective payment system is a financial system in which a health-care institution's resources are based on performed activity. Activity is described via the PMSI medical information system (programme de médicalisation du système d'information). The PMSI classifies hospital cases by clinical and economic categories known as diagnosis-related groups (DRG), each with an associated price tag. Coding a hospital case involves giving as realistic a description as possible so as to categorize it in the right DRG and thus ensure appropriate payment. For this, it is essential to understand what determines the pricing of inpatient stay: namely, the code for the surgical procedure, the patient's principal diagnosis (reason for admission), codes for comorbidities (everything that adds to management burden), and the management of the length of inpatient stay. The PMSI is used to analyze the institution's activity and dynamism: change on previous year, relation to target, and comparison with competing institutions based on indicators such as the mean length of stay performance indicator (MLS PI). The T2A system improves overall care efficiency. Quality of care, however, is not presently taken account of in the payment made to the institution, as there are no indicators for this; work needs to be done on this topic. Copyright © 2014. Published by Elsevier Masson SAS.

  7. The EB factory project. I. A fast, neural-net-based, general purpose light curve classifier optimized for eclipsing binaries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Paegert, Martin; Stassun, Keivan G.; Burger, Dan M.

    2014-08-01

    We describe a new neural-net-based light curve classifier and provide it with documentation as a ready-to-use tool for the community. While optimized for identification and classification of eclipsing binary stars, the classifier is general purpose, and has been developed for speed in the context of upcoming massive surveys such as the Large Synoptic Survey Telescope. A challenge for classifiers in the context of neural-net training and massive data sets is to minimize the number of parameters required to describe each light curve. We show that a simple and fast geometric representation that encodes the overall light curve shape, together withmore » a chi-square parameter to capture higher-order morphology information results in efficient yet robust light curve classification, especially for eclipsing binaries. Testing the classifier on the ASAS light curve database, we achieve a retrieval rate of 98% and a false-positive rate of 2% for eclipsing binaries. We achieve similarly high retrieval rates for most other periodic variable-star classes, including RR Lyrae, Mira, and delta Scuti. However, the classifier currently has difficulty discriminating between different sub-classes of eclipsing binaries, and suffers a relatively low (∼60%) retrieval rate for multi-mode delta Cepheid stars. We find that it is imperative to train the classifier's neural network with exemplars that include the full range of light curve quality to which the classifier will be expected to perform; the classifier performs well on noisy light curves only when trained with noisy exemplars. The classifier source code, ancillary programs, a trained neural net, and a guide for use, are provided.« less

  8. A comprehensive statistical classifier of foci in the cell transformation assay for carcinogenicity testing.

    PubMed

    Callegaro, Giulia; Malkoc, Kasja; Corvi, Raffaella; Urani, Chiara; Stefanini, Federico M

    2017-12-01

    The identification of the carcinogenic risk of chemicals is currently mainly based on animal studies. The in vitro Cell Transformation Assays (CTAs) are a promising alternative to be considered in an integrated approach. CTAs measure the induction of foci of transformed cells. CTAs model key stages of the in vivo neoplastic process and are able to detect both genotoxic and some non-genotoxic compounds, being the only in vitro method able to deal with the latter. Despite their favorable features, CTAs can be further improved, especially reducing the possible subjectivity arising from the last phase of the protocol, namely visual scoring of foci using coded morphological features. By taking advantage of digital image analysis, the aim of our work is to translate morphological features into statistical descriptors of foci images, and to use them to mimic the classification performances of the visual scorer to discriminate between transformed and non-transformed foci. Here we present a classifier based on five descriptors trained on a dataset of 1364 foci, obtained with different compounds and concentrations. Our classifier showed accuracy, sensitivity and specificity equal to 0.77 and an area under the curve (AUC) of 0.84. The presented classifier outperforms a previously published model. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. The C terminus of Ku80 activates the DNA-dependent protein kinase catalytic subunit.

    PubMed

    Singleton, B K; Torres-Arzayus, M I; Rottinghaus, S T; Taccioli, G E; Jeggo, P A

    1999-05-01

    Ku is a heterodimeric protein with double-stranded DNA end-binding activity that operates in the process of nonhomologous end joining. Ku is thought to target the DNA-dependent protein kinase (DNA-PK) complex to the DNA and, when DNA bound, can interact and activate the DNA-PK catalytic subunit (DNA-PKcs). We have carried out a 3' deletion analysis of Ku80, the larger subunit of Ku, and shown that the C-terminal 178 amino acid residues are dispensable for DNA end-binding activity but are required for efficient interaction of Ku with DNA-PKcs. Cells expressing Ku80 proteins that lack the terminal 178 residues have low DNA-PK activity, are radiation sensitive, and can recombine the signal junctions but not the coding junctions during V(D)J recombination. These cells have therefore acquired the phenotype of mouse SCID cells despite expressing DNA-PKcs protein, suggesting that an interaction between DNA-PKcs and Ku, involving the C-terminal region of Ku80, is required for DNA double-strand break rejoining and coding but not signal joint formation. To gain further insight into important domains in Ku80, we report a point mutational change in Ku80 in the defective xrs-2 cell line. This residue is conserved among species and lies outside of the previously reported Ku70-Ku80 interaction domain. The mutational change nonetheless abrogates the Ku70-Ku80 interaction and DNA end-binding activity.

  10. Bijective transformation circular codes and nucleotide exchanging RNA transcription.

    PubMed

    Michel, Christian J; Seligmann, Hervé

    2014-04-01

    The C(3) self-complementary circular code X identified in genes of prokaryotes and eukaryotes is a set of 20 trinucleotides enabling reading frame retrieval and maintenance, i.e. a framing code (Arquès and Michel, 1996; Michel, 2012, 2013). Some mitochondrial RNAs correspond to DNA sequences when RNA transcription systematically exchanges between nucleotides (Seligmann, 2013a,b). We study here the 23 bijective transformation codes ΠX of X which may code nucleotide exchanging RNA transcription as suggested by this mitochondrial observation. The 23 bijective transformation codes ΠX are C(3) trinucleotide circular codes, seven of them are also self-complementary. Furthermore, several correlations are observed between the Reading Frame Retrieval (RFR) probability of bijective transformation codes ΠX and the different biological properties of ΠX related to their numbers of RNAs in GenBank's EST database, their polymerization rate, their number of amino acids and the chirality of amino acids they code. Results suggest that the circular code X with the functions of reading frame retrieval and maintenance in regular RNA transcription, may also have, through its bijective transformation codes ΠX, the same functions in nucleotide exchanging RNA transcription. Associations with properties such as amino acid chirality suggest that the RFR of X and its bijective transformations molded the origins of the genetic code's machinery. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  11. Sequence homology and expression profile of genes associated with DNA repair pathways in Mycobacterium leprae.

    PubMed

    Sharma, Mukul; Vedithi, Sundeep Chaitanya; Das, Madhusmita; Roy, Anindya; Ebenezer, Mannam

    2017-01-01

    Survival of Mycobacterium leprae, the causative bacteria for leprosy, in the human host is dependent to an extent on the ways in which its genome integrity is retained. DNA repair mechanisms protect bacterial DNA from damage induced by various stress factors. The current study is aimed at understanding the sequence and functional annotation of DNA repair genes in M. leprae. T he genome of M. leprae was annotated using sequence alignment tools to identify DNA repair genes that have homologs in Mycobacterium tuberculosis and Escherichia coli. A set of 96 genes known to be involved in DNA repair mechanisms in E. coli and Mycobacteriaceae were chosen as a reference. Among these, 61 were identified in M. leprae based on sequence similarity and domain architecture. The 61 were classified into 36 characterized gene products (59%), 11 hypothetical proteins (18%), and 14 pseudogenes (23%). All these genes have homologs in M. tuberculosis and 49 (80.32%) in E. coli. A set of 12 genes which are absent in E. coli were present in M. leprae and in Mycobacteriaceae. These 61 genes were further investigated for their expression profiles in the whole transcriptome microarray data of M. leprae which was obtained from the signal intensities of 60bp probes, tiling the entire genome with 10bp overlaps. It was noted that transcripts corresponding to all the 61 genes were identified in the transcriptome data with varying expression levels ranging from 0.18 to 2.47 fold (normalized with 16SrRNA). The mRNA expression levels of a representative set of seven genes ( four annotated and three hypothetical protein coding genes) were analyzed using quantitative Polymerase Chain Reaction (qPCR) assays with RNA extracted from skin biopsies of 10 newly diagnosed, untreated leprosy cases. It was noted that RNA expression levels were higher for genes involved in homologous recombination whereas the genes with a low level of expression are involved in the direct repair pathway. This study provided

  12. Building gene expression profile classifiers with a simple and efficient rejection option in R.

    PubMed

    Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez

    2011-01-01

    The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality problem that negatively reflects on the reliability of both traditional rejection models and also more recent approaches such as one-class classifiers. This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remaining of this paper). The main contribution of the proposed rules is their simplicity, which enables an easy integration with available data analysis environments. Since in the definition of a rejection model tuning of the involved parameters is often a complex and delicate task, in this paper we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. This paper shows how the use of simple decision rules can be used to help the use of complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and therefore a good candidate for being integrated in data analysis flows in labs where the machine learning expertise required to tune traditional

  13. De Novo Origin of Human Protein-Coding Genes

    PubMed Central

    Wu, Dong-Dong; Irwin, David M.; Zhang, Ya-Ping

    2011-01-01

    The de novo origin of a new protein-coding gene from non-coding DNA is considered to be a very rare occurrence in genomes. Here we identify 60 new protein-coding genes that originated de novo on the human lineage since divergence from the chimpanzee. The functionality of these genes is supported by both transcriptional and proteomic evidence. RNA–seq data indicate that these genes have their highest expression levels in the cerebral cortex and testes, which might suggest that these genes contribute to phenotypic traits that are unique to humans, such as improved cognitive ability. Our results are inconsistent with the traditional view that the de novo origin of new genes is very rare, thus there should be greater appreciation of the importance of the de novo origination of genes. PMID:22102831

  14. DNA-DNA hybridization values and their relationship to whole-genome sequence similarities.

    PubMed

    Goris, Johan; Konstantinidis, Konstantinos T; Klappenbach, Joel A; Coenye, Tom; Vandamme, Peter; Tiedje, James M

    2007-01-01

    DNA-DNA hybridization (DDH) values have been used by bacterial taxonomists since the 1960s to determine relatedness between strains and are still the most important criterion in the delineation of bacterial species. Since the extent of hybridization between a pair of strains is ultimately governed by their respective genomic sequences, we examined the quantitative relationship between DDH values and genome sequence-derived parameters, such as the average nucleotide identity (ANI) of common genes and the percentage of conserved DNA. A total of 124 DDH values were determined for 28 strains for which genome sequences were available. The strains belong to six important and diverse groups of bacteria for which the intra-group 16S rRNA gene sequence identity was greater than 94 %. The results revealed a close relationship between DDH values and ANI and between DNA-DNA hybridization and the percentage of conserved DNA for each pair of strains. The recommended cut-off point of 70 % DDH for species delineation corresponded to 95 % ANI and 69 % conserved DNA. When the analysis was restricted to the protein-coding portion of the genome, 70 % DDH corresponded to 85 % conserved genes for a pair of strains. These results reveal extensive gene diversity within the current concept of "species". Examination of reciprocal values indicated that the level of experimental error associated with the DDH method is too high to reveal the subtle differences in genome size among the strains sampled. It is concluded that ANI can accurately replace DDH values for strains for which genome sequences are available.

  15. Feature Selection and Effective Classifiers.

    ERIC Educational Resources Information Center

    Deogun, Jitender S.; Choubey, Suresh K.; Raghavan, Vijay V.; Sever, Hayri

    1998-01-01

    Develops and analyzes four algorithms for feature selection in the context of rough set methodology. Experimental results confirm the expected relationship between the time complexity of these algorithms and the classification accuracy of the resulting upper classifiers. When compared, results of upper classifiers perform better than lower…

  16. A new family of polymerases related to superfamily A DNA polymerases and T7-like DNA-dependent RNA polymerases

    PubMed Central

    Iyer, Lakshminarayan M; Abhiman, Saraswathi; Aravind, L

    2008-01-01

    Using sequence profile methods and structural comparisons we characterize a previously unknown family of nucleic acid polymerases in a group of mobile elements from genomes of diverse bacteria, an algal plastid and certain DNA viruses, including the recently reported Sputnik virus. Using contextual information from domain architectures and gene-neighborhoods we present evidence that they are likely to possess both primase and DNA polymerase activity, comparable to the previously reported prim-pol proteins. These newly identified polymerases help in defining the minimal functional core of superfamily A DNA polymerases and related RNA polymerases. Thus, they provide a framework to understand the emergence of both DNA and RNA polymerization activity in this class of enzymes. They also provide evidence that enigmatic DNA viruses, such as Sputnik, might have emerged from mobile elements coding these polymerases. This article was reviewed by Eugene Koonin and Mark Ragan. PMID:18834537

  17. Recognition Using Hybrid Classifiers.

    PubMed

    Osadchy, Margarita; Keren, Daniel; Raviv, Dolev

    2016-04-01

    A canonical problem in computer vision is category recognition (e.g., find all instances of human faces, cars etc., in an image). Typically, the input for training a binary classifier is a relatively small sample of positive examples, and a huge sample of negative examples, which can be very diverse, consisting of images from a large number of categories. The difficulty of the problem sharply increases with the dimension and size of the negative example set. We propose to alleviate this problem by applying a "hybrid" classifier, which replaces the negative samples by a prior, and then finds a hyperplane which separates the positive samples from this prior. The method is extended to kernel space and to an ensemble-based approach. The resulting binary classifiers achieve an identical or better classification rate than SVM, while requiring far smaller memory and lower computational complexity to train and apply.

  18. Identification of species based on DNA barcode using k-mer feature vector and Random forest classifier.

    PubMed

    Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Rao, A R

    2016-11-05

    DNA barcoding is a molecular diagnostic method that allows automated and accurate identification of species based on a short and standardized fragment of DNA. To this end, an attempt has been made in this study to develop a computational approach for identifying the species by comparing its barcode with the barcode sequence of known species present in the reference library. Each barcode sequence was first mapped onto a numeric feature vector based on k-mer frequencies and then Random forest methodology was employed on the transformed dataset for species identification. The proposed approach outperformed similarity-based, tree-based, diagnostic-based approaches and found comparable with existing supervised learning based approaches in terms of species identification success rate, while compared using real and simulated datasets. Based on the proposed approach, an online web interface SPIDBAR has also been developed and made freely available at http://cabgrid.res.in:8080/spidbar/ for species identification by the taxonomists. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Representation of DNA sequences in genetic codon context with applications in exon and intron prediction.

    PubMed

    Yin, Changchuan

    2015-04-01

    To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be specially mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC) in which the numerical values are optimized by simulation annealing to maximize the 3-periodicity signal to noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction by Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.

  20. Microparticles: Facile and High-Throughput Synthesis of Functional Microparticles with Quick Response Codes (Small 24/2016).

    PubMed

    Ramirez, Lisa Marie S; He, Muhan; Mailloux, Shay; George, Justin; Wang, Jun

    2016-06-01

    Microparticles carrying quick response (QR) barcodes are fabricated by J. Wang and co-workers on page 3259, using a massive coding of dissociated elements (MiCODE) technology. Each microparticle can bear a special custom-designed QR code that enables encryption or tagging with unlimited multiplexity, and the QR code can be easily read by cellphone applications. The utility of MiCODE particles in multiplexed DNA detection and microtagging for anti-counterfeiting is explored. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Ensemble of classifiers for ontology enrichment

    NASA Astrophysics Data System (ADS)

    Semenova, A. V.; Kureichik, V. M.

    2018-05-01

    A classifier is a basis of ontology learning systems. Classification of text documents is used in many applications, such as information retrieval, information extraction, definition of spam. A new ensemble of classifiers based on SVM (a method of support vectors), LSTM (neural network) and word embedding are suggested. An experiment was conducted on open data, which allows us to conclude that the proposed classification method is promising. The implementation of the proposed classifier is performed in the Matlab using the functions of the Text Analytics Toolbox. The principal difference between the proposed ensembles of classifiers is the high quality of classification of data at acceptable time costs.

  2. Neural network-based brain tissue segmentation in MR images using extracted features from intraframe coding in H.264

    NASA Astrophysics Data System (ADS)

    Jafari, Mehdi; Kasaei, Shohreh

    2012-01-01

    Automatic brain tissue segmentation is a crucial task in diagnosis and treatment of medical images. This paper presents a new algorithm to segment different brain tissues, such as white matter (WM), gray matter (GM), cerebral spinal fluid (CSF), background (BKG), and tumor tissues. The proposed technique uses the modified intraframe coding yielded from H.264/(AVC), for feature extraction. Extracted features are then imposed to an artificial back propagation neural network (BPN) classifier to assign each block to its appropriate class. Since the newest coding standard, H.264/AVC, has the highest compression ratio, it decreases the dimension of extracted features and thus yields to a more accurate classifier with low computational complexity. The performance of the BPN classifier is evaluated using the classification accuracy and computational complexity terms. The results show that the proposed technique is more robust and effective with low computational complexity compared to other recent works.

  3. Neural network-based brain tissue segmentation in MR images using extracted features from intraframe coding in H.264

    NASA Astrophysics Data System (ADS)

    Jafari, Mehdi; Kasaei, Shohreh

    2011-12-01

    Automatic brain tissue segmentation is a crucial task in diagnosis and treatment of medical images. This paper presents a new algorithm to segment different brain tissues, such as white matter (WM), gray matter (GM), cerebral spinal fluid (CSF), background (BKG), and tumor tissues. The proposed technique uses the modified intraframe coding yielded from H.264/(AVC), for feature extraction. Extracted features are then imposed to an artificial back propagation neural network (BPN) classifier to assign each block to its appropriate class. Since the newest coding standard, H.264/AVC, has the highest compression ratio, it decreases the dimension of extracted features and thus yields to a more accurate classifier with low computational complexity. The performance of the BPN classifier is evaluated using the classification accuracy and computational complexity terms. The results show that the proposed technique is more robust and effective with low computational complexity compared to other recent works.

  4. Re-visiting protein-centric two-tier classification of existing DNA-protein complexes.

    PubMed

    Malhotra, Sony; Sowdhamini, Ramanathan

    2012-07-16

    Precise DNA-protein interactions play most important and vital role in maintaining the normal physiological functioning of the cell, as it controls many high fidelity cellular processes. Detailed study of the nature of these interactions has paved the way for understanding the mechanisms behind the biological processes in which they are involved. Earlier in 2000, a systematic classification of DNA-protein complexes based on the structural analysis of the proteins was proposed at two tiers, namely groups and families. With the advancement in the number and resolution of structures of DNA-protein complexes deposited in the Protein Data Bank, it is important to revisit the existing classification. On the basis of the sequence analysis of DNA binding proteins, we have built upon the protein centric, two-tier classification of DNA-protein complexes by adding new members to existing families and making new families and groups. While classifying the new complexes, we also realised the emergence of new groups and families. The new group observed was where β-propeller was seen to interact with DNA. There were 34 SCOP folds which were observed to be present in the complexes of both old and new classifications, whereas 28 folds are present exclusively in the new complexes. Some new families noticed were NarL transcription factor, Z-α DNA binding proteins, Forkhead transcription factor, AP2 protein, Methyl CpG binding protein etc. Our results suggest that with the increasing number of availability of DNA-protein complexes in Protein Data Bank, the number of families in the classification increased by approximately three fold. The folds present exclusively in newly classified complexes is suggestive of inclusion of proteins with new function in new classification, the most populated of which are the folds responsible for DNA damage repair. The proposed re-visited classification can be used to perform genome-wide surveys in the genomes of interest for the presence of DNA

  5. Chromatin replication: TRANSmitting the histone code

    PubMed Central

    Chang, Han-Wen; Studitsky, Vasily M.

    2017-01-01

    Efficient overcoming of the nucleosomal barrier and accurate maintenance of associated histone marks during chromatin replication are essential for normal functioning of the cell. Recent studies revealed new protein factors and histone modifications contributing to overcoming the nucleosomal barrier, and suggested an important role for DNA looping in survival of the original histones during replication. These studies suggest new possible mechanisms for transmitting the histone code to next generations of cells. PMID:28393112

  6. A family of long intergenic non-coding RNA genes in human chromosomal region 22q11.2 carry a DNA translocation breakpoint/AT-rich sequence

    PubMed Central

    2018-01-01

    FAM230C, a long intergenic non-coding RNA (lincRNA) gene in human chromosome 13 (chr13) is a member of lincRNA genes termed family with sequence similarity 230. An analysis using bioinformatics search tools and alignment programs was undertaken to determine properties of FAM230C and its related genes. Results reveal that the DNA translocation element, the Translocation Breakpoint Type A (TBTA) sequence, which consists of satellite DNA, Alu elements, and AT-rich sequences is embedded in the FAM230C gene. Eight lincRNA genes related to FAM230C also carry the TBTA sequences. These genes were formed from a large segment of the 3’ half of the FAM230C sequence duplicated in chr22, and are specifically in regions of low copy repeats (LCR22)s, in or close to the 22q.11.2 region. 22q11.2 is a chromosomal segment that undergoes a high rate of DNA translocation and is prone to genetic deletions. FAM230C-related genes present in other chromosomes do not carry the TBTA motif and were formed from the 5’ half region of the FAM230C sequence. These findings identify a high specificity in lincRNA gene formation by gene sequence duplication in different chromosomes. PMID:29668722

  7. Ensemble based on static classifier selection for automated diagnosis of Mild Cognitive Impairment.

    PubMed

    Nanni, Loris; Lumini, Alessandra; Zaffonato, Nicolò

    2018-05-15

    Alzheimer's disease (AD) is the most common cause of neurodegenerative dementia in the elderly population. Scientific research is very active in the challenge of designing automated approaches to achieve an early and certain diagnosis. Recently an international competition among AD predictors has been organized: "A Machine learning neuroimaging challenge for automated diagnosis of Mild Cognitive Impairment" (MLNeCh). This competition is based on pre-processed sets of T1-weighted Magnetic Resonance Images (MRI) to be classified in four categories: stable AD, individuals with MCI who converted to AD, individuals with MCI who did not convert to AD and healthy controls. In this work, we propose a method to perform early diagnosis of AD, which is evaluated on MLNeCh dataset. Since the automatic classification of AD is based on the use of feature vectors of high dimensionality, different techniques of feature selection/reduction are compared in order to avoid the curse-of-dimensionality problem, then the classification method is obtained as the combination of Support Vector Machines trained using different clusters of data extracted from the whole training set. The multi-classifier approach proposed in this work outperforms all the stand-alone method tested in our experiments. The final ensemble is based on a set of classifiers, each trained on a different cluster of the training data. The proposed ensemble has the great advantage of performing well using a very reduced version of the data (the reduction factor is more than 90%). The MATLAB code for the ensemble of classifiers will be publicly available 1 to other researchers for future comparisons. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. DRG coding practice: a nationwide hospital survey in Thailand

    PubMed Central

    2011-01-01

    Background Diagnosis Related Group (DRG) payment is preferred by healthcare reform in various countries but its implementation in resource-limited countries has not been fully explored. Objectives This study was aimed (1) to compare the characteristics of hospitals in Thailand that were audited with those that were not and (2) to develop a simplified scale to measure hospital coding practice. Methods A questionnaire survey was conducted of 920 hospitals in the Summary and Coding Audit Database (SCAD hospitals, all of which were audited in 2008 because of suspicious reports of possible DRG miscoding); the questionnaire also included 390 non-SCAD hospitals. The questionnaire asked about general demographics of the hospitals, hospital coding structure and process, and also included a set of 63 opinion-oriented items on the current hospital coding practice. Descriptive statistics and exploratory factor analysis (EFA) were used for data analysis. Results SCAD and Non-SCAD hospitals were different in many aspects, especially the number of medical statisticians, experience of medical statisticians and physicians, as well as number of certified coders. Factor analysis revealed a simplified 3-factor, 20-item model to assess hospital coding practice and classify hospital intention. Conclusion Hospital providers should not be assumed capable of producing high quality DRG codes, especially in resource-limited settings. PMID:22040256

  9. DRG coding practice: a nationwide hospital survey in Thailand.

    PubMed

    Pongpirul, Krit; Walker, Damian G; Rahman, Hafizur; Robinson, Courtland

    2011-10-31

    Diagnosis Related Group (DRG) payment is preferred by healthcare reform in various countries but its implementation in resource-limited countries has not been fully explored. This study was aimed (1) to compare the characteristics of hospitals in Thailand that were audited with those that were not and (2) to develop a simplified scale to measure hospital coding practice. A questionnaire survey was conducted of 920 hospitals in the Summary and Coding Audit Database (SCAD hospitals, all of which were audited in 2008 because of suspicious reports of possible DRG miscoding); the questionnaire also included 390 non-SCAD hospitals. The questionnaire asked about general demographics of the hospitals, hospital coding structure and process, and also included a set of 63 opinion-oriented items on the current hospital coding practice. Descriptive statistics and exploratory factor analysis (EFA) were used for data analysis. SCAD and Non-SCAD hospitals were different in many aspects, especially the number of medical statisticians, experience of medical statisticians and physicians, as well as number of certified coders. Factor analysis revealed a simplified 3-factor, 20-item model to assess hospital coding practice and classify hospital intention. Hospital providers should not be assumed capable of producing high quality DRG codes, especially in resource-limited settings.

  10. Robust Framework to Combine Diverse Classifiers Assigning Distributed Confidence to Individual Classifiers at Class Level

    PubMed Central

    Arshad, Sannia; Rho, Seungmin

    2014-01-01

    We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates model of various classes whilst identifying and filtering noisy training data. This noise free data is further used to learn model for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied genetic algorithm to search for an optimal weight vector on which classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on variety of real life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalance classes. PMID:25295302

  11. Robust framework to combine diverse classifiers assigning distributed confidence to individual classifiers at class level.

    PubMed

    Khalid, Shehzad; Arshad, Sannia; Jabbar, Sohail; Rho, Seungmin

    2014-01-01

    We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates model of various classes whilst identifying and filtering noisy training data. This noise free data is further used to learn model for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied genetic algorithm to search for an optimal weight vector on which classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on variety of real life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalance classes.

  12. Sub-10 nm patterning with DNA nanostructures: a short perspective

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Du, Ke; Park, Myeongkee; Ding, Junjun

    DNA is the hereditary material that contains our unique genetic code. Since the first demonstration of two-dimensional (2D) nanopatterns by using designed DNA origami ~10 years ago, DNA has evolved into a novel technique for 2D and 3D nanopatterning. It is now being used as a template for the creation of sub-10 nm structures via either 'top-down' or 'bottom-up' approaches for various applications spanning from nanoelectronics, plasmonic sensing, and nanophotonics. This paper starts with an histroric overview and discusses the current state-of-the-art in DNA nanolithography. Finally, emphasis is put on the challenges and prospects of DNA nanolithography as the nextmore » generation nanomanufacturing technique.« less

  13. Sub-10 nm patterning with DNA nanostructures: a short perspective

    DOE PAGES

    Du, Ke; Park, Myeongkee; Ding, Junjun; ...

    2017-09-04

    DNA is the hereditary material that contains our unique genetic code. Since the first demonstration of two-dimensional (2D) nanopatterns by using designed DNA origami ~10 years ago, DNA has evolved into a novel technique for 2D and 3D nanopatterning. It is now being used as a template for the creation of sub-10 nm structures via either 'top-down' or 'bottom-up' approaches for various applications spanning from nanoelectronics, plasmonic sensing, and nanophotonics. This paper starts with an histroric overview and discusses the current state-of-the-art in DNA nanolithography. Finally, emphasis is put on the challenges and prospects of DNA nanolithography as the nextmore » generation nanomanufacturing technique.« less

  14. Sub-10 nm patterning with DNA nanostructures: a short perspective

    NASA Astrophysics Data System (ADS)

    Du, Ke; Park, Myeongkee; Ding, Junjun; Hu, Huan; Zhang, Zheng

    2017-11-01

    DNA is the hereditary material that contains our unique genetic code. Since the first demonstration of two-dimensional (2D) nanopatterns by using designed DNA origami ˜10 years ago, DNA has evolved into a novel technique for 2D and 3D nanopatterning. It is now being used as a template for the creation of sub-10 nm structures via either ‘top-down’ or ‘bottom-up’ approaches for various applications spanning from nanoelectronics, plasmonic sensing, and nanophotonics. This perspective starts with an histroric overview and discusses the current state-of-the-art in DNA nanolithography. Emphasis is put on the challenges and prospects of DNA nanolithography as the next generation nanomanufacturing technique.

  15. Validity of the coding for herpes simplex encephalitis in the Danish National Patient Registry

    PubMed Central

    Jørgensen, Laura Krogh; Dalgaard, Lars Skov; Østergaard, Lars Jørgen; Andersen, Nanna Skaarup; Nørgaard, Mette; Mogensen, Trine Hyrup

    2016-01-01

    Background Large health care databases are a valuable source of infectious disease epidemiology if diagnoses are valid. The aim of this study was to investigate the accuracy of the recorded diagnosis coding of herpes simplex encephalitis (HSE) in the Danish National Patient Registry (DNPR). Methods The DNPR was used to identify all hospitalized patients, aged ≥15 years, with a first-time diagnosis of HSE according to the International Classification of Diseases, tenth revision (ICD-10), from 2004 to 2014. To validate the coding of HSE, we collected data from the Danish Microbiology Database, from departments of clinical microbiology, and from patient medical records. Cases were classified as confirmed, probable, or no evidence of HSE. We estimated the positive predictive value (PPV) of the HSE diagnosis coding stratified by diagnosis type, study period, and department type. Furthermore, we estimated the proportion of HSE cases coded with nonspecific ICD-10 codes of viral encephalitis and also the sensitivity of the HSE diagnosis coding. Results We were able to validate 398 (94.3%) of the 422 HSE diagnoses identified via the DNPR. Hereof, 202 (50.8%) were classified as confirmed cases and 29 (7.3%) as probable cases providing an overall PPV of 58.0% (95% confidence interval [CI]: 53.0–62.9). For “Encephalitis due to herpes simplex virus” (ICD-10 code B00.4), the PPV was 56.6% (95% CI: 51.1–62.0). Similarly, the PPV for “Meningoencephalitis due to herpes simplex virus” (ICD-10 code B00.4A) was 56.8% (95% CI: 39.5–72.9). “Herpes viral encephalitis” (ICD-10 code G05.1E) had a PPV of 75.9% (95% CI: 56.5–89.7), thereby representing the highest PPV. The estimated sensitivity was 95.5%. Conclusion The PPVs of the ICD-10 diagnosis coding for adult HSE in the DNPR were relatively low. Hence, the DNPR should be used with caution when studying patients with encephalitis caused by herpes simplex virus. PMID:27330328

  16. Validity of the coding for herpes simplex encephalitis in the Danish National Patient Registry.

    PubMed

    Jørgensen, Laura Krogh; Dalgaard, Lars Skov; Østergaard, Lars Jørgen; Andersen, Nanna Skaarup; Nørgaard, Mette; Mogensen, Trine Hyrup

    2016-01-01

    Large health care databases are a valuable source of infectious disease epidemiology if diagnoses are valid. The aim of this study was to investigate the accuracy of the recorded diagnosis coding of herpes simplex encephalitis (HSE) in the Danish National Patient Registry (DNPR). The DNPR was used to identify all hospitalized patients, aged ≥15 years, with a first-time diagnosis of HSE according to the International Classification of Diseases, tenth revision (ICD-10), from 2004 to 2014. To validate the coding of HSE, we collected data from the Danish Microbiology Database, from departments of clinical microbiology, and from patient medical records. Cases were classified as confirmed, probable, or no evidence of HSE. We estimated the positive predictive value (PPV) of the HSE diagnosis coding stratified by diagnosis type, study period, and department type. Furthermore, we estimated the proportion of HSE cases coded with nonspecific ICD-10 codes of viral encephalitis and also the sensitivity of the HSE diagnosis coding. We were able to validate 398 (94.3%) of the 422 HSE diagnoses identified via the DNPR. Hereof, 202 (50.8%) were classified as confirmed cases and 29 (7.3%) as probable cases providing an overall PPV of 58.0% (95% confidence interval [CI]: 53.0-62.9). For "Encephalitis due to herpes simplex virus" (ICD-10 code B00.4), the PPV was 56.6% (95% CI: 51.1-62.0). Similarly, the PPV for "Meningoencephalitis due to herpes simplex virus" (ICD-10 code B00.4A) was 56.8% (95% CI: 39.5-72.9). "Herpes viral encephalitis" (ICD-10 code G05.1E) had a PPV of 75.9% (95% CI: 56.5-89.7), thereby representing the highest PPV. The estimated sensitivity was 95.5%. The PPVs of the ICD-10 diagnosis coding for adult HSE in the DNPR were relatively low. Hence, the DNPR should be used with caution when studying patients with encephalitis caused by herpes simplex virus.

  17. Non-coding RNA generated following lariat-debranching mediates targeting of AID to DNA

    PubMed Central

    Zheng, Simin; Vuong, Bao Q.; Vaidyanathan, Bharat; Lin, Jia-Yu; Huang, Feng-Ting; Chaudhuri, Jayanta

    2015-01-01

    SUMMARY Transcription through immunoglobulin switch (S) regions is essential for class switch recombination (CSR) but no molecular function of the transcripts has been described. Likewise, recruitment of activation-induced cytidine deaminase (AID) to S regions is critical for CSR; however, the underlying mechanism has not been fully elucidated. Here, we demonstrate that intronic switch RNA acts in trans to target AID to S region DNA. AID binds directly to switch RNA through G-quadruplexes formed by the RNA molecules. Disruption of this interaction by mutation of a key residue in the putative RNA-binding domain of AID impairs recruitment of AID to S region DNA, thereby abolishing CSR. Additionally, inhibition of RNA lariat processing leads to loss of AID localization to S regions and compromises CSR; both defects can be rescued by exogenous expression of switch transcripts in a sequence-specific manner. These studies uncover an RNA-mediated mechanism of targeting AID to DNA. PMID:25957684

  18. Probe DNA-Cisplatin Interaction with Solid-State Nanopores

    NASA Astrophysics Data System (ADS)

    Zhou, Zhi; Hu, Ying; Li, Wei; Xu, Zhi; Wang, Pengye; Bai, Xuedong; Shan, Xinyan; Lu, Xinghua; Nanopore Collaboration

    2014-03-01

    Understanding the mechanism of DNA-cisplatin interaction is essential for clinical application and novel drug design. As an emerging single-molecule technology, solid-state nanopore has been employed in biomolecule detection and probing DNA-molecule interactions. Herein, we reported a real-time monitoring of DNA-cisplatin interaction by employing solid-state SiN nanopores. The DNA-cisplatin interacting process is clearly classified into three stages by measuring the capture rate of DNA-cisplatin adducts. In the first stage, the negative charged DNA molecules were partially discharged due to the bonding of positive charged cisplatin and forming of mono-adducts. In the second stage, forming of DNA-cisplatin di-adducts with the adjacent bases results in DNA bending and softening. The capture rate increases since the softened bi-adducts experience a lower barrier to thread into the nanopores. In the third stage, complex structures, such as micro-loop, are formed and the DNA-cisplatin adducts are aggregated. The capture rate decreases to zero as the aggregated adduct grows to the size of the pore. The characteristic time of this stage was found to be linear with the diameter of the nanopore and this dynamic process can be described with a second-order reaction model. We are grateful to Laboratory of Microfabrication, Dr. Y. Yao, and Prof. R.C. Yu (Institute of Physics, Chinese Academy of Sciences) for technical assistance.

  19. Storing data encoded DNA in living organisms

    DOEpatents

    Wong,; Pak C. , Wong; Kwong K. , Foote; Harlan, P [Richland, WA

    2006-06-06

    Current technologies allow the generation of artificial DNA molecules and/or the ability to alter the DNA sequences of existing DNA molecules. With a careful coding scheme and arrangement, it is possible to encode important information as an artificial DNA strand and store it in a living host safely and permanently. This inventive technology can be used to identify origins and protect R&D investments. It can also be used in environmental research to track generations of organisms and observe the ecological impact of pollutants. Today, there are microorganisms that can survive under extreme conditions. As well, it is advantageous to consider multicellular organisms as hosts for stored information. These living organisms can provide as memory housing and protection for stored data or information. The present invention provides well for data storage in a living organism wherein at least one DNA sequence is encoded to represent data and incorporated into a living organism.

  20. Mitochondrial DNA Genetics and the Heteroplasmy Conundrum in Evolution and Disease

    PubMed Central

    Wallace, Douglas C.; Chalkia, Dimitra

    2013-01-01

    The unorthodox genetics of the mtDNA is providing new perspectives on the etiology of the common “complex” diseases. The maternally inherited mtDNA codes for essential energy genes, is present in thousands of copies per cell, and has a very high mutation rate. New mtDNA mutations arise among thousands of other mtDNAs. The mechanisms by which these “heteroplasmic” mtDNA mutations come to predominate in the female germline and somatic tissues is poorly understood, but essential for understanding the clinical variability of a range of diseases. Maternal inheritance and heteroplasmy also pose major challengers for the diagnosis and prevention of mtDNA disease. PMID:24186072

  1. Construction of DNA sandwich electrochemical biosensor with nanoPbS and nanoAu tags on magnetic microbeads.

    PubMed

    Du, Ping; Li, Hongxia; Cao, Wei

    2009-07-15

    A novel and sensitive sandwich electrochemical biosensor based on the amplification of magnetic microbeads and Au nanoparticles (NPs) modified with bio bar code and PbS nanoparticles was constructed in the present work. In this method, the magnetic microspheres were coated with 4 layers polyelectrolytes in order to increase carboxyl groups on the surface of the magnetic microbeads, which enhanced the amount of the capture DNA. The amino-functionalized capture DNA on the surface of magnetic microbeads hybridized with one end of target DNA, the other end of which was hybridized with signal DNA probe labelled with Au NPs on the terminus. The Au NPs were modified with bio bar code and the PbS NPs were used as a marker for identifying the target oligoncleotide. The modification of magnetic microbeads could immobilize more amino-group terminal capture DNA, and the bio bar code could increase the amount of Au NPs that combined with the target DNA. The detection of lead ions performed by anodic stripping voltammetry (ASV) technology further improved the sensitivity of the biosensor. As a result, the present DNA biosensor showed good selectivity and sensitivity by the combined amplification. Under the optimum conditions, the linear relationship with the concentration of the target DNA was ranging from 2.0 x 10(-14) M to 1.0 x 10(-12)M and a detection limit as low as 5.0 x 10(-15)M was obtained.

  2. Metabolic responses induced by DNA damage and poly (ADP-ribose) polymerase (PARP) inhibition in MCF-7 cells

    PubMed Central

    Bhute, Vijesh J.; Palecek, Sean P.

    2015-01-01

    Genomic instability is one of the hallmarks of cancer. Several chemotherapeutic drugs and radiotherapy induce DNA damage to prevent cancer cell replication. Cells in turn activate different DNA damage response (DDR) pathways to either repair the damage or induce cell death. These DDR pathways also elicit metabolic alterations which can play a significant role in the proper functioning of the cells. The understanding of these metabolic effects resulting from different types of DNA damage and repair mechanisms is currently lacking. In this study, we used NMR metabolomics to identify metabolic pathways which are altered in response to different DNA damaging agents. By comparing the metabolic responses in MCF-7 cells, we identified the activation of poly (ADP-ribose) polymerase (PARP) in methyl methanesulfonate (MMS)-induced DNA damage. PARP activation led to a significant depletion of NAD+. PARP inhibition using veliparib (ABT-888) was able to successfully restore the NAD+ levels in MMS-treated cells. In addition, double strand break induction by MMS and veliparib exhibited similar metabolic responses as zeocin, suggesting an application of metabolomics to classify the types of DNA damage responses. This prediction was validated by studying the metabolic responses elicited by radiation. Our findings indicate that cancer cell metabolic responses depend on the type of DNA damage responses and can also be used to classify the type of DNA damage. PMID:26478723

  3. Alteration of gene expression in human hepatocellular carcinoma with integrated hepatitis B virus DNA.

    PubMed

    Tamori, Akihiro; Yamanishi, Yoshihiro; Kawashima, Shuichi; Kanehisa, Minoru; Enomoto, Masaru; Tanaka, Hiromu; Kubo, Shoji; Shiomi, Susumu; Nishiguchi, Shuhei

    2005-08-15

    Integration of hepatitis B virus (HBV) DNA into the human genome is one of the most important steps in HBV-related carcinogenesis. This study attempted to find the link between HBV DNA, the adjoining cellular sequence, and altered gene expression in hepatocellular carcinoma (HCC) with integrated HBV DNA. We examined 15 cases of HCC infected with HBV by cassette ligation-mediated PCR. The human DNA adjacent to the integrated HBV DNA was sequenced. Protein coding sequences were searched for in the human sequence. In five cases with HBV DNA integration, from which good quality RNA was extracted, gene expression was examined by cDNA microarray analysis. The human DNA sequence successive to integrated HBV DNA was determined in the 15 HCCs. Eight protein-coding regions were involved: ras-responsive element binding protein 1, calmodulin 1, mixed lineage leukemia 2 (MLL2), FLJ333655, LOC220272, LOC255345, LOC220220, and LOC168991. The MLL2 gene was expressed in three cases with HBV DNA integrated into exon 3 of MLL2 and in one case with HBV DNA integrated into intron 3 of MLL2. Gene expression analysis suggested that two HCCs with HBV integrated into MLL2 had similar patterns of gene expression compared with three HCCs with HBV integrated into other loci of human chromosomes. HBV DNA was integrated at random sites of human DNA, and the MLL2 gene was one of the targets for integration. Our results suggest that HBV DNA might modulate human genes near integration sites, followed by integration site-specific expression of such genes during hepatocarcinogenesis.

  4. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

    PubMed

    Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

    2010-05-07

    Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.

  5. Forensic strategy to ensure the quality of sequencing data of mitochondrial DNA in highly degraded samples.

    PubMed

    Adachi, Noboru; Umetsu, Kazuo; Shojo, Hideki

    2014-01-01

    Mitochondrial DNA (mtDNA) is widely used for DNA analysis of highly degraded samples because of its polymorphic nature and high number of copies in a cell. However, as endogenous mtDNA in deteriorated samples is scarce and highly fragmented, it is not easy to obtain reliable data. In the current study, we report the risks of direct sequencing mtDNA in highly degraded material, and suggest a strategy to ensure the quality of sequencing data. It was observed that direct sequencing data of the hypervariable segment (HVS) 1 by using primer sets that generate an amplicon of 407 bp (long-primer sets) was different from results obtained by using newly designed primer sets that produce an amplicon of 120-139 bp (mini-primer sets). The data aligned with the results of mini-primer sets analysis in an amplicon length-dependent manner; the shorter the amplicon, the more evident the endogenous sequence became. Coding region analysis using multiplex amplified product-length polymorphisms revealed the incongruence of single nucleotide polymorphisms between the coding region and HVS 1 caused by contamination with exogenous mtDNA. Although the sequencing data obtained using long-primer sets turned out to be erroneous, it was unambiguous and reproducible. These findings suggest that PCR primers that produce amplicons shorter than those currently recognized should be used for mtDNA analysis in highly degraded samples. Haplogroup motif analysis of the coding region and HVS should also be performed to improve the reliability of forensic mtDNA data. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  6. Computation of the Genetic Code

    NASA Astrophysics Data System (ADS)

    Kozlov, Nicolay N.; Kozlova, Olga N.

    2018-03-01

    One of the problems in the development of mathematical theory of the genetic code (summary is presented in [1], the detailed -to [2]) is the problem of the calculation of the genetic code. Similar problems in the world is unknown and could be delivered only in the 21st century. One approach to solving this problem is devoted to this work. For the first time provides a detailed description of the method of calculation of the genetic code, the idea of which was first published earlier [3]), and the choice of one of the most important sets for the calculation was based on an article [4]. Such a set of amino acid corresponds to a complete set of representations of the plurality of overlapping triple gene belonging to the same DNA strand. A separate issue was the initial point, triggering an iterative search process all codes submitted by the initial data. Mathematical analysis has shown that the said set contains some ambiguities, which have been founded because of our proposed compressed representation of the set. As a result, the developed method of calculation was limited to the two main stages of research, where the first stage only the of the area were used in the calculations. The proposed approach will significantly reduce the amount of computations at each step in this complex discrete structure.

  7. Error minimizing algorithms for nearest eighbor classifiers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Porter, Reid B; Hush, Don; Zimmer, G. Beate

    2011-01-03

    Stack Filters define a large class of discrete nonlinear filter first introd uced in image and signal processing for noise removal. In recent years we have suggested their application to classification problems, and investigated their relationship to other types of discrete classifiers such as Decision Trees. In this paper we focus on a continuous domain version of Stack Filter Classifiers which we call Ordered Hypothesis Machines (OHM), and investigate their relationship to Nearest Neighbor classifiers. We show that OHM classifiers provide a novel framework in which to train Nearest Neighbor type classifiers by minimizing empirical error based loss functions. Wemore » use the framework to investigate a new cost sensitive loss function that allows us to train a Nearest Neighbor type classifier for low false alarm rate applications. We report results on both synthetic data and real-world image data.« less

  8. Silencing of the pentose phosphate pathway genes influences DNA replication in human fibroblasts.

    PubMed

    Fornalewicz, Karolina; Wieczorek, Aneta; Węgrzyn, Grzegorz; Łyżeń, Robert

    2017-11-30

    Previous reports and our recently published data indicated that some enzymes of glycolysis and the tricarboxylic acid cycle can affect the genome replication process by changing either the efficiency or timing of DNA synthesis in human normal cells. Both these pathways are connected with the pentose phosphate pathway (PPP pathway). The PPP pathway supports cell growth by generating energy and precursors for nucleotides and amino acids. Therefore, we asked if silencing of genes coding for enzymes involved in the pentose phosphate pathway may also affect the control of DNA replication in human fibroblasts. Particular genes coding for PPP pathway enzymes were partially silenced with specific siRNAs. Such cells remained viable. We found that silencing of the H6PD, PRPS1, RPE genes caused less efficient enterance to the S phase and decrease in efficiency of DNA synthesis. On the other hand, in cells treated with siRNA against G6PD, RBKS and TALDO genes, the fraction of cells entering the S phase was increased. However, only in the case of G6PD and TALDO, the ratio of BrdU incorporation to DNA was significantly changed. The presented results together with our previously published studies illustrate the complexity of the influence of genes coding for central carbon metabolism on the control of DNA replication in human fibroblasts, and indicate which of them are especially important in this process. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Public Perceptions and Expectations of the Forensic Use of DNA: Results of a Preliminary Study

    ERIC Educational Resources Information Center

    Curtis, Cate

    2009-01-01

    The forensic use of Deoxyribonucleic Acid (DNA) is demonstrating significant success as a crime-solving tool. However, numerous concerns have been raised regarding the potential for DNA use to contravene cultural, ethical, and legal codes. In this article the expectations and level of knowledge of the New Zealand public of the DNA data-bank and…

  10. An evaluation of the quality of obstetric morbidity coding using an objective assessment tool, the Performance Indicators For Coding Quality (PICQ).

    PubMed

    Lamb, Mary K; Innes, Kerry; Saad, Patricia; Rust, Julie; Dimitropoulos, Vera; Cumerlato, Megan

    The Performance Indicators for Coding Quality (PICQ) is a data quality assessment tool developed by Australia's National Centre for Classification in Health (NCCH). PICQ consists of a number of indicators covering all ICD-10-AM disease chapters, some procedure chapters from the Australian Classification of Health Intervention (ACHI) and some Australian Coding Standards (ACS). The indicators can be used to assess the coding quality of hospital morbidity data by monitoring compliance of coding conventions and ACS; this enables the identification of particular records that may be incorrectly coded, thus providing a measure of data quality. There are 31 obstetric indicators available for the ICD-10-AM Fourth Edition. Twenty of these 31 indicators were classified as Fatal, nine as Warning and two Relative. These indicators were used to examine coding quality of obstetric records in the 2004-2005 financial year Australian national hospital morbidity dataset. Records with obstetric disease or procedure codes listed anywhere in the code string were extracted and exported from the SPSS source file. Data were then imported into a Microsoft Access database table as per PICQ instructions, and run against all Fatal and Warning and Relative (N=31) obstetric PICQ 2006 Fourth Edition Indicators v.5 for the ICD-10- AM Fourth Edition. There were 689,905 gynaecological and obstetric records in the 2004-2005 financial year, of which 1.14% were found to have triggered Fatal degree errors, 3.78% Warning degree errors and 8.35% Relative degree errors. The types of errors include completeness, redundancy, specificity and sequencing problems. It was found that PICQ is a useful initial screening tool for the assessment of ICD-10-AM/ACHI coding quality. The overall quality of codes assigned to obstetric records in the 2004- 2005 Australian national morbidity dataset is of fair quality.

  11. 49 CFR 1280.6 - Storage of classified documents.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 49 Transportation 9 2012-10-01 2012-10-01 false Storage of classified documents. 1280.6 Section 1280.6 Transportation Other Regulations Relating to Transportation (Continued) SURFACE TRANSPORTATION... SECURITY INFORMATION AND CLASSIFIED MATERIAL § 1280.6 Storage of classified documents. All classified...

  12. 49 CFR 1280.6 - Storage of classified documents.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 49 Transportation 9 2011-10-01 2011-10-01 false Storage of classified documents. 1280.6 Section 1280.6 Transportation Other Regulations Relating to Transportation (Continued) SURFACE TRANSPORTATION... SECURITY INFORMATION AND CLASSIFIED MATERIAL § 1280.6 Storage of classified documents. All classified...

  13. 49 CFR 1280.6 - Storage of classified documents.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 49 Transportation 9 2010-10-01 2010-10-01 false Storage of classified documents. 1280.6 Section 1280.6 Transportation Other Regulations Relating to Transportation (Continued) SURFACE TRANSPORTATION... SECURITY INFORMATION AND CLASSIFIED MATERIAL § 1280.6 Storage of classified documents. All classified...

  14. 49 CFR 1280.6 - Storage of classified documents.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 49 Transportation 9 2014-10-01 2014-10-01 false Storage of classified documents. 1280.6 Section 1280.6 Transportation Other Regulations Relating to Transportation (Continued) SURFACE TRANSPORTATION... SECURITY INFORMATION AND CLASSIFIED MATERIAL § 1280.6 Storage of classified documents. All classified...

  15. Preparation of next-generation sequencing libraries using Nextera™ technology: simultaneous DNA fragmentation and adaptor tagging by in vitro transposition.

    PubMed

    Caruccio, Nicholas

    2011-01-01

    DNA library preparation is a common entry point and bottleneck for next-generation sequencing. Current methods generally consist of distinct steps that often involve significant sample loss and hands-on time: DNA fragmentation, end-polishing, and adaptor-ligation. In vitro transposition with Nextera™ Transposomes simultaneously fragments and covalently tags the target DNA, thereby combining these three distinct steps into a single reaction. Platform-specific sequencing adaptors can be added, and the sample can be enriched and bar-coded using limited-cycle PCR to prepare di-tagged DNA fragment libraries. Nextera technology offers a streamlined, efficient, and high-throughput method for generating bar-coded libraries compatible with multiple next-generation sequencing platforms.

  16. DNA-programmable nanoparticle crystallization.

    PubMed

    Park, Sung Yong; Lytton-Jean, Abigail K R; Lee, Byeongdu; Weigand, Steven; Schatz, George C; Mirkin, Chad A

    2008-01-31

    It was first shown more than ten years ago that DNA oligonucleotides can be attached to gold nanoparticles rationally to direct the formation of larger assemblies. Since then, oligonucleotide-functionalized nanoparticles have been developed into powerful diagnostic tools for nucleic acids and proteins, and into intracellular probes and gene regulators. In contrast, the conceptually simple yet powerful idea that functionalized nanoparticles might serve as basic building blocks that can be rationally assembled through programmable base-pairing interactions into highly ordered macroscopic materials remains poorly developed. So far, the approach has mainly resulted in polymerization, with modest control over the placement of, the periodicity in, and the distance between particles within the assembled material. That is, most of the materials obtained thus far are best classified as amorphous polymers, although a few examples of colloidal crystal formation exist. Here, we demonstrate that DNA can be used to control the crystallization of nanoparticle-oligonucleotide conjugates to the extent that different DNA sequences guide the assembly of the same type of inorganic nanoparticle into different crystalline states. We show that the choice of DNA sequences attached to the nanoparticle building blocks, the DNA linking molecules and the absence or presence of a non-bonding single-base flexor can be adjusted so that gold nanoparticles assemble into micrometre-sized face-centred-cubic or body-centred-cubic crystal structures. Our findings thus clearly demonstrate that synthetically programmable colloidal crystallization is possible, and that a single-component system can be directed to form different structures.

  17. Analysis of protein-coding genetic variation in 60,706 humans.

    PubMed

    Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G

    2016-08-18

    Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

  18. Use of the Coding Causes of Death in HIV in the classification of deaths in Northeastern Brazil.

    PubMed

    Alves, Diana Neves; Bresani-Salvi, Cristiane Campello; Batista, Joanna d'Arc Lyra; Ximenes, Ricardo Arraes de Alencar; Miranda-Filho, Demócrito de Barros; Melo, Heloísa Ramos Lacerda de; Albuquerque, Maria de Fátima Pessoa Militão de

    2017-01-01

    Describe the coding process of death causes for people living with HIV/AIDS, and classify deaths as related or unrelated to immunodeficiency by applying the Coding Causes of Death in HIV (CoDe) system. A cross-sectional study that codifies and classifies the causes of deaths occurring in a cohort of 2,372 people living with HIV/AIDS, monitored between 2007 and 2012, in two specialized HIV care services in Pernambuco. The causes of death already codified according to the International Classification of Diseases were recoded and classified as deaths related and unrelated to immunodeficiency by the CoDe system. We calculated the frequencies of the CoDe codes for the causes of death in each classification category. There were 315 (13%) deaths during the study period; 93 (30%) were caused by an AIDS-defining illness on the Centers for Disease Control and Prevention list. A total of 232 deaths (74%) were related to immunodeficiency after application of the CoDe. Infections were the most common cause, both related (76%) and unrelated (47%) to immunodeficiency, followed by malignancies (5%) in the first group and external causes (16%), malignancies (12 %) and cardiovascular diseases (11%) in the second group. Tuberculosis comprised 70% of the immunodeficiency-defining infections. Opportunistic infections and aging diseases were the most frequent causes of death, adding multiple disease burdens on health services. The CoDe system increases the probability of classifying deaths more accurately in people living with HIV/AIDS. Descrever o processo de codificação das causas de morte em pessoas vivendo com HIV/Aids, e classificar os óbitos como relacionados ou não relacionados à imunodeficiência aplicando o sistema Coding Causes of Death in HIV (CoDe). Estudo transversal, que codifica e classifica as causas dos óbitos ocorridos em uma coorte de 2.372 pessoas vivendo com HIV/Aids acompanhadas entre 2007 e 2012 em dois serviços de atendimento especializado em HIV em

  19. Shannon Entropy of the Canonical Genetic Code

    NASA Astrophysics Data System (ADS)

    Nemzer, Louis

    The probability that a non-synonymous point mutation in DNA will adversely affect the functionality of the resultant protein is greatly reduced if the substitution is conservative. In that case, the amino acid coded by the mutated codon has similar physico-chemical properties to the original. Many simplified alphabets, which group the 20 common amino acids into families, have been proposed. To evaluate these schema objectively, we introduce a novel, quantitative method based on the inherent redundancy in the canonical genetic code. By calculating the Shannon information entropy carried by 1- or 2-bit messages, groupings that best leverage the robustness of the code are identified. The relative importance of properties related to protein folding - like hydropathy and size - and function, including side-chain acidity, can also be estimated. In addition, this approach allows us to quantify the average information value of nucleotide codon positions, and explore the physiological basis for distinguishing between transition and transversion mutations. Supported by NSU PFRDG Grant #335347.

  20. 48 CFR 3.908-8 - Classified information.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 48 Federal Acquisition Regulations System 1 2013-10-01 2013-10-01 false Classified information. 3... Employees 3.908-8 Classified information. 41 U.S.C. 4712 does not provide any right to disclose classified information not otherwise provided by law. [78 FR 60171, Sept. 30, 2013] ...

  1. Quantum-dot-tagged microbeads for multiplexed optical coding of biomolecules.

    PubMed

    Han, M; Gao, X; Su, J Z; Nie, S

    2001-07-01

    Multicolor optical coding for biological assays has been achieved by embedding different-sized quantum dots (zinc sulfide-capped cadmium selenide nanocrystals) into polymeric microbeads at precisely controlled ratios. Their novel optical properties (e.g., size-tunable emission and simultaneous excitation) render these highly luminescent quantum dots (QDs) ideal fluorophores for wavelength-and-intensity multiplexing. The use of 10 intensity levels and 6 colors could theoretically code one million nucleic acid or protein sequences. Imaging and spectroscopic measurements indicate that the QD-tagged beads are highly uniform and reproducible, yielding bead identification accuracies as high as 99.99% under favorable conditions. DNA hybridization studies demonstrate that the coding and target signals can be simultaneously read at the single-bead level. This spectral coding technology is expected to open new opportunities in gene expression studies, high-throughput screening, and medical diagnostics.

  2. Microsatellites in the Eukaryotic DNA Mismatch Repair Genes as Modulators of Evolutionary Mutation Rate

    NASA Technical Reports Server (NTRS)

    Chang, Dong Kyung; Metzgar, David; Wills, Christopher; Boland, C. Richard

    2003-01-01

    All "minor" components of the human DNA mismatch repair (MMR) system-MSH3, MSH6, PMS2, and the recently discovered MLH3-contain mononucleotide microsatellites in their coding sequences. This intriguing finding contrasts with the situation found in the major components of the DNA MMR system-MSH2 and MLH1-and, in fact, most human genes. Although eukaryotic genomes are rich in microsatellites, non-triplet microsatellites are rare in coding regions. The recurring presence of exonal mononucleotide repeat sequences within a single family of human genes would therefore be considered exceptional.

  3. Genome-wide analysis of the DNA-binding with one zinc finger (Dof) transcription factor family in bananas.

    PubMed

    Dong, Chen; Hu, Huigang; Xie, Jianghui

    2016-12-01

    DNA-binding with one finger (Dof) domain proteins are a multigene family of plant-specific transcription factors involved in numerous aspects of plant growth and development. In this study, we report a genome-wide search for Musa acuminata Dof (MaDof) genes and their expression profiles at different developmental stages and in response to various abiotic stresses. In addition, a complete overview of the Dof gene family in bananas is presented, including the gene structures, chromosomal locations, cis-regulatory elements, conserved protein domains, and phylogenetic inferences. Based on the genome-wide analysis, we identified 74 full-length protein-coding MaDof genes unevenly distributed on 11 chromosomes. Phylogenetic analysis with Dof members from diverse plant species showed that MaDof genes can be classified into four subgroups (StDof I, II, III, and IV). The detailed genomic information of the MaDof gene homologs in the present study provides opportunities for functional analyses to unravel the exact role of the genes in plant growth and development.

  4. 36 CFR 1256.46 - National security-classified information.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 36 Parks, Forests, and Public Property 3 2010-07-01 2010-07-01 false National security-classified... Restrictions § 1256.46 National security-classified information. In accordance with 5 U.S.C. 552(b)(1), NARA... properly classified under the provisions of the pertinent Executive Order on Classified National Security...

  5. Phenotypic characterization of an Arabidopsis T-DNA insertion line SALK_063500.

    PubMed

    Sng, Natasha J; Paul, Anna-Lisa; Ferl, Robert J

    2018-06-01

    In this article we report the identification of a homozygous lethal T-DNA (transfer DNA) line within the coding region of the At1G05290 gene in the genome of Arabidopsis thaliana (Arabidopsis) line, SALK_063500. The T-DNA insertion is found within exon one of the AT1G05290 gene, however a homozygous T-DNA allele is unattainable. In the heterozygous T-DNA allele the expression levels of AT1G05290 were compared to wild type Arabidopsis (Col-0, Columbia). Further analyses revealed an aberrant silique phenotype found in the heterozygous SALK_063500 plants that is attributed to the reduced rate of pollen tube germination. These data are original and have not been published elsewhere.

  6. DNA typing in forensic medicine and in criminal investigations: a current survey.

    PubMed

    Benecke, M

    1997-05-01

    Since 1985 DNA typing of biological material has become one of the most powerful tools for personal identification in forensic medicine and in criminal investigations [1-6]. Classical DNA "fingerprinting" is increasingly being replaced by polymerase chain reaction (PCR) based technology which detects very short polymorphic stretches of DNA [7-15]. DNA loci which forensic scientists study do not code for proteins, and they are spread over the whole genome [16, 17]. These loci are neutral, and few provide any information about individuals except for their identity. Minute amounts of biological material are sufficient for DNA typing. Many European countries are beginning to establish databases to store DNA profiles of crime scenes and known offenders. A brief overview is given of past and present DNA typing and the establishment of forensic DNA databases in Europe.

  7. DNA typing in forensic medicine and in criminal investigations: a current survey

    NASA Astrophysics Data System (ADS)

    Benecke, Mark

    Since 1985 DNA typing of biological material has become one of the most powerful tools for personal identification in forensic medicine and in criminal investigations [1-6]. Classical DNA "fingerprinting" is increasingly being replaced by polymerase chain reaction (PCR) based technology which detects very short polymorphic stretches of DNA [7-15]. DNA loci which forensic scientists study do not code for proteins, and they are spread over the whole genome [16, 17]. These loci are neutral, and few provide any information about individuals except for their identity. Minute amounts of biological material are sufficient for DNA typing. Many European countries are beginning to establish databases to store DNA profiles of crime scenes and known offenders. A brief overview is given of past and present DNA typing and the establishment of forensic DNA databases in Europe.

  8. Class-specific Error Bounds for Ensemble Classifiers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Prenger, R; Lemmond, T; Varshney, K

    2009-10-06

    The generalization error, or probability of misclassification, of ensemble classifiers has been shown to be bounded above by a function of the mean correlation between the constituent (i.e., base) classifiers and their average strength. This bound suggests that increasing the strength and/or decreasing the correlation of an ensemble's base classifiers may yield improved performance under the assumption of equal error costs. However, this and other existing bounds do not directly address application spaces in which error costs are inherently unequal. For applications involving binary classification, Receiver Operating Characteristic (ROC) curves, performance curves that explicitly trade off false alarms and missedmore » detections, are often utilized to support decision making. To address performance optimization in this context, we have developed a lower bound for the entire ROC curve that can be expressed in terms of the class-specific strength and correlation of the base classifiers. We present empirical analyses demonstrating the efficacy of these bounds in predicting relative classifier performance. In addition, we specify performance regions of the ROC curve that are naturally delineated by the class-specific strengths of the base classifiers and show that each of these regions can be associated with a unique set of guidelines for performance optimization of binary classifiers within unequal error cost regimes.« less

  9. Development of biometric DNA ink for authentication security.

    PubMed

    Hashiyada, Masaki

    2004-10-01

    Among the various types of biometric personal identification systems, DNA provides the most reliable personal identification. It is intrinsically digital and unchangeable while the person is alive, and even after his/her death. Increasing the number of DNA loci examined can enhance the power of discrimination. This report describes the development of DNA ink, which contains synthetic DNA mixed with printing inks. Single-stranded DNA fragments encoding a personalized set of short tandem repeats (STR) were synthesized. The sequence was defined as follows. First, a decimal DNA personal identification (DNA-ID) was established based on the number of STRs in the locus. Next, this DNA-ID was encrypted using a binary, 160-bit algorithm, using a hashing function to protect privacy. Since this function is irreversible, no one can recover the original information from the encrypted code. Finally, the bit series generated above is transformed into base sequences, and double-stranded DNA fragments are amplified by the polymerase chain reaction (PCR) to protect against physical attacks. Synthesized DNA was detected successfully after samples printed in DNA ink were subjected to several resistance tests used to assess the stability of printing inks. Endurance test results showed that this DNA ink would be suitable for practical use as a printing ink and was resistant to 40 hours of ultraviolet exposure, performance commensurate with that of photogravure ink. Copyright 2004 Tohoku University Medical Press

  10. Disclosure of terminal illness to patients and families: diversity of governing codes in 14 Islamic countries.

    PubMed

    Abdulhameed, Hunida E; Hammami, Muhammad M; Mohamed, Elbushra A Hameed

    2011-08-01

    The consistency of codes governing disclosure of terminal illness to patients and families in Islamic countries has not been studied until now. To review available codes on disclosure of terminal illness in Islamic countries. DATA SOURCE AND EXTRACTION: Data were extracted through searches on Google and PubMed. Codes related to disclosure of terminal illness to patients or families were abstracted, and then classified independently by the three authors. Codes for 14 Islamic countries were located. Five codes were silent regarding informing the patient, seven allowed concealment, one mandated disclosure and one prohibited disclosure. Five codes were silent regarding informing the family, four allowed disclosure and five mandated/recommended disclosure. The Islamic Organization for Medical Sciences code was silent on both issues. Codes regarding disclosure of terminal illness to patients and families differed markedly among Islamic countries. They were silent in one-third of the codes, and tended to favour a paternalistic/utilitarian, family-centred approach over an autonomous, patient-centred approach.

  11. Consensus Classification Using Non-Optimized Classifiers.

    PubMed

    Brownfield, Brett; Lemos, Tony; Kalivas, John H

    2018-04-03

    Classifying samples into categories is a common problem in analytical chemistry and other fields. Classification is usually based on only one method, but numerous classifiers are available with some being complex, such as neural networks, and others are simple, such as k nearest neighbors. Regardless, most classification schemes require optimization of one or more tuning parameters for best classification accuracy, sensitivity, and specificity. A process not requiring exact selection of tuning parameter values would be useful. To improve classification, several ensemble approaches have been used in past work to combine classification results from multiple optimized single classifiers. The collection of classifications for a particular sample are then combined by a fusion process such as majority vote to form the final classification. Presented in this Article is a method to classify a sample by combining multiple classification methods without specifically classifying the sample by each method, that is, the classification methods are not optimized. The approach is demonstrated on three analytical data sets. The first is a beer authentication set with samples measured on five instruments, allowing fusion of multiple instruments by three ways. The second data set is composed of textile samples from three classes based on Raman spectra. This data set is used to demonstrate the ability to classify simultaneously with different data preprocessing strategies, thereby reducing the need to determine the ideal preprocessing method, a common prerequisite for accurate classification. The third data set contains three wine cultivars for three classes measured at 13 unique chemical and physical variables. In all cases, fusion of nonoptimized classifiers improves classification. Also presented are atypical uses of Procrustes analysis and extended inverted signal correction (EISC) for distinguishing sample similarities to respective classes.

  12. Alterations in sperm DNA methylation, non-coding RNA and histone retention associate with DDT-induced epigenetic transgenerational inheritance of disease.

    PubMed

    Skinner, Michael K; Ben Maamar, Millissia; Sadler-Riggleman, Ingrid; Beck, Daniel; Nilsson, Eric; McBirney, Margaux; Klukovich, Rachel; Xie, Yeming; Tang, Chong; Yan, Wei

    2018-02-27

    Environmental toxicants such as DDT have been shown to induce the epigenetic transgenerational inheritance of disease (e.g., obesity) through the germline. The current study was designed to investigate the DDT-induced concurrent alterations of a number of different epigenetic processes including DNA methylation, non-coding RNA (ncRNA) and histone retention in sperm. Gestating females were exposed transiently to DDT during fetal gonadal development, and then, the directly exposed F1 generation, the directly exposed germline F2 generation and the transgenerational F3 generation sperm were investigated. DNA methylation and ncRNA were altered in each generation sperm with the direct exposure F1 and F2 generations being predominantly distinct from the F3 generation epimutations. The piRNA and small tRNA were the most predominant classes of ncRNA altered. A highly conserved set of histone retention sites were found in the control lineage generations which was not significantly altered between generations, but a large number of new histone retention sites were found only in the transgenerational generation DDT lineage sperm. Therefore, all three different epigenetic processes were concurrently altered as DDT induced the epigenetic transgenerational inheritance of sperm epimutations. The direct exposure generations sperm epigenetic alterations were distinct from the transgenerational sperm epimutations. The genomic features and gene associations with the epimutations were investigated to help elucidate the integration of these different epigenetic processes. Observations demonstrate all three epigenetic processes are involved in transgenerational inheritance. The different epigenetic processes appear to be integrated in mediating the epigenetic transgenerational inheritance phenomenon.

  13. Physical Model for the Evolution of the Genetic Code

    NASA Astrophysics Data System (ADS)

    Yamashita, Tatsuro; Narikiyo, Osamu

    2011-12-01

    Using the shape space of codons and tRNAs we give a physical description of the genetic code evolution on the basis of the codon capture and ambiguous intermediate scenarios in a consistent manner. In the lowest dimensional version of our description, a physical quantity, codon level is introduced. In terms of the codon levels two scenarios are typically classified into two different routes of the evolutional process. In the case of the ambiguous intermediate scenario we perform an evolutional simulation implemented cost selection of amino acids and confirm a rapid transition of the code change. Such rapidness reduces uncomfortableness of the non-unique translation of the code at intermediate state that is the weakness of the scenario. In the case of the codon capture scenario the survival against mutations under the mutational pressure minimizing GC content in genomes is simulated and it is demonstrated that cells which experience only neutral mutations survive.

  14. Just-in-time classifiers for recurrent concepts.

    PubMed

    Alippi, Cesare; Boracchi, Giacomo; Roveri, Manuel

    2013-04-01

    Just-in-time (JIT) classifiers operate in evolving environments by classifying instances and reacting to concept drift. In stationary conditions, a JIT classifier improves its accuracy over time by exploiting additional supervised information coming from the field. In nonstationary conditions, however, the classifier reacts as soon as concept drift is detected; the current classification setup is discarded and a suitable one activated to keep the accuracy high. We present a novel generation of JIT classifiers able to deal with recurrent concept drift by means of a practical formalization of the concept representation and the definition of a set of operators working on such representations. The concept-drift detection activity, which is crucial in promptly reacting to changes exactly when needed, is advanced by considering change-detection tests monitoring both inputs and classes distributions.

  15. Deconvolution When Classifying Noisy Data Involving Transformations.

    PubMed

    Carroll, Raymond; Delaigle, Aurore; Hall, Peter

    2012-09-01

    In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.

  16. Simulation of the charge migration in DNA under irradiation with heavy ions.

    PubMed

    Belov, Oleg V; Boyda, Denis L; Plante, Ianik; Shirmovsky, Sergey Eh

    2015-01-01

    A computer model to simulate the processes of charge injection and migration through DNA after irradiation by a heavy charged particle was developed. The most probable sites of charge injection were obtained by merging spatial models of short DNA sequence and a single 1 GeV/u iron particle track simulated by the code RITRACKS (Relativistic Ion Tracks). Charge migration was simulated by using a quantum-classical nonlinear model of the DNA-charge system. It was found that charge migration depends on the environmental conditions. The oxidative damage in DNA occurring during hole migration was simulated concurrently, which allowed the determination of probable locations of radiation-induced DNA lesions.

  17. Ube2V2 Is a Rosetta Stone Bridging Redox and Ubiquitin Codes, Coordinating DNA Damage Responses.

    PubMed

    Zhao, Yi; Long, Marcus J C; Wang, Yiran; Zhang, Sheng; Aye, Yimon

    2018-02-28

    Posttranslational modifications (PTMs) are the lingua franca of cellular communication. Most PTMs are enzyme-orchestrated. However, the reemergence of electrophilic drugs has ushered mining of unconventional/non-enzyme-catalyzed electrophile-signaling pathways. Despite the latest impetus toward harnessing kinetically and functionally privileged cysteines for electrophilic drug design, identifying these sensors remains challenging. Herein, we designed "G-REX"-a technique that allows controlled release of reactive electrophiles in vivo. Mitigating toxicity/off-target effects associated with uncontrolled bolus exposure, G-REX tagged first-responding innate cysteines that bind electrophiles under true k cat / K m conditions. G-REX identified two allosteric ubiquitin-conjugating proteins-Ube2V1/Ube2V2-sharing a novel privileged-sensor-cysteine. This non-enzyme-catalyzed-PTM triggered responses specific to each protein. Thus, G-REX is an unbiased method to identify novel functional cysteines. Contrasting conventional active-site/off-active-site cysteine-modifications that regulate target activity, modification of Ube2V2 allosterically hyperactivated its enzymatically active binding-partner Ube2N, promoting K63-linked client ubiquitination and stimulating H2AX-dependent DNA damage response. This work establishes Ube2V2 as a Rosetta-stone bridging redox and ubiquitin codes to guard genome integrity.

  18. Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly

    PubMed Central

    Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka

    2010-01-01

    Background Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. Methodology We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. Conclusions The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877

  19. Smurf2 Regulates DNA Repair and Packaging to Prevent Tumors | Center for Cancer Research

    Cancer.gov

    The blueprint for all of a cell’s functions is written in the genetic code of DNA sequences as well as in the landscape of DNA and histone modifications. DNA is wrapped around histones to package it into chromatin, which is stored in the nucleus. It is important to maintain the integrity of the chromatin structure to ensure that the cell continues to behave appropriately.

  20. Database of amino acid-nucleotide contacts in contacts in DNA-homeodomain protein

    NASA Astrophysics Data System (ADS)

    Grokhlina, T. I.; Zrelov, P. V.; Ivanov, V. V.; Polozov, R. V.; Chirgadze, Yu. N.; Sivozhelezov, V. S.

    2013-09-01

    The analysis of amino acid-nucleotide contacts in interfaces of the protein-DNA complexes, intended to find consistencies in the protein-DNA recognition, is a complex problem that requires an analysis of the physicochemical characteristics of these contacts and the positions of the participating amino acids and nucleotides in the chains of the protein and the DNA, respectively, as well as conservatism of these contacts. Thus, those heterogeneous data should be systematized. For this purpose we have developed a database of amino acid-nucleotide contacts ANTPC (Amino acid Nucleotide Type Position Conservation) following the archetypal example of the proteins in the homeodomain family. We show that it can be used to compare and classify the interfaces of the protein-DNA complexes.

  1. Sequence-dependent modelling of local DNA bending phenomena: curvature prediction and vibrational analysis.

    PubMed

    Vlahovicek, K; Munteanu, M G; Pongor, S

    1999-01-01

    Bending is a local conformational micropolymorphism of DNA in which the original B-DNA structure is only distorted but not extensively modified. Bending can be predicted by simple static geometry models as well as by a recently developed elastic model that incorporate sequence dependent anisotropic bendability (SDAB). The SDAB model qualitatively explains phenomena including affinity of protein binding, kinking, as well as sequence-dependent vibrational properties of DNA. The vibrational properties of DNA segments can be studied by finite element analysis of a model subjected to an initial bending moment. The frequency spectrum is obtained by applying Fourier analysis to the displacement values in the time domain. This analysis shows that the spectrum of the bending vibrations quite sensitively depends on the sequence, for example the spectrum of a curved sequence is characteristically different from the spectrum of straight sequence motifs of identical basepair composition. Curvature distributions are genome-specific, and pronounced differences are found between protein-coding and regulatory regions, respectively, that is, sites of extreme curvature and/or bendability are less frequent in protein-coding regions. A WWW server is set up for the prediction of curvature and generation of 3D models from DNA sequences (http:@www.icgeb.trieste.it/dna).

  2. iDBPs: a web server for the identification of DNA binding proteins.

    PubMed

    Nimrod, Guy; Schushan, Maya; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir

    2010-03-01

    The iDBPs server uses the three-dimensional (3D) structure of a query protein to predict whether it binds DNA. First, the algorithm predicts the functional region of the protein based on its evolutionary profile; the assumption is that large clusters of conserved residues are good markers of functional regions. Next, various characteristics of the predicted functional region as well as global features of the protein are calculated, such as the average surface electrostatic potential, the dipole moment and cluster-based amino acid conservation patterns. Finally, a random forests classifier is used to predict whether the query protein is likely to bind DNA and to estimate the prediction confidence. We have trained and tested the classifier on various datasets and shown that it outperformed related methods. On a dataset that reflects the fraction of DNA binding proteins (DBPs) in a proteome, the area under the ROC curve was 0.90. The application of the server to an updated version of the N-Func database, which contains proteins of unknown function with solved 3D-structure, suggested new putative DBPs for experimental studies. http://idbps.tau.ac.il/

  3. 76 FR 19707 - Classified Information: Classification/Declassification/Access; Authority To Classify Information

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-04-08

    ... SECRET or CONFIDENTIAL to the Administrator of the Federal Aviation Administration, and to the Assistant... authority to originally classify information as SECRET or CONFIDENTIAL, with further authorization to...

  4. DNA Multiple Sequence Alignment Guided by Protein Domains: The MSA-PAD 2.0 Method.

    PubMed

    Balech, Bachir; Monaco, Alfonso; Perniola, Michele; Santamaria, Monica; Donvito, Giacinto; Vicario, Saverio; Maggi, Giorgio; Pesole, Graziano

    2018-01-01

    Multiple sequence alignment (MSA) is a fundamental component in many DNA sequence analyses including metagenomics studies and phylogeny inference. When guided by protein profiles, DNA multiple alignments assume a higher precision and robustness. Here we present details of the use of the upgraded version of MSA-PAD (2.0), which is a DNA multiple sequence alignment framework able to align DNA sequences coding for single/multiple protein domains guided by PFAM or user-defined annotations. MSA-PAD has two alignment strategies, called "Gene" and "Genome," accounting for coding domains order and genomic rearrangements, respectively. Novel options were added to the present version, where the MSA can be guided by protein profiles provided by the user. This allows MSA-PAD 2.0 to run faster and to add custom protein profiles sometimes not present in PFAM database according to the user's interest. MSA-PAD 2.0 is currently freely available as a Web application at https://recasgateway.cloud.ba.infn.it/ .

  5. Capturing Snapshots of APE1 Processing DNA Damage

    PubMed Central

    Freudenthal, Bret D.; Beard, William A.; Cuneo, Matthew J.; Dyrkheeva, Nadezhda S.; Wilson, Samuel H.

    2015-01-01

    DNA apurinic-apyrimidinic (AP) sites are prevalent non-coding threats to genomic stability and are processed by AP endonuclease 1 (APE1). APE1 incises the AP-site phosphodiester backbone, generating a DNA repair intermediate that is potentially cytotoxic. The molecular events of the incision reaction remain elusive due in part to limited structural information. We report multiple high-resolution human APE1:DNA structures that divulge novel features of the APE1 reaction, including the metal binding site, nucleophile, and arginine clamps that mediate product release. We also report APE1:DNA structures with a T:G mismatch 5′ to the AP-site, representing a clustered lesion occurring in methylated CpG dinucleotides. These reveal that APE1 molds the T:G mismatch into a unique Watson-Crick like geometry that distorts the active site reducing incision. These snapshots provide mechanistic clarity for APE1, while affording a rational framework to manipulate biological responses to DNA damage. PMID:26458045

  6. Classifying Cereal Data

    Cancer.gov

    The DSQ includes questions about cereal intake and allows respondents up to two responses on which cereals they consume. We classified each cereal reported first by hot or cold, and then along four dimensions: density of added sugars, whole grains, fiber, and calcium.

  7. Re-visiting protein-centric two-tier classification of existing DNA-protein complexes

    PubMed Central

    2012-01-01

    Background Precise DNA-protein interactions play most important and vital role in maintaining the normal physiological functioning of the cell, as it controls many high fidelity cellular processes. Detailed study of the nature of these interactions has paved the way for understanding the mechanisms behind the biological processes in which they are involved. Earlier in 2000, a systematic classification of DNA-protein complexes based on the structural analysis of the proteins was proposed at two tiers, namely groups and families. With the advancement in the number and resolution of structures of DNA-protein complexes deposited in the Protein Data Bank, it is important to revisit the existing classification. Results On the basis of the sequence analysis of DNA binding proteins, we have built upon the protein centric, two-tier classification of DNA-protein complexes by adding new members to existing families and making new families and groups. While classifying the new complexes, we also realised the emergence of new groups and families. The new group observed was where β-propeller was seen to interact with DNA. There were 34 SCOP folds which were observed to be present in the complexes of both old and new classifications, whereas 28 folds are present exclusively in the new complexes. Some new families noticed were NarL transcription factor, Z-α DNA binding proteins, Forkhead transcription factor, AP2 protein, Methyl CpG binding protein etc. Conclusions Our results suggest that with the increasing number of availability of DNA-protein complexes in Protein Data Bank, the number of families in the classification increased by approximately three fold. The folds present exclusively in newly classified complexes is suggestive of inclusion of proteins with new function in new classification, the most populated of which are the folds responsible for DNA damage repair. The proposed re-visited classification can be used to perform genome-wide surveys in the genomes of

  8. New Trends of Digital Data Storage in DNA

    PubMed Central

    2016-01-01

    With the exponential growth in the capacity of information generated and the emerging need for data to be stored for prolonged period of time, there emerges a need for a storage medium with high capacity, high storage density, and possibility to withstand extreme environmental conditions. DNA emerges as the prospective medium for data storage with its striking features. Diverse encoding models for reading and writing data onto DNA, codes for encrypting data which addresses issues of error generation, and approaches for developing codons and storage styles have been developed over the recent past. DNA has been identified as a potential medium for secret writing, which achieves the way towards DNA cryptography and stenography. DNA utilized as an organic memory device along with big data storage and analytics in DNA has paved the way towards DNA computing for solving computational problems. This paper critically analyzes the various methods used for encoding and encrypting data onto DNA while identifying the advantages and capability of every scheme to overcome the drawbacks identified priorly. Cryptography and stenography techniques have been analyzed in a critical approach while identifying the limitations of each method. This paper also identifies the advantages and limitations of DNA as a memory device and memory applications. PMID:27689089

  9. New Trends of Digital Data Storage in DNA.

    PubMed

    De Silva, Pavani Yashodha; Ganegoda, Gamage Upeksha

    With the exponential growth in the capacity of information generated and the emerging need for data to be stored for prolonged period of time, there emerges a need for a storage medium with high capacity, high storage density, and possibility to withstand extreme environmental conditions. DNA emerges as the prospective medium for data storage with its striking features. Diverse encoding models for reading and writing data onto DNA, codes for encrypting data which addresses issues of error generation, and approaches for developing codons and storage styles have been developed over the recent past. DNA has been identified as a potential medium for secret writing, which achieves the way towards DNA cryptography and stenography. DNA utilized as an organic memory device along with big data storage and analytics in DNA has paved the way towards DNA computing for solving computational problems. This paper critically analyzes the various methods used for encoding and encrypting data onto DNA while identifying the advantages and capability of every scheme to overcome the drawbacks identified priorly. Cryptography and stenography techniques have been analyzed in a critical approach while identifying the limitations of each method. This paper also identifies the advantages and limitations of DNA as a memory device and memory applications.

  10. Accuracy comparison among different machine learning techniques for detecting malicious codes

    NASA Astrophysics Data System (ADS)

    Narang, Komal

    2016-03-01

    In this paper, a machine learning based model for malware detection is proposed. It can detect newly released malware i.e. zero day attack by analyzing operation codes on Android operating system. The accuracy of Naïve Bayes, Support Vector Machine (SVM) and Neural Network for detecting malicious code has been compared for the proposed model. In the experiment 400 benign files, 100 system files and 500 malicious files have been used to construct the model. The model yields the best accuracy 88.9% when neural network is used as classifier and achieved 95% and 82.8% accuracy for sensitivity and specificity respectively.

  11. The artificial zinc finger coding gene 'Jazz' binds the utrophin promoter and activates transcription.

    PubMed

    Corbi, N; Libri, V; Fanciulli, M; Tinsley, J M; Davies, K E; Passananti, C

    2000-06-01

    Up-regulation of utrophin gene expression is recognized as a plausible therapeutic approach in the treatment of Duchenne muscular dystrophy (DMD). We have designed and engineered new zinc finger-based transcription factors capable of binding and activating transcription from the promoter of the dystrophin-related gene, utrophin. Using the recognition 'code' that proposes specific rules between zinc finger primary structure and potential DNA binding sites, we engineered a new gene named 'Jazz' that encodes for a three-zinc finger peptide. Jazz belongs to the Cys2-His2 zinc finger type and was engineered to target the nine base pair DNA sequence: 5'-GCT-GCT-GCG-3', present in the promoter region of both the human and mouse utrophin gene. The entire zinc finger alpha-helix region, containing the amino acid positions that are crucial for DNA binding, was specifically chosen on the basis of the contacts more frequently represented in the available list of the 'code'. Here we demonstrate that Jazz protein binds specifically to the double-stranded DNA target, with a dissociation constant of about 32 nM. Band shift and super-shift experiments confirmed the high affinity and specificity of Jazz protein for its DNA target. Moreover, we show that chimeric proteins, named Gal4-Jazz and Sp1-Jazz, are able to drive the transcription of a test gene from the human utrophin promoter.

  12. Dynamic Dimensionality Selection for Bayesian Classifier Ensembles

    DTIC Science & Technology

    2015-03-19

    learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very cometitive to Logistic Regression but much more...classifier, Generative learning, Discriminative learning, Naïve Bayes, Feature selection, Logistic regression , higher order attribute independence 16...discriminative learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very cometitive to Logistic Regression but

  13. A Rewritable, Random-Access DNA-Based Storage System.

    PubMed

    Yazdi, S M Hossein Tabatabaei; Yuan, Yongbo; Ma, Jian; Zhao, Huimin; Milenkovic, Olgica

    2015-09-18

    We describe the first DNA-based storage architecture that enables random access to data blocks and rewriting of information stored at arbitrary locations within the blocks. The newly developed architecture overcomes drawbacks of existing read-only methods that require decoding the whole file in order to read one data fragment. Our system is based on new constrained coding techniques and accompanying DNA editing methods that ensure data reliability, specificity and sensitivity of access, and at the same time provide exceptionally high data storage capacity. As a proof of concept, we encoded parts of the Wikipedia pages of six universities in the USA, and selected and edited parts of the text written in DNA corresponding to three of these schools. The results suggest that DNA is a versatile media suitable for both ultrahigh density archival and rewritable storage applications.

  14. A Rewritable, Random-Access DNA-Based Storage System

    NASA Astrophysics Data System (ADS)

    Tabatabaei Yazdi, S. M. Hossein; Yuan, Yongbo; Ma, Jian; Zhao, Huimin; Milenkovic, Olgica

    2015-09-01

    We describe the first DNA-based storage architecture that enables random access to data blocks and rewriting of information stored at arbitrary locations within the blocks. The newly developed architecture overcomes drawbacks of existing read-only methods that require decoding the whole file in order to read one data fragment. Our system is based on new constrained coding techniques and accompanying DNA editing methods that ensure data reliability, specificity and sensitivity of access, and at the same time provide exceptionally high data storage capacity. As a proof of concept, we encoded parts of the Wikipedia pages of six universities in the USA, and selected and edited parts of the text written in DNA corresponding to three of these schools. The results suggest that DNA is a versatile media suitable for both ultrahigh density archival and rewritable storage applications.

  15. DNA Polymerase κ Is a Key Cellular Factor for the Formation of Covalently Closed Circular DNA of Hepatitis B Virus.

    PubMed

    Qi, Yonghe; Gao, Zhenchao; Xu, Guangwei; Peng, Bo; Liu, Chenxuan; Yan, Huan; Yao, Qiyan; Sun, Guoliang; Liu, Yang; Tang, Dingbin; Song, Zilin; He, Wenhui; Sun, Yinyan; Guo, Ju-Tao; Li, Wenhui

    2016-10-01

    Hepatitis B virus (HBV) infection of hepatocytes begins by binding to its cellular receptor sodium taurocholate cotransporting polypeptide (NTCP), followed by the internalization of viral nucleocapsid into the cytoplasm. The viral relaxed circular (rc) DNA genome in nucleocapsid is transported into the nucleus and converted into covalently closed circular (ccc) DNA to serve as a viral persistence reservoir that is refractory to current antiviral therapies. Host DNA repair enzymes have been speculated to catalyze the conversion of rcDNA to cccDNA, however, the DNA polymerase(s) that fills the gap in the plus strand of rcDNA remains to be determined. Here we conducted targeted genetic screening in combination with chemical inhibition to identify the cellular DNA polymerase(s) responsible for cccDNA formation, and exploited recombinant HBV with capsid coding deficiency which infects HepG2-NTCP cells with similar efficiency of wild-type HBV to assure cccDNA synthesis is exclusively from de novo HBV infection. We found that DNA polymerase κ (POLK), a Y-family DNA polymerase with maximum activity in non-dividing cells, substantially contributes to cccDNA formation during de novo HBV infection. Depleting gene expression of POLK in HepG2-NTCP cells by either siRNA knockdown or CRISPR/Cas9 knockout inhibited the conversion of rcDNA into cccDNA, while the diminished cccDNA formation in, and hence the viral infection of, the knockout cells could be effectively rescued by ectopic expression of POLK. These studies revealed that POLK is a crucial host factor required for cccDNA formation during a de novo HBV infection and suggest that POLK may be a potential target for developing antivirals against HBV.

  16. Changes in the Coding and Non-coding Transcriptome and DNA Methylome that Define the Schwann Cell Repair Phenotype after Nerve Injury.

    PubMed

    Arthur-Farraj, Peter J; Morgan, Claire C; Adamowicz, Martyna; Gomez-Sanchez, Jose A; Fazal, Shaline V; Beucher, Anthony; Razzaghi, Bonnie; Mirsky, Rhona; Jessen, Kristjan R; Aitman, Timothy J

    2017-09-12

    Repair Schwann cells play a critical role in orchestrating nerve repair after injury, but the cellular and molecular processes that generate them are poorly understood. Here, we perform a combined whole-genome, coding and non-coding RNA and CpG methylation study following nerve injury. We show that genes involved in the epithelial-mesenchymal transition are enriched in repair cells, and we identify several long non-coding RNAs in Schwann cells. We demonstrate that the AP-1 transcription factor C-JUN regulates the expression of certain micro RNAs in repair Schwann cells, in particular miR-21 and miR-34. Surprisingly, unlike during development, changes in CpG methylation are limited in injury, restricted to specific locations, such as enhancer regions of Schwann cell-specific genes (e.g., Nedd4l), and close to local enrichment of AP-1 motifs. These genetic and epigenomic changes broaden our mechanistic understanding of the formation of repair Schwann cell during peripheral nervous system tissue repair. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  17. The mitochondrial DNA makeup of Romanians: A forensic mtDNA control region database and phylogenetic characterization.

    PubMed

    Turchi, Chiara; Stanciu, Florin; Paselli, Giorgia; Buscemi, Loredana; Parson, Walther; Tagliabracci, Adriano

    2016-09-01

    To evaluate the pattern of Romanian population from a mitochondrial perspective and to establish an appropriate mtDNA forensic database, we generated a high-quality mtDNA control region dataset from 407 Romanian subjects belonging to four major historical regions: Moldavia, Transylvania, Wallachia and Dobruja. The entire control region (CR) was analyzed by Sanger-type sequencing assays and the resulting 306 different haplotypes were classified into haplogroups according to the most updated mtDNA phylogeny. The Romanian gene pool is mainly composed of West Eurasian lineages H (31.7%), U (12.8%), J (10.8%), R (10.1%), T (9.1%), N (8.1%), HV (5.4%),K (3.7%), HV0 (4.2%), with exceptions of East Asian haplogroup M (3.4%) and African haplogroup L (0.7%). The pattern of mtDNA variation observed in this study indicates that the mitochondrial DNA pool is geographically homogeneous across Romania and that the haplogroup composition reveals signals of admixture of populations of different origin. The PCA scatterplot supported this scenario, with Romania located in southeastern Europe area, close to Bulgaria and Hungary, and as a borderland with respect to east Mediterranean and other eastern European countries. High haplotype diversity (0.993) and nucleotide diversity indices (0.00838±0.00426), together with low random match probability (0.0087) suggest the usefulness of this control region dataset as a forensic database in routine forensic mtDNA analysis and in the investigation of maternal genetic lineages in the Romanian population. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  18. RPS8—a New Informative DNA Marker for Phylogeny of Babesia and Theileria Parasites in China

    PubMed Central

    Tian, Zhan-Cheng; Liu, Guang-Yuan; Yin, Hong; Luo, Jian-Xun; Guan, Gui-Quan; Luo, Jin; Xie, Jun-Ren; Shen, Hui; Tian, Mei-Yuan; Zheng, Jin-feng; Yuan, Xiao-song; Wang, Fang-fang

    2013-01-01

    Piroplasmosis is a serious debilitating and sometimes fatal disease. Phylogenetic relationships within piroplasmida are complex and remain unclear. We compared the intron–exon structure and DNA sequences of the RPS8 gene from Babesia and Theileria spp. isolates in China. Similar to 18S rDNA, the 40S ribosomal protein S8 gene, RPS8, including both coding and non-coding regions is a useful and novel genetic marker for defining species boundaries and for inferring phylogenies because it tends to have little intra-specific variation but considerable inter-specific difference. However, more samples are needed to verify the usefulness of the RPS8 (coding and non-coding regions) gene as a marker for the phylogenetic position and detection of most Babesia and Theileria species, particularly for some closely related species. PMID:24244571

  19. 28 CFR 16.7 - Classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 1 2013-07-01 2013-07-01 false Classified information. 16.7 Section 16.7 Judicial Administration DEPARTMENT OF JUSTICE PRODUCTION OR DISCLOSURE OF MATERIAL OR INFORMATION Procedures for Disclosure of Records Under the Freedom of Information Act § 16.7 Classified information. In...

  20. 28 CFR 16.44 - Classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 1 2013-07-01 2013-07-01 false Classified information. 16.44 Section 16.44 Judicial Administration DEPARTMENT OF JUSTICE PRODUCTION OR DISCLOSURE OF MATERIAL OR INFORMATION... information. In processing a request for access to a record containing information that is classified under...

  1. 28 CFR 16.7 - Classified information.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 28 Judicial Administration 1 2010-07-01 2010-07-01 false Classified information. 16.7 Section 16.7 Judicial Administration DEPARTMENT OF JUSTICE PRODUCTION OR DISCLOSURE OF MATERIAL OR INFORMATION Procedures for Disclosure of Records Under the Freedom of Information Act § 16.7 Classified information. In...

  2. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process § 651.13 Classified actions. (a) For proposed actions and NEPA analyses involving classified information, AR 380-5 (Department of the Army Information Security Program) will be followed. (b) Classification...

  3. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process § 651.13 Classified actions. (a) For proposed actions and NEPA analyses involving classified information, AR 380-5 (Department of the Army Information Security Program) will be followed. (b) Classification...

  4. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process § 651.13 Classified actions. (a) For proposed actions and NEPA analyses involving classified information, AR 380-5 (Department of the Army Information Security Program) will be followed. (b) Classification...

  5. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process § 651.13 Classified actions. (a) For proposed actions and NEPA analyses involving classified information, AR 380-5 (Department of the Army Information Security Program) will be followed. (b) Classification...

  6. Population dynamics coded in DNA: genetic traces of the expansion of modern humans

    NASA Astrophysics Data System (ADS)

    Kimmel, Marek

    1999-12-01

    It has been proposed that modern humans evolved from a small ancestral population, which appeared several hundred thousand years ago in Africa. Descendants of the founder group migrated to Europe and then to Asia, not mixing with the pre-existing local populations but replacing them. Two demographic elements are present in this “out of Africa” hypothesis: numerical growth of the modern humans and their migration into Eurasia. Did these processes leave an imprint in our DNA? To address this question, we use the classical Fisher-Wright-Moran model of population genetics, assuming variable population size and two models of mutation: the infinite-sites model and the stepwise-mutation model. We use the coalescence theory, which amounts to tracing the common ancestors of contemporary genes. We obtain mathematical formulae expressing the distribution of alleles given the time changes of population size . In the framework of the infinite-sites model, simulations indicate that the pattern of past population size change leaves its signature on the pattern of DNA polymorphism. Application of the theory to the published mitochondrial DNA sequences indicates that the current mitochondrial DNA sequence variation is not inconsistent with the logistic growth of the modern human population. In the framework of the stepwise-mutation model, we demonstrate that population bottleneck followed by growth in size causes an imbalance between allele-size variance and heterozygosity. We analyze a set of data on tetranucleotide repeats which reveals the existence of this imbalance. The pattern of imbalance is consistent with the bottleneck being most ancient in Africans, most recent in Asians and intermediate in Europeans. These findings are consistent with the “out of Africa” hypothesis, although by no means do they constitute its proof.

  7. QueTAL: a suite of tools to classify and compare TAL effectors functionally and phylogenetically

    PubMed Central

    Pérez-Quintero, Alvaro L.; Lamy, Léo; Gordon, Jonathan L.; Escalon, Aline; Cunnac, Sébastien; Szurek, Boris; Gagnevin, Lionel

    2015-01-01

    Transcription Activator-Like (TAL) effectors from Xanthomonas plant pathogenic bacteria can bind to the promoter region of plant genes and induce their expression. DNA-binding specificity is governed by a central domain made of nearly identical repeats, each determining the recognition of one base pair via two amino acid residues (a.k.a. Repeat Variable Di-residue, or RVD). Knowing how TAL effectors differ from each other within and between strains would be useful to infer functional and evolutionary relationships, but their repetitive nature precludes reliable use of traditional alignment methods. The suite QueTAL was therefore developed to offer tailored tools for comparison of TAL effector genes. The program DisTAL considers each repeat as a unit, transforms a TAL effector sequence into a sequence of coded repeats and makes pair-wise alignments between these coded sequences to construct trees. The program FuncTAL is aimed at finding TAL effectors with similar DNA-binding capabilities. It calculates correlations between position weight matrices of potential target DNA sequence predicted from the RVD sequence, and builds trees based on these correlations. The programs accurately represented phylogenetic and functional relationships between TAL effectors using either simulated or literature-curated data. When using the programs on a large set of TAL effector sequences, the DisTAL tree largely reflected the expected species phylogeny. In contrast, FuncTAL showed that TAL effectors with similar binding capabilities can be found between phylogenetically distant taxa. This suite will help users to rapidly analyse any TAL effector genes of interest and compare them to other available TAL genes and should improve our understanding of TAL effectors evolution. It is available at http://bioinfo-web.mpl.ird.fr/cgi-bin2/quetal/quetal.cgi. PMID:26284082

  8. Team interaction during surgery: a systematic review of communication coding schemes.

    PubMed

    Tiferes, Judith; Bisantz, Ann M; Guru, Khurshid A

    2015-05-15

    Communication problems have been systematically linked to human errors in surgery and a deep understanding of the underlying processes is essential. Although a number of tools exist to assess nontechnical skills, methods to study communication and other team-related processes are far from being standardized, making comparisons challenging. We conducted a systematic review to analyze methods used to study events in the operating room (OR) and to develop a synthesized coding scheme for OR team communication. Six electronic databases were accessed to search for articles that collected individual events during surgery and included detailed coding schemes. Additional articles were added based on cross-referencing. That collection was then classified based on type of events collected, environment type (real or simulated), number of procedures, type of surgical task, team characteristics, method of data collection, and coding scheme characteristics. All dimensions within each coding scheme were grouped based on emergent content similarity. Categories drawn from articles, which focused on communication events, were further analyzed and synthesized into one common coding scheme. A total of 34 of 949 articles met the inclusion criteria. The methodological characteristics and coding dimensions of the articles were summarized. A priori coding was used in nine studies. The synthesized coding scheme for OR communication included six dimensions as follows: information flow, period, statement type, topic, communication breakdown, and effects of communication breakdown. The coding scheme provides a standardized coding method for OR communication, which can be used to develop a priori codes for future studies especially in comparative effectiveness research. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. XPD polymorphisms: effects on DNA repair proficiency.

    PubMed

    Lunn, R M; Helzlsouer, K J; Parshad, R; Umbach, D M; Harris, E L; Sanford, K K; Bell, D A

    2000-04-01

    XPD codes for a DNA helicase involved in transcription and nucleotide excision repair. Rare XPD mutations diminish nucleotide excision repair resulting in hypersensitivity to UV light and increased risk of skin cancer. Several polymorphisms in this gene have been identified but their impact on DNA repair is not known. We compared XPD genotypes at codons 312 and 751 with DNA repair proficiency in 31 women. XPD genotypes were measured by PCR-RFLP. DNA repair proficiency was assessed using a cytogenetic assay that detects X-ray induced chromatid aberrations (breaks and gaps). Chromatid aberrations were scored per 100 metaphase cells following incubation at 37 degrees C (1.5 h after irradiation) to allow for repair of DNA damage. Individuals with the Lys/Lys codon 751 XPD genotype had a higher number of chromatid aberrations (132/100 metaphase cells) than those having a 751Gln allele (34/100 metaphase cells). Individuals having greater than 60 chromatid breaks plus gaps were categorized as having sub-optimal repair. Possessing a Lys/Lys751 genotype increased the risk of sub-optimal DNA repair (odds ratio = 7.2, 95% confidence interval = 1.01-87.7). The Asp312Asn XPD polymorphism did not appear to affect DNA repair proficiency. These results suggest that the Lys751 (common) allele may alter the XPD protein product resulting in sub-optimal repair of X-ray-induced DNA damage.

  10. A Tandemly Arranged Pattern of Two 5S rDNA Arrays in Amolops mantzorum (Anura, Ranidae).

    PubMed

    Liu, Ting; Song, Menghuan; Xia, Yun; Zeng, Xiaomao

    2017-01-01

    In an attempt to extend the knowledge of the 5S rDNA organization in anurans, the 5S rDNA sequences of Amolops mantzorum were isolated, characterized, and mapped by FISH. Two forms of 5S rDNA, type I (209 bp) and type II (about 870 bp), were found in specimens investigated from various populations. Both of them contained a 118-bp coding sequence, readily differentiated by their non-transcribed spacer (NTS) sizes and compositions. Four probes (the 5S rDNA coding sequences, the type I NTS, the type II NTS, and the entire type II 5S rDNA sequences) were respectively labeled with TAMRA or digoxigenin to hybridize with mitotic chromosomes for samples of all localities. It turned out that all probes showed the same signals that appeared in every centromeric region and in the telomeric regions of chromosome 5, without differences within or between populations. Obviously, both type I and type II of the 5S rDNA arrays arranged in tandem, which was contrasting with other frogs or fishes recorded to date. More interestingly, all the probes detected centromeric regions in all karyotypes, suggesting the presence of a satellite DNA family derived from 5S rDNA. © 2017 S. Karger AG, Basel.

  11. The Winding Road to Discovering the Link between Genetic Material and DNA

    ERIC Educational Resources Information Center

    Cherif, Abour H.; Roze, Maris; Movahedzadeh, Farahnaz

    2015-01-01

    This is an account of the three-centuries long journey to the discovery of the link between DNA and the transformation principle of heredity beginning with the discovery of the cell in 1665 and leading up to the 1953 discovery of the genetic code and the structure of DNA. This account also illustrates the way science works and how scientists do…

  12. 6 CFR 5.24 - Classified information.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 6 Domestic Security 1 2010-01-01 2010-01-01 false Classified information. 5.24 Section 5.24 Domestic Security DEPARTMENT OF HOMELAND SECURITY, OFFICE OF THE SECRETARY DISCLOSURE OF RECORDS AND INFORMATION Privacy Act § 5.24 Classified information. In processing a request for access to a record...

  13. 28 CFR 16.44 - Classified information.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 28 Judicial Administration 1 2010-07-01 2010-07-01 false Classified information. 16.44 Section 16.44 Judicial Administration DEPARTMENT OF JUSTICE PRODUCTION OR DISCLOSURE OF MATERIAL OR INFORMATION Protection of Privacy and Access to Individual Records Under the Privacy Act of 1974 § 16.44 Classified...

  14. Generating and repairing genetically programmed DNA breaks during immunoglobulin class switch recombination

    PubMed Central

    Nicolas, Laura; Cols, Montserrat; Choi, Jee Eun; Chaudhuri, Jayanta; Vuong, Bao

    2018-01-01

    Adaptive immune responses require the generation of a diverse repertoire of immunoglobulins (Igs) that can recognize and neutralize a seemingly infinite number of antigens. V(D)J recombination creates the primary Ig repertoire, which subsequently is modified by somatic hypermutation (SHM) and class switch recombination (CSR). SHM promotes Ig affinity maturation whereas CSR alters the effector function of the Ig. Both SHM and CSR require activation-induced cytidine deaminase (AID) to produce dU:dG mismatches in the Ig locus that are transformed into untemplated mutations in variable coding segments during SHM or DNA double-strand breaks (DSBs) in switch regions during CSR. Within the Ig locus, DNA repair pathways are diverted from their canonical role in maintaining genomic integrity to permit AID-directed mutation and deletion of gene coding segments. Recently identified proteins, genes, and regulatory networks have provided new insights into the temporally and spatially coordinated molecular interactions that control the formation and repair of DSBs within the Ig locus. Unravelling the genetic program that allows B cells to selectively alter the Ig coding regions while protecting non-Ig genes from DNA damage advances our understanding of the molecular processes that maintain genomic integrity as well as humoral immunity. PMID:29744038

  15. Biometric iris image acquisition system with wavefront coding technology

    NASA Astrophysics Data System (ADS)

    Hsieh, Sheng-Hsun; Yang, Hsi-Wen; Huang, Shao-Hung; Li, Yung-Hui; Tien, Chung-Hao

    2013-09-01

    Biometric signatures for identity recognition have been practiced for centuries. Basically, the personal attributes used for a biometric identification system can be classified into two areas: one is based on physiological attributes, such as DNA, facial features, retinal vasculature, fingerprint, hand geometry, iris texture and so on; the other scenario is dependent on the individual behavioral attributes, such as signature, keystroke, voice and gait style. Among these features, iris recognition is one of the most attractive approaches due to its nature of randomness, texture stability over a life time, high entropy density and non-invasive acquisition. While the performance of iris recognition on high quality image is well investigated, not too many studies addressed that how iris recognition performs subject to non-ideal image data, especially when the data is acquired in challenging conditions, such as long working distance, dynamical movement of subjects, uncontrolled illumination conditions and so on. There are three main contributions in this paper. Firstly, the optical system parameters, such as magnification and field of view, was optimally designed through the first-order optics. Secondly, the irradiance constraints was derived by optical conservation theorem. Through the relationship between the subject and the detector, we could estimate the limitation of working distance when the camera lens and CCD sensor were known. The working distance is set to 3m in our system with pupil diameter 86mm and CCD irradiance 0.3mW/cm2. Finally, We employed a hybrid scheme combining eye tracking with pan and tilt system, wavefront coding technology, filter optimization and post signal recognition to implement a robust iris recognition system in dynamic operation. The blurred image was restored to ensure recognition accuracy over 3m working distance with 400mm focal length and aperture F/6.3 optics. The simulation result as well as experiment validates the proposed code

  16. DNA copy number changes define spatial patterns of heterogeneity in colorectal cancer

    PubMed Central

    Mamlouk, Soulafa; Childs, Liam Harold; Aust, Daniela; Heim, Daniel; Melching, Friederike; Oliveira, Cristiano; Wolf, Thomas; Durek, Pawel; Schumacher, Dirk; Bläker, Hendrik; von Winterfeld, Moritz; Gastl, Bastian; Möhr, Kerstin; Menne, Andrea; Zeugner, Silke; Redmer, Torben; Lenze, Dido; Tierling, Sascha; Möbs, Markus; Weichert, Wilko; Folprecht, Gunnar; Blanc, Eric; Beule, Dieter; Schäfer, Reinhold; Morkel, Markus; Klauschen, Frederick; Leser, Ulf; Sers, Christine

    2017-01-01

    Genetic heterogeneity between and within tumours is a major factor determining cancer progression and therapy response. Here we examined DNA sequence and DNA copy-number heterogeneity in colorectal cancer (CRC) by targeted high-depth sequencing of 100 most frequently altered genes. In 97 samples, with primary tumours and matched metastases from 27 patients, we observe inter-tumour concordance for coding mutations; in contrast, gene copy numbers are highly discordant between primary tumours and metastases as validated by fluorescent in situ hybridization. To further investigate intra-tumour heterogeneity, we dissected a single tumour into 68 spatially defined samples and sequenced them separately. We identify evenly distributed coding mutations in APC and TP53 in all tumour areas, yet highly variable gene copy numbers in numerous genes. 3D morpho-molecular reconstruction reveals two clusters with divergent copy number aberrations along the proximal–distal axis indicating that DNA copy number variations are a major source of tumour heterogeneity in CRC. PMID:28120820

  17. 6 CFR 5.7 - Classified information.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 6 Domestic Security 1 2010-01-01 2010-01-01 false Classified information. 5.7 Section 5.7 Domestic Security DEPARTMENT OF HOMELAND SECURITY, OFFICE OF THE SECRETARY DISCLOSURE OF RECORDS AND INFORMATION Freedom of Information Act § 5.7 Classified information. In processing a request for information that is...

  18. 28 CFR 700.14 - Classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 2 2013-07-01 2013-07-01 false Classified information. 700.14 Section... INFORMATION OF THE OFFICE OF INDEPENDENT COUNSEL Protection of Privacy and Access to Individual Records Under the Privacy Act of 1974 § 700.14 Classified information. In processing a request for access to a...

  19. DNA Mapping Made Simple: An Intellectual Activity about the Genetic Modification of Organisms

    ERIC Educational Resources Information Center

    Marques, Miguel; Arrabaca, Joao; Chagas, Isabel

    2004-01-01

    Since the discovery of the DNA double helix (in 1953 by Watson and Crick), technologies have been developed that allow scientists to manipulate the genome of bacteria to produce human hormones, as well as the genome of crop plants to achieve high yield and enhanced flavor. The universality of the genetic code has allowed DNA isolated from a…

  20. On the decoding process in ternary error-correcting output codes.

    PubMed

    Escalera, Sergio; Pujol, Oriol; Radeva, Petia

    2010-01-01

    A common way to model multiclass classification problems is to design a set of binary classifiers and to combine them. Error-Correcting Output Codes (ECOC) represent a successful framework to deal with these type of problems. Recent works in the ECOC framework showed significant performance improvements by means of new problem-dependent designs based on the ternary ECOC framework. The ternary framework contains a larger set of binary problems because of the use of a "do not care" symbol that allows us to ignore some classes by a given classifier. However, there are no proper studies that analyze the effect of the new symbol at the decoding step. In this paper, we present a taxonomy that embeds all binary and ternary ECOC decoding strategies into four groups. We show that the zero symbol introduces two kinds of biases that require redefinition of the decoding design. A new type of decoding measure is proposed, and two novel decoding strategies are defined. We evaluate the state-of-the-art coding and decoding strategies over a set of UCI Machine Learning Repository data sets and into a real traffic sign categorization problem. The experimental results show that, following the new decoding strategies, the performance of the ECOC design is significantly improved.

  1. 49 CFR 1.65 - Authority to classify information.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... originally classify information as Top Secret.) (b) The following delegations of this authority, which may... Transportation authority to originally classify information as Secret and Confidential with further authorization... National Security Plans (Confidential only). (c) Authority to originally classify information as Secret or...

  2. Evolutionary Conservation of a Coding Function for D4Z4, the Tandem DNA Repeat Mutated in Facioscapulohumeral Muscular Dystrophy

    PubMed Central

    Clapp, Jannine ; Mitchell, Laura M. ; Bolland, Daniel J. ; Fantes, Judy ; Corcoran, Anne E. ; Scotting, Paul J. ; Armour, John A. L. ; Hewitt, Jane E. 

    2007-01-01

    Facioscapulohumeral muscular dystrophy (FSHD) is caused by deletions within the polymorphic DNA tandem array D4Z4. Each D4Z4 repeat unit has an open reading frame (ORF), termed “DUX4,” containing two homeobox sequences. Because there has been no evidence of a transcript from the array, these deletions are thought to cause FSHD by a position effect on other genes. Here, we identify D4Z4 homologues in the genomes of rodents, Afrotheria (superorder of elephants and related species), and other species and show that the DUX4 ORF is conserved. Phylogenetic analysis suggests that primate and Afrotherian D4Z4 arrays are orthologous and originated from a retrotransposed copy of an intron-containing DUX gene, DUXC. Reverse-transcriptase polymerase chain reaction and RNA fluorescence and tissue in situ hybridization data indicate transcription of the mouse array. Together with the conservation of the DUX4 ORF for >100 million years, this strongly supports a coding function for D4Z4 and necessitates re-examination of current models of the FSHD disease mechanism. PMID:17668377

  3. DNA Polymerase κ Is a Key Cellular Factor for the Formation of Covalently Closed Circular DNA of Hepatitis B Virus

    PubMed Central

    Qi, Yonghe; Gao, Zhenchao; Peng, Bo; Yan, Huan; Tang, Dingbin; Song, Zilin; He, Wenhui; Sun, Yinyan; Guo, Ju-Tao; Li, Wenhui

    2016-01-01

    Hepatitis B virus (HBV) infection of hepatocytes begins by binding to its cellular receptor sodium taurocholate cotransporting polypeptide (NTCP), followed by the internalization of viral nucleocapsid into the cytoplasm. The viral relaxed circular (rc) DNA genome in nucleocapsid is transported into the nucleus and converted into covalently closed circular (ccc) DNA to serve as a viral persistence reservoir that is refractory to current antiviral therapies. Host DNA repair enzymes have been speculated to catalyze the conversion of rcDNA to cccDNA, however, the DNA polymerase(s) that fills the gap in the plus strand of rcDNA remains to be determined. Here we conducted targeted genetic screening in combination with chemical inhibition to identify the cellular DNA polymerase(s) responsible for cccDNA formation, and exploited recombinant HBV with capsid coding deficiency which infects HepG2-NTCP cells with similar efficiency of wild-type HBV to assure cccDNA synthesis is exclusively from de novo HBV infection. We found that DNA polymerase κ (POLK), a Y-family DNA polymerase with maximum activity in non-dividing cells, substantially contributes to cccDNA formation during de novo HBV infection. Depleting gene expression of POLK in HepG2-NTCP cells by either siRNA knockdown or CRISPR/Cas9 knockout inhibited the conversion of rcDNA into cccDNA, while the diminished cccDNA formation in, and hence the viral infection of, the knockout cells could be effectively rescued by ectopic expression of POLK. These studies revealed that POLK is a crucial host factor required for cccDNA formation during a de novo HBV infection and suggest that POLK may be a potential target for developing antivirals against HBV. PMID:27783675

  4. Biomimetic Artificial Epigenetic Code for Targeted Acetylation of Histones.

    PubMed

    Taniguchi, Junichi; Feng, Yihong; Pandian, Ganesh N; Hashiya, Fumitaka; Hidaka, Takuya; Hashiya, Kaori; Park, Soyoung; Bando, Toshikazu; Ito, Shinji; Sugiyama, Hiroshi

    2018-06-13

    While the central role of locus-specific acetylation of histone proteins in eukaryotic gene expression is well established, the availability of designer tools to regulate acetylation at particular nucleosome sites remains limited. Here, we develop a unique strategy to introduce acetylation by constructing a bifunctional molecule designated Bi-PIP. Bi-PIP has a P300/CBP-selective bromodomain inhibitor (Bi) as a P300/CBP recruiter and a pyrrole-imidazole polyamide (PIP) as a sequence-selective DNA binder. Biochemical assays verified that Bi-PIPs recruit P300 to the nucleosomes having their target DNA sequences and extensively accelerate acetylation. Bi-PIPs also activated transcription of genes that have corresponding cognate DNA sequences inside living cells. Our results demonstrate that Bi-PIPs could act as a synthetic programmable histone code of acetylation, which emulates the bromodomain-mediated natural propagation system of histone acetylation to activate gene expression in a sequence-selective manner.

  5. I-Ching, dyadic groups of binary numbers and the geno-logic coding in living bodies.

    PubMed

    Hu, Zhengbing; Petoukhov, Sergey V; Petukhova, Elena S

    2017-12-01

    The ancient Chinese book I-Ching was written a few thousand years ago. It introduces the system of symbols Yin and Yang (equivalents of 0 and 1). It had a powerful impact on culture, medicine and science of ancient China and several other countries. From the modern standpoint, I-Ching declares the importance of dyadic groups of binary numbers for the Nature. The system of I-Ching is represented by the tables with dyadic groups of 4 bigrams, 8 trigrams and 64 hexagrams, which were declared as fundamental archetypes of the Nature. The ancient Chinese did not know about the genetic code of protein sequences of amino acids but this code is organized in accordance with the I-Ching: in particularly, the genetic code is constructed on DNA molecules using 4 nitrogenous bases, 16 doublets, and 64 triplets. The article also describes the usage of dyadic groups as a foundation of the bio-mathematical doctrine of the geno-logic code, which exists in parallel with the known genetic code of amino acids but serves for a different goal: to code the inherited algorithmic processes using the logical holography and the spectral logic of systems of genetic Boolean functions. Some relations of this doctrine with the I-Ching are discussed. In addition, the ratios of musical harmony that can be revealed in the parameters of DNA structure are also represented in the I-Ching book. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. DNA topoisomerase 1α promotes transcriptional silencing of transposable elements through DNA methylation and histone lysine 9 dimethylation in Arabidopsis.

    PubMed

    Dinh, Thanh Theresa; Gao, Lei; Liu, Xigang; Li, Dongming; Li, Shengben; Zhao, Yuanyuan; O'Leary, Michael; Le, Brandon; Schmitz, Robert J; Manavella, Pablo A; Manavella, Pablo; Li, Shaofang; Weigel, Detlef; Pontes, Olga; Ecker, Joseph R; Chen, Xuemei

    2014-07-01

    RNA-directed DNA methylation (RdDM) and histone H3 lysine 9 dimethylation (H3K9me2) are related transcriptional silencing mechanisms that target transposable elements (TEs) and repeats to maintain genome stability in plants. RdDM is mediated by small and long noncoding RNAs produced by the plant-specific RNA polymerases Pol IV and Pol V, respectively. Through a chemical genetics screen with a luciferase-based DNA methylation reporter, LUCL, we found that camptothecin, a compound with anti-cancer properties that targets DNA topoisomerase 1α (TOP1α) was able to de-repress LUCL by reducing its DNA methylation and H3K9me2 levels. Further studies with Arabidopsis top1α mutants showed that TOP1α silences endogenous RdDM loci by facilitating the production of Pol V-dependent long non-coding RNAs, AGONAUTE4 recruitment and H3K9me2 deposition at TEs and repeats. This study assigned a new role in epigenetic silencing to an enzyme that affects DNA topology.

  7. Nanoparticle based bio-bar code technology for trace analysis of aflatoxin B1 in Chinese herbs.

    PubMed

    Yu, Yu-Yan; Chen, Yuan-Yuan; Gao, Xuan; Liu, Yuan-Yuan; Zhang, Hong-Yan; Wang, Tong-Ying

    2018-04-01

    A novel and sensitive assay for aflatoxin B1 (AFB1) detection has been developed by using bio-bar code assay (BCA). The method that relies on polyclonal antibodies encoded with DNA modified gold nanoparticle (NP) and monoclonal antibodies modified magnetic microparticle (MMP), and subsequent detection of amplified target in the form of bio-bar code using a fluorescent quantitative polymerase chain reaction (FQ-PCR) detection method. First, NP probes encoded with DNA that was unique to AFB1, MMP probes with monoclonal antibodies that bind AFB1 specifically were prepared. Then, the MMP-AFB1-NP sandwich compounds were acquired, dehybridization of the oligonucleotides on the nanoparticle surface allows the determination of the presence of AFB1 by identifying the oligonucleotide sequence released from the NP through FQ-PCR detection. The bio-bar code techniques system for detecting AFB1 was established, and the sensitivity limit was about 10 -8  ng/mL, comparable ELISA assays for detecting the same target, it showed that we can detect AFB1 at low attomolar levels with the bio-bar-code amplification approach. This is also the first demonstration of a bio-bar code type assay for the detection of AFB1 in Chinese herbs. Copyright © 2017. Published by Elsevier B.V.

  8. Generating code adapted for interlinking legacy scalar code and extended vector code

    DOEpatents

    Gschwind, Michael K

    2013-06-04

    Mechanisms for intermixing code are provided. Source code is received for compilation using an extended Application Binary Interface (ABI) that extends a legacy ABI and uses a different register configuration than the legacy ABI. First compiled code is generated based on the source code, the first compiled code comprising code for accommodating the difference in register configurations used by the extended ABI and the legacy ABI. The first compiled code and second compiled code are intermixed to generate intermixed code, the second compiled code being compiled code that uses the legacy ABI. The intermixed code comprises at least one call instruction that is one of a call from the first compiled code to the second compiled code or a call from the second compiled code to the first compiled code. The code for accommodating the difference in register configurations is associated with the at least one call instruction.

  9. Identification of Intermediate in Hepatitis B Virus CCC DNA Formation and Sensitive and Selective CCC DNA Detection.

    PubMed

    Luo, Jun; Cui, Xiuji; Gao, Lu; Hu, Jianming

    2017-06-21

    The hepatitis B virus (HBV) covalently closed circular (CCC) DNA functions as the only viral template capable of coding for all the viral RNA species and is thus essential to initiate and sustain viral replication. CCC DNA is converted, in a multi-step and ill-understood process, from a relaxed circular (RC) DNA, in which neither of the two DNA strands is covalently closed. To detect putative intermediates during RC to CCC DNA conversion, two 3' exonucleases Exo I and Exo III, in combination were used to degrade all DNA strands with a free 3' end, which would nevertheless preserve closed circular DNA, either single-stranded (SS) or double-stranded (DS). Indeed, a RC DNA species with a covalently closed minus strand but an open plus strand (closed minus-strand RC DNA or cM-RC DNA) was detected by this approach. Further analyses indicated that at least some of the plus strands in such a putative intermediate likely still retained the RNA primer that is attached to the 5' end of the plus strand in RC DNA, suggesting that minus strand closing can occur before plus strand processing. Furthermore, the same nuclease treatment proved to be useful for sensitive and specific detection of CCC DNA by removing all DNA species other than closed circular DNA. Application of these and similar approaches may allow the identification of additional intermediates during CCC DNA formation and facilitate specific and sensitive detection of CCC DNA, which should help elucidate the pathways of CCC DNA formation and factors involved. IMPORTANCE The hepatitis B virus (HBV) covalently closed circular (CCC) DNA is the molecular basis of viral persistence, by serving as the viral transcriptional template. CCC DNA is converted, in a multi-step and ill-understood process, from a relaxed circular (RC) DNA. Little is currently understood about the pathways or factors involved in CCC DNA formation. We have now detected a likely intermediate during the conversion of RC to CCC DNA, thus providing

  10. Mitochondrial DNA of Vitis vinifera and the issue of rampant horizontal gene transfer.

    PubMed

    Goremykin, Vadim V; Salamini, Francesco; Velasco, Riccardo; Viola, Roberto

    2009-01-01

    The mitochondrial genome of grape (Vitis vinifera), the largest organelle genome sequenced so far, is presented. The genome is 773,279 nt long and has the highest coding capacity among known angiosperm mitochondrial DNAs (mtDNAs). The proportion of promiscuous DNA of plastid origin in the genome is also the largest ever reported for an angiosperm mtDNA, both in absolute and relative terms. In all, 42.4% of chloroplast genome of Vitis has been incorporated into its mitochondrial genome. In order to test if horizontal gene transfer (HGT) has also contributed to the gene content of the grape mtDNA, we built phylogenetic trees with the coding sequences of mitochondrial genes of grape and their homologs from plant mitochondrial genomes. Many incongruent gene tree topologies were obtained. However, the extent of incongruence between these gene trees is not significantly greater than that observed among optimal trees for chloroplast genes, the common ancestry of which has never been in doubt. In both cases, we attribute this incongruence to artifacts of tree reconstruction, insufficient numbers of characters, and gene paralogy. This finding leads us to question the recent phylogenetic interpretation of Bergthorsson et al. (2003, 2004) and Richardson and Palmer (2007) that rampant HGT into the mtDNA of Amborella best explains phylogenetic incongruence between mitochondrial gene trees for angiosperms. The only evidence for HGT into the Vitis mtDNA found involves fragments of two coding sequences stemming from two closteroviruses that cause the leaf roll disease of this plant. We also report that analysis of sequences shared by both chloroplast and mitochondrial genomes provides evidence for a previously unknown gene transfer route from the mitochondrion to the chloroplast.

  11. Classifying injury narratives of large administrative databases for surveillance-A practical approach combining machine learning ensembles and human review.

    PubMed

    Marucci-Wellman, Helen R; Corns, Helen L; Lehto, Mark R

    2017-01-01

    Injury narratives are now available real time and include useful information for injury surveillance and prevention. However, manual classification of the cause or events leading to injury found in large batches of narratives, such as workers compensation claims databases, can be prohibitive. In this study we compare the utility of four machine learning algorithms (Naïve Bayes, Single word and Bi-gram models, Support Vector Machine and Logistic Regression) for classifying narratives into Bureau of Labor Statistics Occupational Injury and Illness event leading to injury classifications for a large workers compensation database. These algorithms are known to do well classifying narrative text and are fairly easy to implement with off-the-shelf software packages such as Python. We propose human-machine learning ensemble approaches which maximize the power and accuracy of the algorithms for machine-assigned codes and allow for strategic filtering of rare, emerging or ambiguous narratives for manual review. We compare human-machine approaches based on filtering on the prediction strength of the classifier vs. agreement between algorithms. Regularized Logistic Regression (LR) was the best performing algorithm alone. Using this algorithm and filtering out the bottom 30% of predictions for manual review resulted in high accuracy (overall sensitivity/positive predictive value of 0.89) of the final machine-human coded dataset. The best pairings of algorithms included Naïve Bayes with Support Vector Machine whereby the triple ensemble NB SW =NB BI-GRAM =SVM had very high performance (0.93 overall sensitivity/positive predictive value and high accuracy (i.e. high sensitivity and positive predictive values)) across both large and small categories leaving 41% of the narratives for manual review. Integrating LR into this ensemble mix improved performance only slightly. For large administrative datasets we propose incorporation of methods based on human-machine pairings such as

  12. Cryptic splice site in the complementary DNA of glucocerebrosidase causes inefficient expression.

    PubMed

    Bukovac, Scott W; Bagshaw, Richard D; Rigat, Brigitte A; Callahan, John W; Clarke, Joe T R; Mahuran, Don J

    2008-10-15

    The low levels of human lysosomal glucocerebrosidase activity expressed in transiently transfected Chinese hamster ovary (CHO) cells were investigated. Reverse transcription PCR (RT-PCR) demonstrated that a significant portion of the transcribed RNA was misspliced owing to the presence of a cryptic splice site in the complementary DNA (cDNA). Missplicing results in the deletion of 179 bp of coding sequence and a premature stop codon. A repaired cDNA was constructed abolishing the splice site without changing the amino acid sequence. The level of glucocerebrosidase expression was increased sixfold. These data demonstrate that for maximum expression of any cDNA construct, the transcription products should be examined.

  13. TLR9 agonists oppositely modulate DNA repair genes in tumor versus immune cells and enhance chemotherapy effects.

    PubMed

    Sommariva, Michele; De Cecco, Loris; De Cesare, Michelandrea; Sfondrini, Lucia; Ménard, Sylvie; Melani, Cecilia; Delia, Domenico; Zaffaroni, Nadia; Pratesi, Graziella; Uva, Valentina; Tagliabue, Elda; Balsari, Andrea

    2011-10-15

    Synthetic oligodeoxynucleotides expressing CpG motifs (CpG-ODN) are a Toll-like receptor 9 (TLR9) agonist that can enhance the antitumor activity of DNA-damaging chemotherapy and radiation therapy in preclinical mouse models. We hypothesized that the success of these combinations is related to the ability of CpG-ODN to modulate genes involved in DNA repair. We conducted an in silico analysis of genes implicated in DNA repair in data sets obtained from murine colon carcinoma cells in mice injected intratumorally with CpG-ODN and from splenocytes in mice treated intraperitoneally with CpG-ODN. CpG-ODN treatment caused downregulation of DNA repair genes in tumors. Microarray analyses of human IGROV-1 ovarian carcinoma xenografts in mice treated intraperitoneally with CpG-ODN confirmed in silico findings. When combined with the DNA-damaging drug cisplatin, CpG-ODN significantly increased the life span of mice compared with individual treatments. In contrast, CpG-ODN led to an upregulation of genes involved in DNA repair in immune cells. Cisplatin-treated patients with ovarian carcinoma as well as anthracycline-treated patients with breast cancer who are classified as "CpG-like" for the level of expression of CpG-ODN modulated DNA repair genes have a better outcome than patients classified as "CpG-untreated-like," indicating the relevance of these genes in the tumor cell response to DNA-damaging drugs. Taken together, the findings provide evidence that the tumor microenvironment can sensitize cancer cells to DNA-damaging chemotherapy, thereby expanding the benefits of CpG-ODN therapy beyond induction of a strong immune response.

  14. Highly sensitive detection of mutations in CHO cell recombinant DNA using multi-parallel single molecule real-time DNA sequencing.

    PubMed

    Cartwright, Joseph F; Anderson, Karin; Longworth, Joseph; Lobb, Philip; James, David C

    2018-06-01

    High-fidelity replication of biologic-encoding recombinant DNA sequences by engineered mammalian cell cultures is an essential pre-requisite for the development of stable cell lines for the production of biotherapeutics. However, immortalized mammalian cells characteristically exhibit an increased point mutation frequency compared to mammalian cells in vivo, both across their genomes and at specific loci (hotspots). Thus unforeseen mutations in recombinant DNA sequences can arise and be maintained within producer cell populations. These may affect both the stability of recombinant gene expression and give rise to protein sequence variants with variable bioactivity and immunogenicity. Rigorous quantitative assessment of recombinant DNA integrity should therefore form part of the cell line development process and be an essential quality assurance metric for instances where synthetic/multi-component assemblies are utilized to engineer mammalian cells, such as the assessment of recombinant DNA fidelity or the mutability of single-site integration target loci. Based on Pacific Biosciences (Menlo Park, CA) single molecule real-time (SMRT™) circular consensus sequencing (CCS) technology we developed a rDNA sequence analysis tool to process the multi-parallel sequencing of ∼40,000 single recombinant DNA molecules. After statistical filtering of raw sequencing data, we show that this analytical method is capable of detecting single point mutations in rDNA to a minimum single mutation frequency of 0.0042% (<1/24,000 bases). Using a stable CHO transfectant pool harboring a randomly integrated 5 kB plasmid construct encoding GFP we found that 28% of recombinant plasmid copies contained at least one low frequency (<0.3%) point mutation. These mutations were predominantly found in GC base pairs (85%) and that there was no positional bias in mutation across the plasmid sequence. There was no discernable difference between the mutation frequencies of coding and non-coding

  15. Improved method for predicting protein fold patterns with ensemble classifiers.

    PubMed

    Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C

    2012-01-27

    Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.

  16. Phylogenetic study on Shiraia bambusicola by rDNA sequence analyses.

    PubMed

    Cheng, Tian-Fan; Jia, Xiao-Ming; Ma, Xiao-Hang; Lin, Hai-Ping; Zhao, Yu-Hua

    2004-01-01

    In this study, 18S rDNA and ITS-5.8S rDNA regions of four Shiraia bambusicola isolates collected from different species of bamboos were amplified by PCR with universal primer pairs NS1/NS8 and ITS5/ITS4, respectively, and sequenced. Phylogenetic analyses were conducted on three selected datasets of rDNA sequences. Maximum parsimony, distance and maximum likelihood criteria were used to infer trees. Morphological characteristics were also observed. The positioning of Shiraia in the order Pleosporales was well supported by bootstrap, which agreed with the placement by Amano (1980) according to their morphology. We did not find significant inter-hostal differences among these four isolates from different species of bamboos. From the results of analyses and comparison of their rDNA sequences, we conclude that Shiraia should be classified into Pleosporales as Amano (1980) proposed and suggest that it might be positioned in the family Phaeosphaeriaceae. Copyright 2004 WILEY-VCH Verlag GmbH & Co.

  17. New Modeling Approaches to Study DNA Damage by the Direct and Indirect Effects of Ionizing Radiation

    NASA Technical Reports Server (NTRS)

    Plante, Ianik; Cucinotta, Francis A.

    2012-01-01

    DNA is damaged both by the direct and indirect effects of radiation. In the direct effect, the DNA itself is ionized, whereas the indirect effect involves the radiolysis of the water molecules surrounding the DNA and the subsequent reaction of the DNA with radical products. While this problem has been studied for many years, many unknowns still exist. To study this problem, we have developed the computer code RITRACKS [1], which simulates the radiation track structure for heavy ions and electrons, calculating all energy deposition events and the coordinates of all species produced by the water radiolysis. In this work, we plan to simulate DNA damage by using the crystal structure of a nucleosome and calculations performed by RITRACKS. The energy deposition events are used to calculate the dose deposited in nanovolumes [2] and therefore can be used to simulate the direct effect of the radiation. Using the positions of the radiolytic species with a radiation chemistry code [3] it will be possible to simulate DNA damage by indirect effect. The simulation results can be compared with results from previous calculations such as the frequencies of simple and complex strand breaks [4] and with newer experimental data using surrogate markers of DNA double ]strand breaks such as . ]H2AX foci [5].

  18. The Challenges of Identifying and Classifying Child Sexual Abuse Material.

    PubMed

    Kloess, Juliane A; Woodhams, Jessica; Whittle, Helen; Grant, Tim; Hamilton-Giachritsis, Catherine E

    2018-02-01

    The aim of the present study was to (a) assess the reliability with which indecent images of children (IIOC) are classified as being of an indecent versus nonindecent nature, and (b) examine in detail the decision-making process engaged in by law enforcement personnel who undertake the difficult task of identifying and classifying IIOC as per the current legislative offense categories. One experienced researcher and four employees from a police force in the United Kingdom coded an extensive amount of IIOC ( n = 1,212-2,233) to determine if they (a) were deemed to be of an indecent nature, and (b) depicted a child. Interrater reliability analyses revealed both considerable agreement and disagreement across coders, which were followed up with two focus groups involving the four employees. The first entailed a general discussion of the aspects that made such material more or less difficult to identify; the second focused around images where there had been either agreement ( n = 20) or disagreement ( n = 36) across coders that the images were of an indecent nature. Using thematic analysis, a number of factors apparent within IIOC were revealed to make the determination of youthfulness and indecency significantly more challenging for coders, with most relating to the developmental stage of the victim and the ambiguity of the context of an image. Findings are discussed in light of their implications for the identification of victims of ongoing sexual exploitation/abuse, the assessment and treatment of individuals in possession of IIOC, as well as the practice of policing and sentencing this type of offending behavior.

  19. iDBPs: a web server for the identification of DNA binding proteins

    PubMed Central

    Nimrod, Guy; Schushan, Maya; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir

    2010-01-01

    Summary: The iDBPs server uses the three-dimensional (3D) structure of a query protein to predict whether it binds DNA. First, the algorithm predicts the functional region of the protein based on its evolutionary profile; the assumption is that large clusters of conserved residues are good markers of functional regions. Next, various characteristics of the predicted functional region as well as global features of the protein are calculated, such as the average surface electrostatic potential, the dipole moment and cluster-based amino acid conservation patterns. Finally, a random forests classifier is used to predict whether the query protein is likely to bind DNA and to estimate the prediction confidence. We have trained and tested the classifier on various datasets and shown that it outperformed related methods. On a dataset that reflects the fraction of DNA binding proteins (DBPs) in a proteome, the area under the ROC curve was 0.90. The application of the server to an updated version of the N-Func database, which contains proteins of unknown function with solved 3D-structure, suggested new putative DBPs for experimental studies. Availability: http://idbps.tau.ac.il/ Contact: NirB@tauex.tau.ac.il Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20089514

  20. Prognostic significance of DNA ploidy in adenocarcinoma of the pancreas. A flow cytometric study of paraffin-embedded specimens.

    PubMed

    Porschen, R; Remy, U; Bevers, G; Schauseil, S; Hengels, K J; Borchard, F

    1993-06-15

    The prognostic significance of tumor DNA ploidy in patients with cancer of the pancreas has not been defined because conflicting results have been reported. DNA content was measured in 56 ductal adenocarcinomas of the pancreas. DNA ploidy status was evaluated by flow cytometry in nuclei isolated from paraffin-embedded tumor tissues. An abnormal DNA stemline was observed in 27 (48%) patients. The percentage of aneuploid tumors was significantly increased in tumors classified as Stage III/IV (53%) compared with those classified as Stage I (22%). A borderline significant association existed between DNA ploidy and radicality of surgery (P = 0.08). The median survival of patients with diploid carcinomas was 6.9 months (standard error, +/- 0.9) in comparison to 4.5 +/- 1.2 months for patients with aneuploid tumors (P = 0.013 by generalized Wilcoxon test; P = 0.023 by generalized Savage test). Although a selection bias cannot be excluded, survival of patients with a radical resection was longer than that of patients with a nonradical resection (P = 0.0008 and P = 0.0085, respectively). In addition, presence of distant metastasis (P = 0.0006 [Wilcoxon test] and P = 0.033 [Savage test]) could be identified as a prognostic factor. In a Cox regression model, results of surgery and DNA ploidy were independent prognostic variables. Because DNA ploidy has a significant impact on prognosis in pancreatic cancer, it should be used as a variable for stratified randomization of patients in therapeutic trials.

  1. Research on Image Encryption Based on DNA Sequence and Chaos Theory

    NASA Astrophysics Data System (ADS)

    Tian Zhang, Tian; Yan, Shan Jun; Gu, Cheng Yan; Ren, Ran; Liao, Kai Xin

    2018-04-01

    Nowadays encryption is a common technique to protect image data from unauthorized access. In recent years, many scientists have proposed various encryption algorithms based on DNA sequence to provide a new idea for the design of image encryption algorithm. Therefore, a new method of image encryption based on DNA computing technology is proposed in this paper, whose original image is encrypted by DNA coding and 1-D logistic chaotic mapping. First, the algorithm uses two modules as the encryption key. The first module uses the real DNA sequence, and the second module is made by one-dimensional logistic chaos mapping. Secondly, the algorithm uses DNA complementary rules to encode original image, and uses the key and DNA computing technology to compute each pixel value of the original image, so as to realize the encryption of the whole image. Simulation results show that the algorithm has good encryption effect and security.

  2. Three 3D graphical representations of DNA primary sequences based on the classifications of DNA bases and their applications.

    PubMed

    Xie, Guosen; Mo, Zhongxi

    2011-01-21

    In this article, we introduce three 3D graphical representations of DNA primary sequences, which we call RY-curve, MK-curve and SW-curve, based on three classifications of the DNA bases. The advantages of our representations are that (i) these 3D curves are strictly non-degenerate and there is no loss of information when transferring a DNA sequence to its mathematical representation and (ii) the coordinates of every node on these 3D curves have clear biological implication. Two applications of these 3D curves are presented: (a) a simple formula is derived to calculate the content of the four bases (A, G, C and T) from the coordinates of nodes on the curves; and (b) a 12-component characteristic vector is constructed to compare similarity among DNA sequences from different species based on the geometrical centers of the 3D curves. As examples, we examine similarity among the coding sequences of the first exon of beta-globin gene from eleven species and validate similarity of cDNA sequences of beta-globin gene from eight species. Copyright © 2010 Elsevier Ltd. All rights reserved.

  3. Unitary circular code motifs in genomes of eukaryotes.

    PubMed

    El Soufi, Karim; Michel, Christian J

    A set X of 20 trinucleotides was identified in genes of bacteria, eukaryotes, plasmids and viruses, which has in average the highest occurrence in reading frame compared to its two shifted frames (Michel, 2015; Arquès and Michel, 1996). This set X has an interesting mathematical property as X is a circular code (Arquès and Michel, 1996). Thus, the motifs from this circular code X, called X motifs, have the property to always retrieve, synchronize and maintain the reading frame in genes. The origin of this circular code X in genes is an open problem since its discovery in 1996. Here, we first show that the unitary circular codes (UCC), i.e. sets of one word, allow to generate unitary circular code motifs (UCC motifs), i.e. a concatenation of the same motif (simple repeats) leading to low complexity DNA. Three classes of UCC motifs are studied here: repeated dinucleotides (D + motifs), repeated trinucleotides (T + motifs) and repeated tetranucleotides (T + motifs). Thus, the D + , T + and T + motifs allow to retrieve, synchronize and maintain a frame modulo 2, modulo 3 and modulo 4, respectively, and their shifted frames (1 modulo 2; 1 and 2 modulo 3; 1, 2 and 3 modulo 4 according to the C 2 , C 3 and C 4 properties, respectively) in the DNA sequences. The statistical distribution of the D + , T + and T + motifs is analyzed in the genomes of eukaryotes. A UCC motif and its comp lementary UCC motif have the same distribution in the eukaryotic genomes. Furthermore, a UCC motif and its complementary UCC motif have increasing occurrences contrary to their number of hydrogen bonds, very significant with the T + motifs. The longest D + , T + and T + motifs in the studied eukaryotic genomes are also given. Surprisingly, a scarcity of repeated trinucleotides (T + motifs) in the large eukaryotic genomes is observed compared to the D + and T + motifs. This result has been investigated and may be explained by two outcomes. Repeated trinucleotides (T + motifs) are identified

  4. Analysis of developmental gene conservation in the Actinomycetales using DNA/DNA microarray comparisons.

    PubMed

    Kirby, Ralph; Herron, Paul; Hoskisson, Paul

    2011-02-01

    Based on available genome sequences, Actinomycetales show significant gene synteny across a wide range of species and genera. In addition, many genera show varying degrees of complex morphological development. Using the presence of gene synteny as a basis, it is clear that an analysis of gene conservation across the Streptomyces and various other Actinomycetales will provide information on both the importance of genes and gene clusters and the evolution of morphogenesis in these bacteria. Genome sequencing, although becoming cheaper, is still relatively expensive for comparing large numbers of strains. Thus, a heterologous DNA/DNA microarray hybridization dataset based on a Streptomyces coelicolor microarray allows a cheaper and greater depth of analysis of gene conservation. This study, using both bioinformatical and microarray approaches, was able to classify genes previously identified as involved in morphogenesis in Streptomyces into various subgroups in terms of conservation across species and genera. This will allow the targeting of genes for further study based on their importance at the species level and at higher evolutionary levels.

  5. [Structural organization of 5S ribosomal DNA of Rosa rugosa].

    PubMed

    Tynkevych, Iu O; Volkov, R A

    2014-01-01

    In order to clarify molecular organization of the genomic region encoding 5S rRNA in diploid species Rosa rugosa several 5S rDNA repeated units were cloned and sequenced. Analysis of the obtained sequences revealed that only one length variant of 5S rDNA repeated units, which contains intact promoter elements in the intergenic spacer region (IGS) and appears to be transcriptionally active is present in the genome. Additionally, a limited number of 5S rDNA pseudogenes lacking a portion of coding sequence and the complete IGS was detected. A high level of sequence similarity (from 93.7 to 97.5%) between the IGS of major 5S rDNA variants of East Asian R. rugosa and North American R. nitida was found indicating comparatively recent divergence of these species.

  6. Fusion and Gaussian mixture based classifiers for SONAR data

    NASA Astrophysics Data System (ADS)

    Kotari, Vikas; Chang, KC

    2011-06-01

    Underwater mines are inexpensive and highly effective weapons. They are difficult to detect and classify. Hence detection and classification of underwater mines is essential for the safety of naval vessels. This necessitates a formulation of highly efficient classifiers and detection techniques. Current techniques primarily focus on signals from one source. Data fusion is known to increase the accuracy of detection and classification. In this paper, we formulated a fusion-based classifier and a Gaussian mixture model (GMM) based classifier for classification of underwater mines. The emphasis has been on sound navigation and ranging (SONAR) signals due to their extensive use in current naval operations. The classifiers have been tested on real SONAR data obtained from University of California Irvine (UCI) repository. The performance of both GMM based classifier and fusion based classifier clearly demonstrate their superior classification accuracy over conventional single source cases and validate our approach.

  7. Fluctuations in the DNA double helix

    NASA Astrophysics Data System (ADS)

    Peyrard, M.; López, S. C.; Angelov, D.

    2007-08-01

    DNA is not the static entity suggested by the famous double helix structure. It shows large fluctuational openings, in which the bases, which contain the genetic code, are temporarily open. Therefore it is an interesting system to study the effect of nonlinearity on the physical properties of a system. A simple model for DNA, at a mesoscopic scale, can be investigated by computer simulation, in the same spirit as the original work of Fermi, Pasta and Ulam. These calculations raise fundamental questions in statistical physics because they show a temporary breaking of equipartition of energy, regions with large amplitude fluctuations being able to coexist with regions where the fluctuations are very small, even when the model is studied in the canonical ensemble. This phenomenon can be related to nonlinear excitations in the model. The ability of the model to describe the actual properties of DNA is discussed by comparing theoretical and experimental results for the probability that base pairs open an a given temperature in specific DNA sequences. These studies give us indications on the proper description of the effect of the sequence in the mesoscopic model.

  8. Background-Modeling-Based Adaptive Prediction for Surveillance Video Coding.

    PubMed

    Zhang, Xianguo; Huang, Tiejun; Tian, Yonghong; Gao, Wen

    2014-02-01

    The exponential growth of surveillance videos presents an unprecedented challenge for high-efficiency surveillance video coding technology. Compared with the existing coding standards that were basically developed for generic videos, surveillance video coding should be designed to make the best use of the special characteristics of surveillance videos (e.g., relative static background). To do so, this paper first conducts two analyses on how to improve the background and foreground prediction efficiencies in surveillance video coding. Following the analysis results, we propose a background-modeling-based adaptive prediction (BMAP) method. In this method, all blocks to be encoded are firstly classified into three categories. Then, according to the category of each block, two novel inter predictions are selectively utilized, namely, the background reference prediction (BRP) that uses the background modeled from the original input frames as the long-term reference and the background difference prediction (BDP) that predicts the current data in the background difference domain. For background blocks, the BRP can effectively improve the prediction efficiency using the higher quality background as the reference; whereas for foreground-background-hybrid blocks, the BDP can provide a better reference after subtracting its background pixels. Experimental results show that the BMAP can achieve at least twice the compression ratio on surveillance videos as AVC (MPEG-4 Advanced Video Coding) high profile, yet with a slightly additional encoding complexity. Moreover, for the foreground coding performance, which is crucial to the subjective quality of moving objects in surveillance videos, BMAP also obtains remarkable gains over several state-of-the-art methods.

  9. An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records

    PubMed Central

    Kavuluru, Ramakanth; Rios, Anthony; Lu, Yuan

    2015-01-01

    Background Diagnosis codes are assigned to medical records in healthcare facilities by trained coders by reviewing all physician authored documents associated with a patient's visit. This is a necessary and complex task involving coders adhering to coding guidelines and coding all assignable codes. With the popularity of electronic medical records (EMRs), computational approaches to code assignment have been proposed in the recent years. However, most efforts have focused on single and often short clinical narratives, while realistic scenarios warrant full EMR level analysis for code assignment. Objective We evaluate supervised learning approaches to automatically assign international classification of diseases (ninth revision) - clinical modification (ICD-9-CM) codes to EMRs by experimenting with a large realistic EMR dataset. The overall goal is to identify methods that offer superior performance in this task when considering such datasets. Methods We use a dataset of 71,463 EMRs corresponding to in-patient visits with discharge date falling in a two year period (2011–2012) from the University of Kentucky (UKY) Medical Center. We curate a smaller subset of this dataset and also use a third gold standard dataset of radiology reports. We conduct experiments using different problem transformation approaches with feature and data selection components and employing suitable label calibration and ranking methods with novel features involving code co-occurrence frequencies and latent code associations. Results Over all codes with at least 50 training examples we obtain a micro F-score of 0.48. On the set of codes that occur at least in 1% of the two year dataset, we achieve a micro F-score of 0.54. For the smaller radiology report dataset, the classifier chaining approach yields best results. For the smaller subset of the UKY dataset, feature selection, data selection, and label calibration offer best performance. Conclusions We show that datasets at different scale

  10. An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records.

    PubMed

    Kavuluru, Ramakanth; Rios, Anthony; Lu, Yuan

    2015-10-01

    Diagnosis codes are assigned to medical records in healthcare facilities by trained coders by reviewing all physician authored documents associated with a patient's visit. This is a necessary and complex task involving coders adhering to coding guidelines and coding all assignable codes. With the popularity of electronic medical records (EMRs), computational approaches to code assignment have been proposed in the recent years. However, most efforts have focused on single and often short clinical narratives, while realistic scenarios warrant full EMR level analysis for code assignment. We evaluate supervised learning approaches to automatically assign international classification of diseases (ninth revision) - clinical modification (ICD-9-CM) codes to EMRs by experimenting with a large realistic EMR dataset. The overall goal is to identify methods that offer superior performance in this task when considering such datasets. We use a dataset of 71,463 EMRs corresponding to in-patient visits with discharge date falling in a two year period (2011-2012) from the University of Kentucky (UKY) Medical Center. We curate a smaller subset of this dataset and also use a third gold standard dataset of radiology reports. We conduct experiments using different problem transformation approaches with feature and data selection components and employing suitable label calibration and ranking methods with novel features involving code co-occurrence frequencies and latent code associations. Over all codes with at least 50 training examples we obtain a micro F-score of 0.48. On the set of codes that occur at least in 1% of the two year dataset, we achieve a micro F-score of 0.54. For the smaller radiology report dataset, the classifier chaining approach yields best results. For the smaller subset of the UKY dataset, feature selection, data selection, and label calibration offer best performance. We show that datasets at different scale (size of the EMRs, number of distinct codes) and

  11. Probe classification of on-off type DNA microarray images with a nonlinear matching measure

    NASA Astrophysics Data System (ADS)

    Ryu, Munho; Kim, Jong Dae; Min, Byoung Goo; Kim, Jongwon; Kim, Y. Y.

    2006-01-01

    We propose a nonlinear matching measure, called counting measure, as a signal detection measure that is defined as the number of on pixels in the spot area. It is applied to classify probes for an on-off type DNA microarray, where each probe spot is classified as hybridized or not. The counting measure also incorporates the maximum response search method, where the expected signal is obtained by taking the maximum among the measured responses of the various positions and sizes of the spot template. The counting measure was compared to existing signal detection measures such as the normalized covariance and the median for 2390 patient samples tested on the human papillomavirus (HPV) DNA chip. The counting measure performed the best regardless of whether or not the maximum response search method was used. The experimental results showed that the counting measure combined with the positional search was the most preferable.

  12. Ultra-low background DNA cloning system.

    PubMed

    Goto, Kenta; Nagano, Yukio

    2013-01-01

    Yeast-based in vivo cloning is useful for cloning DNA fragments into plasmid vectors and is based on the ability of yeast to recombine the DNA fragments by homologous recombination. Although this method is efficient, it produces some by-products. We have developed an "ultra-low background DNA cloning system" on the basis of yeast-based in vivo cloning, by almost completely eliminating the generation of by-products and applying the method to commonly used Escherichia coli vectors, particularly those lacking yeast replication origins and carrying an ampicillin resistance gene (Amp(r)). First, we constructed a conversion cassette containing the DNA sequences in the following order: an Amp(r) 5' UTR (untranslated region) and coding region, an autonomous replication sequence and a centromere sequence from yeast, a TRP1 yeast selectable marker, and an Amp(r) 3' UTR. This cassette allowed conversion of the Amp(r)-containing vector into the yeast/E. coli shuttle vector through use of the Amp(r) sequence by homologous recombination. Furthermore, simultaneous transformation of the desired DNA fragment into yeast allowed cloning of this DNA fragment into the same vector. We rescued the plasmid vectors from all yeast transformants, and by-products containing the E. coli replication origin disappeared. Next, the rescued vectors were transformed into E. coli and the by-products containing the yeast replication origin disappeared. Thus, our method used yeast- and E. coli-specific "origins of replication" to eliminate the generation of by-products. Finally, we successfully cloned the DNA fragment into the vector with almost 100% efficiency.

  13. A spectral-structural bag-of-features scene classifier for very high spatial resolution remote sensing imagery

    NASA Astrophysics Data System (ADS)

    Zhao, Bei; Zhong, Yanfei; Zhang, Liangpei

    2016-06-01

    results show that the whole image as the scope of the pooling operator performs better than the scope generated by SP. In addition, SSBFC codes and pools the spectral and structural features separately to avoid mutual interruption between the spectral and structural features. The coding vectors of spectral and structural features are then concatenated into a final coding vector. Finally, SSBFC classifies the final coding vector by support vector machine (SVM) with a histogram intersection kernel (HIK). Compared with the latest scene classification methods, the experimental results with three VHSR datasets demonstrate that the proposed SSBFC performs better than the other classification methods for VHSR image scenes.

  14. Novel layered clustering-based approach for generating ensemble of classifiers.

    PubMed

    Rahman, Ashfaqur; Verma, Brijesh

    2011-05-01

    This paper introduces a novel concept for creating an ensemble of classifiers. The concept is based on generating an ensemble of classifiers through clustering of data at multiple layers. The ensemble classifier model generates a set of alternative clustering of a dataset at different layers by randomly initializing the clustering parameters and trains a set of base classifiers on the patterns at different clusters in different layers. A test pattern is classified by first finding the appropriate cluster at each layer and then using the corresponding base classifier. The decisions obtained at different layers are fused into a final verdict using majority voting. As the base classifiers are trained on overlapping patterns at different layers, the proposed approach achieves diversity among the individual classifiers. Identification of difficult-to-classify patterns through clustering as well as achievement of diversity through layering leads to better classification results as evidenced from the experimental results.

  15. Germ-line and somatic EPHA2 coding variants in lens aging and cataract.

    PubMed

    Bennett, Thomas M; M'Hamdi, Oussama; Hejtmancik, J Fielding; Shiels, Alan

    2017-01-01

    Rare germ-line mutations in the coding regions of the human EPHA2 gene (EPHA2) have been associated with inherited forms of pediatric cataract, whereas, frequent, non-coding, single nucleotide variants (SNVs) have been associated with age-related cataract. Here we sought to determine if germ-line EPHA2 coding SNVs were associated with age-related cataract in a case-control DNA panel (> 50 years) and if somatic EPHA2 coding SNVs were associated with lens aging and/or cataract in a post-mortem lens DNA panel (> 48 years). Micro-fluidic PCR amplification followed by targeted amplicon (exon) next-generation (deep) sequencing of EPHA2 (17-exons) afforded high read-depth coverage (1000x) for > 82% of reads in the cataract case-control panel (161 cases, 64 controls) and > 70% of reads in the post-mortem lens panel (35 clear lens pairs, 22 cataract lens pairs). Novel and reference (known) missense SNVs in EPHA2 that were predicted in silico to be functionally damaging were found in both cases and controls from the age-related cataract panel at variant allele frequencies (VAFs) consistent with germ-line transmission (VAF > 20%). Similarly, both novel and reference missense SNVs in EPHA2 were found in the post-mortem lens panel at VAFs consistent with a somatic origin (VAF > 3%). The majority of SNVs found in the cataract case-control panel and post-mortem lens panel were transitions and many occurred at di-pyrimidine sites that are susceptible to ultraviolet (UV) radiation induced mutation. These data suggest that novel germ-line (blood) and somatic (lens) coding SNVs in EPHA2 that are predicted to be functionally deleterious occur in adults over 50 years of age. However, both types of EPHA2 coding variants were present at comparable levels in individuals with or without age-related cataract making simple genotype-phenotype correlations inconclusive.

  16. Germ-line and somatic EPHA2 coding variants in lens aging and cataract

    PubMed Central

    Bennett, Thomas M.; M’Hamdi, Oussama; Hejtmancik, J. Fielding

    2017-01-01

    Rare germ-line mutations in the coding regions of the human EPHA2 gene (EPHA2) have been associated with inherited forms of pediatric cataract, whereas, frequent, non-coding, single nucleotide variants (SNVs) have been associated with age-related cataract. Here we sought to determine if germ-line EPHA2 coding SNVs were associated with age-related cataract in a case-control DNA panel (> 50 years) and if somatic EPHA2 coding SNVs were associated with lens aging and/or cataract in a post-mortem lens DNA panel (> 48 years). Micro-fluidic PCR amplification followed by targeted amplicon (exon) next-generation (deep) sequencing of EPHA2 (17-exons) afforded high read-depth coverage (1000x) for > 82% of reads in the cataract case-control panel (161 cases, 64 controls) and > 70% of reads in the post-mortem lens panel (35 clear lens pairs, 22 cataract lens pairs). Novel and reference (known) missense SNVs in EPHA2 that were predicted in silico to be functionally damaging were found in both cases and controls from the age-related cataract panel at variant allele frequencies (VAFs) consistent with germ-line transmission (VAF > 20%). Similarly, both novel and reference missense SNVs in EPHA2 were found in the post-mortem lens panel at VAFs consistent with a somatic origin (VAF > 3%). The majority of SNVs found in the cataract case-control panel and post-mortem lens panel were transitions and many occurred at di-pyrimidine sites that are susceptible to ultraviolet (UV) radiation induced mutation. These data suggest that novel germ-line (blood) and somatic (lens) coding SNVs in EPHA2 that are predicted to be functionally deleterious occur in adults over 50 years of age. However, both types of EPHA2 coding variants were present at comparable levels in individuals with or without age-related cataract making simple genotype-phenotype correlations inconclusive. PMID:29267365

  17. 5 CFR 1312.5 - Authority to classify.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ..., DOWNGRADING, DECLASSIFICATION AND SAFEGUARDING OF NATIONAL SECURITY INFORMATION Classification and Declassification of National Security Information § 1312.5 Authority to classify. (a) The authority to originally classify information or material under this part shall be limited to those officials concerned with matters...

  18. Artificial neural network classifier predicts neuroblastoma patients' outcome.

    PubMed

    Cangelosi, Davide; Pelassa, Simone; Morini, Martina; Conte, Massimo; Bosco, Maria Carla; Eva, Alessandra; Sementa, Angela Rita; Varesio, Luigi

    2016-11-08

    More than fifty percent of neuroblastoma (NB) patients with adverse prognosis do not benefit from treatment making the identification of new potential targets mandatory. Hypoxia is a condition of low oxygen tension, occurring in poorly vascularized tissues, which activates specific genes and contributes to the acquisition of the tumor aggressive phenotype. We defined a gene expression signature (NB-hypo), which measures the hypoxic status of the neuroblastoma tumor. We aimed at developing a classifier predicting neuroblastoma patients' outcome based on the assessment of the adverse effects of tumor hypoxia on the progression of the disease. Multi-layer perceptron (MLP) was trained on the expression values of the 62 probe sets constituting NB-hypo signature to develop a predictive model for neuroblastoma patients' outcome. We utilized the expression data of 100 tumors in a leave-one-out analysis to select and construct the classifier and the expression data of the remaining 82 tumors to test the classifier performance in an external dataset. We utilized the Gene set enrichment analysis (GSEA) to evaluate the enrichment of hypoxia related gene sets in patients predicted with "Poor" or "Good" outcome. We utilized the expression of the 62 probe sets of the NB-Hypo signature in 182 neuroblastoma tumors to develop a MLP classifier predicting patients' outcome (NB-hypo classifier). We trained and validated the classifier in a leave-one-out cross-validation analysis on 100 tumor gene expression profiles. We externally tested the resulting NB-hypo classifier on an independent 82 tumors' set. The NB-hypo classifier predicted the patients' outcome with the remarkable accuracy of 87 %. NB-hypo classifier prediction resulted in 2 % classification error when applied to clinically defined low-intermediate risk neuroblastoma patients. The prediction was 100 % accurate in assessing the death of five low/intermediated risk patients. GSEA of tumor gene expression profile

  19. Local-global classifier fusion for screening chest radiographs

    NASA Astrophysics Data System (ADS)

    Ding, Meng; Antani, Sameer; Jaeger, Stefan; Xue, Zhiyun; Candemir, Sema; Kohli, Marc; Thoma, George

    2017-03-01

    Tuberculosis (TB) is a severe comorbidity of HIV and chest x-ray (CXR) analysis is a necessary step in screening for the infective disease. Automatic analysis of digital CXR images for detecting pulmonary abnormalities is critical for population screening, especially in medical resource constrained developing regions. In this article, we describe steps that improve previously reported performance of NLM's CXR screening algorithms and help advance the state of the art in the field. We propose a local-global classifier fusion method where two complementary classification systems are combined. The local classifier focuses on subtle and partial presentation of the disease leveraging information in radiology reports that roughly indicates locations of the abnormalities. In addition, the global classifier models the dominant spatial structure in the gestalt image using GIST descriptor for the semantic differentiation. Finally, the two complementary classifiers are combined using linear fusion, where the weight of each decision is calculated by the confidence probabilities from the two classifiers. We evaluated our method on three datasets in terms of the area under the Receiver Operating Characteristic (ROC) curve, sensitivity, specificity and accuracy. The evaluation demonstrates the superiority of our proposed local-global fusion method over any single classifier.

  20. Leveraging Wikipedia knowledge to classify multilingual biomedical documents.

    PubMed

    Antonio Mouriño García, Marcos; Pérez Rodríguez, Roberto; Anido Rifón, Luis

    2018-05-02

    This article presents a classifier that leverages Wikipedia knowledge to represent documents as vectors of concepts weights, and analyses its suitability for classifying biomedical documents written in any language when it is trained only with English documents. We propose the cross-language concept matching technique, which relies on Wikipedia interlanguage links to convert concept vectors between languages. The performance of the classifier is compared to a classifier based on machine translation, and two classifiers based on MetaMap. To perform the experiments, we created two multilingual corpus. The first one, Multi-Lingual UVigoMED (ML-UVigoMED) is composed of 23,647 Wikipedia documents about biomedical topics written in English, German, French, Spanish, Italian, Galician, Romanian, and Icelandic. The second one, English-French-Spanish-German UVigoMED (EFSG-UVigoMED) is composed of 19,210 biomedical abstract extracted from MEDLINE written in English, French, Spanish, and German. The performance of the approach proposed is superior to any of the state-of-the art classifier in the benchmark. We conclude that leveraging Wikipedia knowledge is of great advantage in tasks of multilingual classification of biomedical documents. Copyright © 2018 Elsevier B.V. All rights reserved.

  1. Generating compact classifier systems using a simple artificial immune system.

    PubMed

    Leung, Kevin; Cheong, France; Cheong, Christopher

    2007-10-01

    Current artificial immune system (AIS) classifiers have two major problems: 1) their populations of B-cells can grow to huge proportions, and 2) optimizing one B-cell (part of the classifier) at a time does not necessarily guarantee that the B-cell pool (the whole classifier) will be optimized. In this paper, the design of a new AIS algorithm and classifier system called simple AIS is described. It is different from traditional AIS classifiers in that it takes only one B-cell, instead of a B-cell pool, to represent the classifier. This approach ensures global optimization of the whole system, and in addition, no population control mechanism is needed. The classifier was tested on seven benchmark data sets using different classification techniques and was found to be very competitive when compared to other classifiers.

  2. Acinetobacter phage genome is similar to Sphinx 2.36, the circular DNA copurified with TSE infected particles.

    PubMed

    Longkumer, Toshisangba; Kamireddy, Swetha; Muthyala, Venkateswar Reddy; Akbarpasha, Shaikh; Pitchika, Gopi Krishna; Kodetham, Gopinath; Ayaluru, Murali; Siddavattam, Dayananda

    2013-01-01

    While analyzing plasmids of Acinetobacter sp. DS002 we have detected a circular DNA molecule pTS236, which upon further investigation is identified as the genome of a phage. The phage genome has shown sequence similarity to the recently discovered Sphinx 2.36 DNA sequence co-purified with the Transmissible Spongiform Encephalopathy (TSE) particles isolated from infected brain samples collected from diverse geographical regions. As in Sphinx 2.36, the phage genome also codes for three proteins. One of them codes for RepA and is shown to be involved in replication of pTS236 through rolling circle (RC) mode. The other two translationally coupled ORFs, orf106 and orf96, code for coat proteins of the phage. Although an orf96 homologue was not previously reported in Sphinx 2.36, a closer examination of DNA sequence of Sphinx 2.36 revealed its presence downstream of orf106 homologue. TEM images and infection assays revealed existence of phage AbDs1 in Acinetobacter sp. DS002.

  3. Acinetobacter phage genome is similar to Sphinx 2.36, the circular DNA copurified with TSE infected particles

    PubMed Central

    Longkumer, Toshisangba; Kamireddy, Swetha; Muthyala, Venkateswar Reddy; Akbarpasha, Shaikh; Pitchika, Gopi Krishna; Kodetham, Gopinath; Ayaluru, Murali; Siddavattam, Dayananda

    2013-01-01

    While analyzing plasmids of Acinetobacter sp. DS002 we have detected a circular DNA molecule pTS236, which upon further investigation is identified as the genome of a phage. The phage genome has shown sequence similarity to the recently discovered Sphinx 2.36 DNA sequence co-purified with the Transmissible Spongiform Encephalopathy (TSE) particles isolated from infected brain samples collected from diverse geographical regions. As in Sphinx 2.36, the phage genome also codes for three proteins. One of them codes for RepA and is shown to be involved in replication of pTS236 through rolling circle (RC) mode. The other two translationally coupled ORFs, orf106 and orf96, code for coat proteins of the phage. Although an orf96 homologue was not previously reported in Sphinx 2.36, a closer examination of DNA sequence of Sphinx 2.36 revealed its presence downstream of orf106 homologue. TEM images and infection assays revealed existence of phage AbDs1 in Acinetobacter sp. DS002. PMID:23867905

  4. HYBRIDIZATION PROPERTIES OF DNA SEQUENCES DIRECTING THE SYNTHESIS OF MESSENGER RNA AND HETEROGENEOUS NUCLEAR RNA

    PubMed Central

    Greenberg, Jay R.; Perry, Robert P.

    1971-01-01

    The relationship of the DNA sequences from which polyribosomal messenger RNA (mRNA) and heterogeneous nuclear RNA (NRNA) of mouse L cells are transcribed was investigated by means of hybridization kinetics and thermal denaturation of the hybrids. Hybridization was performed in formamide solutions at DNA excess. Under these conditions most of the hybridizing mRNA and NRNA react at values of Dot (DNA concentration multiplied by time) expected for RNA transcribed from the nonrepeated or rarely repeated fraction of the genome. However, a fraction of both mRNA and NRNA hybridize at values of Dot about 10,000 times lower, and therefore must be transcribed from highly redundant DNA sequences. The fraction of NRNA hybridizing to highly repeated sequences is about 1.7 times greater than the corresponding fraction of mRNA. The hybrids formed by the rapidly reacting fractions of both NRNA and mRNA melt over a narrow temperature range with a midpoint about 11°C below that of native L cell DNA. This indicates that these hybrids consist of partially complementary sequences with approximately 11% mismatching of bases. Hybrids formed by the slowly reacting fraction of NRNA melt within 4°–6°C of native DNA, indicating very little, if any, mismatching of bases. Hybrids of the slowly reacting components of mRNA, formed under conditions of sufficiently low RNA input, have a high thermal stability, similar to that observed for hybrids of the slowly reacting NRNA component. However, when higher inputs of mRNA are used, hybrids are formed which have a strikingly lower thermal stability. This observation can be explained by assuming that there is sufficient similarity among the relatively rare DNA sequences coding for mRNA so that under hybridization conditions, in which these DNA sequences are not truly in excess, reversible hybrids exhibiting a considerable amount of mispairing are formed. The fact that a comparable phenomenon has not been observed for NRNA may mean that there is

  5. How-To-Do-It: Recombinant DNA Made Easy: I. "Jumping Genes."

    ERIC Educational Resources Information Center

    Thomson, Robert G.

    1988-01-01

    Presents as part I of a two-part series a study involving the intercellular transfer of bacterial DNA that codes for the resistance to antibiotics. Demonstrates to students that such transfers occur. Discusses laboratory procedures, materials and results. (CW)

  6. On Algorithms for Generating Computationally Simple Piecewise Linear Classifiers

    DTIC Science & Technology

    1989-05-01

    suffers. - Waveform classification, e.g. speech recognition, seismic analysis (i.e. discrimination between earthquakes and nuclear explosions), target...assuming Gaussian distributions (B-G) d) Bayes classifier with probability densities estimated with the k-N-N method (B- kNN ) e) The -arest neighbour...range of classifiers are chosen including a fast, easy computable and often used classifier (B-G), reliable and complex classifiers (B- kNN and NNR

  7. Mitochondrial DNA Suggests a Western Eurasian Origin for Ancient (Proto-) Bulgarians.

    PubMed

    Nesheva, D V; Karachanak-Yankova, S; Lari, M; Yordanov, Y; Galabov, A; Caramelli, D; Toncheva, D

    2015-01-01

    Ancient (proto-) Bulgarians have long been thought of as a Turkic population. However, evidence found in the past three decades shows that this is not the case. Until now, this evidence has not included ancient mitochondrial DNA (mtDNA) analysis. To fill this void, we collected human remains from the 8th to the 10th century AD located in three necropolises in Bulgaria: Nojarevo (Silistra region) and Monastery of Mostich (Shumen region), both in northeastern Bulgaria, and Tuhovishte (Satovcha region) in southwestern Bulgaria. The phylogenetic analysis of 13 ancient DNA samples (extracted from teeth) identified 12 independent haplotypes, which we further classified into mtDNA haplogroups found in present-day European and western Eurasian populations. Our results suggest a western Eurasian matrilineal origin for proto-Bulgarians, as well as a genetic similarity between proto- and modern Bulgarians. Our future work will provide additional data that will further clarify proto-Bulgarian origins, thereby adding new clues to the current understanding of European genetic evolution.

  8. Cellular phone enabled non-invasive tissue classifier.

    PubMed

    Laufer, Shlomi; Rubinsky, Boris

    2009-01-01

    Cellular phone technology is emerging as an important tool in the effort to provide advanced medical care to the majority of the world population currently without access to such care. In this study, we show that non-invasive electrical measurements and the use of classifier software can be combined with cellular phone technology to produce inexpensive tissue characterization. This concept was demonstrated by the use of a Support Vector Machine (SVM) classifier to distinguish through the cellular phone between heart and kidney tissue via the non-invasive multi-frequency electrical measurements acquired around the tissues. After the measurements were performed at a remote site, the raw data were transmitted through the cellular phone to a central computational site and the classifier was applied to the raw data. The results of the tissue analysis were returned to the remote data measurement site. The classifiers correctly determined the tissue type with a specificity of over 90%. When used for the detection of malignant tumors, classifiers can be designed to produce false positives in order to ensure that no tumors will be missed. This mode of operation has applications in remote non-invasive tissue diagnostics in situ in the body, in combination with medical imaging, as well as in remote diagnostics of biopsy samples in vitro.

  9. Cellular Phone Enabled Non-Invasive Tissue Classifier

    PubMed Central

    Laufer, Shlomi; Rubinsky, Boris

    2009-01-01

    Cellular phone technology is emerging as an important tool in the effort to provide advanced medical care to the majority of the world population currently without access to such care. In this study, we show that non-invasive electrical measurements and the use of classifier software can be combined with cellular phone technology to produce inexpensive tissue characterization. This concept was demonstrated by the use of a Support Vector Machine (SVM) classifier to distinguish through the cellular phone between heart and kidney tissue via the non-invasive multi-frequency electrical measurements acquired around the tissues. After the measurements were performed at a remote site, the raw data were transmitted through the cellular phone to a central computational site and the classifier was applied to the raw data. The results of the tissue analysis were returned to the remote data measurement site. The classifiers correctly determined the tissue type with a specificity of over 90%. When used for the detection of malignant tumors, classifiers can be designed to produce false positives in order to ensure that no tumors will be missed. This mode of operation has applications in remote non-invasive tissue diagnostics in situ in the body, in combination with medical imaging, as well as in remote diagnostics of biopsy samples in vitro. PMID:19365554

  10. The "Wow! signal" of the terrestrial genetic code

    NASA Astrophysics Data System (ADS)

    shCherbak, Vladimir I.; Makukov, Maxim A.

    2013-05-01

    It has been repeatedly proposed to expand the scope for SETI, and one of the suggested alternatives to radio is the biological media. Genomic DNA is already used on Earth to store non-biological information. Though smaller in capacity, but stronger in noise immunity is the genetic code. The code is a flexible mapping between codons and amino acids, and this flexibility allows modifying the code artificially. But once fixed, the code might stay unchanged over cosmological timescales; in fact, it is the most durable construct known. Therefore it represents an exceptionally reliable storage for an intelligent signature, if that conforms to biological and thermodynamic requirements. As the actual scenario for the origin of terrestrial life is far from being settled, the proposal that it might have been seeded intentionally cannot be ruled out. A statistically strong intelligent-like "signal" in the genetic code is then a testable consequence of such scenario. Here we show that the terrestrial code displays a thorough precision-type orderliness matching the criteria to be considered an informational signal. Simple arrangements of the code reveal an ensemble of arithmetical and ideographical patterns of the same symbolic language. Accurate and systematic, these underlying patterns appear as a product of precision logic and nontrivial computing rather than of stochastic processes (the null hypothesis that they are due to chance coupled with presumable evolutionary pathways is rejected with P-value < 10-13). The patterns are profound to the extent that the code mapping itself is uniquely deduced from their algebraic representation. The signal displays readily recognizable hallmarks of artificiality, among which are the symbol of zero, the privileged decimal syntax and semantical symmetries. Besides, extraction of the signal involves logically straightforward but abstract operations, making the patterns essentially irreducible to any natural origin. Plausible ways of

  11. Design Pattern Mining Using Distributed Learning Automata and DNA Sequence Alignment

    PubMed Central

    Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina

    2014-01-01

    Context Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. Objective This paper describes a new method for pattern mining, isolating design patterns and relationship between them; and a related tool, DLA-DNA for all implemented pattern and all projects used for evaluation. DLA-DNA achieves acceptable precision and recall instead of other evaluated tools based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequences alignment. Method The proposed method mines structural design patterns in the object oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model can detect design patterns better that available other tools those are Pinot, PTIDEJ and DPJF; and the strengths of their relationships. Results The result demonstrate that whenever the source code is build standard and non-standard, based on the design patterns, then the result of the proposed method is near to DPJF and better that Pinot and PTIDEJ. The proposed model is tested on the several source codes and is compared with other related models and available tools those the results show the precision and recall of the proposed method, averagely 20% and 9.6% are more than Pinot, 27% and 31% are more than PTIDEJ and 3.3% and 2% are more than DPJF respectively. Conclusion The primary idea of the proposed method is organized in two following steps: the first step, elemental design patterns are identified, while at the second step, is composed to recognize actual design patterns. PMID:25243670

  12. Design pattern mining using distributed learning automata and DNA sequence alignment.

    PubMed

    Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina

    2014-01-01

    Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. This paper describes a new method for pattern mining, isolating design patterns and relationship between them; and a related tool, DLA-DNA for all implemented pattern and all projects used for evaluation. DLA-DNA achieves acceptable precision and recall instead of other evaluated tools based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequences alignment. The proposed method mines structural design patterns in the object oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model can detect design patterns better that available other tools those are Pinot, PTIDEJ and DPJF; and the strengths of their relationships. The result demonstrate that whenever the source code is build standard and non-standard, based on the design patterns, then the result of the proposed method is near to DPJF and better that Pinot and PTIDEJ. The proposed model is tested on the several source codes and is compared with other related models and available tools those the results show the precision and recall of the proposed method, averagely 20% and 9.6% are more than Pinot, 27% and 31% are more than PTIDEJ and 3.3% and 2% are more than DPJF respectively. The primary idea of the proposed method is organized in two following steps: the first step, elemental design patterns are identified, while at the second step, is composed to recognize actual design patterns.

  13. [Long non-coding RNAs in plants].

    PubMed

    Xiaoqing, Huang; Dandan, Li; Juan, Wu

    2015-04-01

    Long non-coding RNAs (lncRNAs), which are longer than 200 nucleotides in length, widely exist in organisms and function in a variety of biological processes. Currently, most of lncRNAs found in plants are transcribed by RNA polymerase Ⅱ and mediate gene expression through multiple mechanisms, such as target mimicry, transcription interference, histone methylation and DNA methylation, and play important roles in flowering, male sterility, nutrition metabolism, biotic and abiotic stress and other biological processes as regulators in plants. In this review, we summarize the databases, prediction methods, and possible functions of plant lncRNAs discovered in recent years.

  14. Evaluation of Factors Influencing Accuracy of Principal Procedure Coding Based on ICD-9-CM: An Iranian Study

    PubMed Central

    Farzandipour, Mehrdad; Sheikhtaheri, Abbas

    2009-01-01

    To evaluate the accuracy of procedural coding and the factors that influence it, 246 records were randomly selected from four teaching hospitals in Kashan, Iran. “Recodes” were assigned blindly and then compared to the original codes. Furthermore, the coders' professional behaviors were carefully observed during the coding process. Coding errors were classified as major or minor. The relations between coding accuracy and possible effective factors were analyzed by χ2 or Fisher exact tests as well as the odds ratio (OR) and the 95 percent confidence interval for the OR. The results showed that using a tabular index for rechecking codes reduces errors (83 percent vs. 72 percent accuracy). Further, more thorough documentation by the clinician positively affected coding accuracy, though this relation was not significant. Readability of records decreased errors overall (p = .003), including major ones (p = .012). Moreover, records with no abbreviations had fewer major errors (p = .021). In conclusion, not using abbreviations, ensuring more readable documentation, and paying more attention to available information increased coding accuracy and the quality of procedure databases. PMID:19471647

  15. Recognition of pornographic web pages by classifying texts and images.

    PubMed

    Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve

    2007-06-01

    With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages.

  16. Programmable molecular recognition based on the geometry of DNA nanostructures.

    PubMed

    Woo, Sungwook; Rothemund, Paul W K

    2011-07-10

    From ligand-receptor binding to DNA hybridization, molecular recognition plays a central role in biology. Over the past several decades, chemists have successfully reproduced the exquisite specificity of biomolecular interactions. However, engineering multiple specific interactions in synthetic systems remains difficult. DNA retains its position as the best medium with which to create orthogonal, isoenergetic interactions, based on the complementarity of Watson-Crick binding. Here we show that DNA can be used to create diverse bonds using an entirely different principle: the geometric arrangement of blunt-end stacking interactions. We show that both binary codes and shape complementarity can serve as a basis for such stacking bonds, and explore their specificity, thermodynamics and binding rules. Orthogonal stacking bonds were used to connect five distinct DNA origami. This work, which demonstrates how a single attractive interaction can be developed to create diverse bonds, may guide strategies for molecular recognition in systems beyond DNA nanostructures.

  17. Hypervariability of ribosomal DNA at multiple chromosomal sites in lake trout (Salvelinus namaycush).

    PubMed

    Zhuo, L; Reed, K M; Phillips, R B

    1995-06-01

    Variation in the intergenic spacer (IGS) of the ribosomal DNA (rDNA) of lake trout (Salvelinus namaycush) was examined. Digestion of genomic DNA with restriction enzymes showed that almost every individual had a unique combination of length variants with most of this variation occurring within rather than between populations. Sequence analysis of a 2.3 kilobase (kb) EcoRI-DraI fragment spanning the 3' end of the 28S coding region and approximately 1.8 kb of the IGS revealed two blocks of repetitive DNA. Putative transcriptional termination sites were found approximately 220 bases (b) downstream from the end of the 28S coding region. Comparison of the 2.3-kb fragments with two longer (3.1 kb) fragments showed that the major difference in length resulted from variation in the number of short (89 b) repeats located 3' to the putative terminator. Repeat units within a single nucleolus organizer region (NOR) appeared relatively homogeneous and genetic analysis found variants to be stably inherited. A comparison of the number of spacer-length variants with the number of NORs found that the number of length variants per individual was always less than the number of NORs. Examination of spacer variants in five populations showed that populations with more NORs had more spacer variants, indicating that variants are present at different rDNA sites on nonhomologous chromosomes.

  18. Using Neural Networks to Classify Digitized Images of Galaxies

    NASA Astrophysics Data System (ADS)

    Goderya, S. N.; McGuire, P. C.

    2000-12-01

    Automated classification of Galaxies into Hubble types is of paramount importance to study the large scale structure of the Universe, particularly as survey projects like the Sloan Digital Sky Survey complete their data acquisition of one million galaxies. At present it is not possible to find robust and efficient artificial intelligence based galaxy classifiers. In this study we will summarize progress made in the development of automated galaxy classifiers using neural networks as machine learning tools. We explore the Bayesian linear algorithm, the higher order probabilistic network, the multilayer perceptron neural network and Support Vector Machine Classifier. The performance of any machine classifier is dependant on the quality of the parameters that characterize the different groups of galaxies. Our effort is to develop geometric and invariant moment based parameters as input to the machine classifiers instead of the raw pixel data. Such an approach reduces the dimensionality of the classifier considerably, and removes the effects of scaling and rotation, and makes it easier to solve for the unknown parameters in the galaxy classifier. To judge the quality of training and classification we develop the concept of Mathews coefficients for the galaxy classification community. Mathews coefficients are single numbers that quantify classifier performance even with unequal prior probabilities of the classes.

  19. 46 CFR 503.59 - Safeguarding classified information.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... classification. (b) Whenever classified material is removed from a storage facility, such material shall not be... classification of the information; and (2) The prospective recipient requires access to the information in order... documents that have been destroyed. (k) An inventory of all documents classified higher than confidential...

  20. 46 CFR 503.59 - Safeguarding classified information.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... classification. (b) Whenever classified material is removed from a storage facility, such material shall not be... classification of the information; and (2) The prospective recipient requires access to the information in order... documents that have been destroyed. (k) An inventory of all documents classified higher than confidential...

  1. 46 CFR 503.59 - Safeguarding classified information.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... classification. (b) Whenever classified material is removed from a storage facility, such material shall not be... classification of the information; and (2) The prospective recipient requires access to the information in order... documents that have been destroyed. (k) An inventory of all documents classified higher than confidential...

  2. Implementation of new physics models for low energy electrons in liquid water in Geant4-DNA.

    PubMed

    Bordage, M C; Bordes, J; Edel, S; Terrissol, M; Franceries, X; Bardiès, M; Lampe, N; Incerti, S

    2016-12-01

    A new alternative set of elastic and inelastic cross sections has been added to the very low energy extension of the Geant4 Monte Carlo simulation toolkit, Geant4-DNA, for the simulation of electron interactions in liquid water. These cross sections have been obtained from the CPA100 Monte Carlo track structure code, which has been a reference in the microdosimetry community for many years. They are compared to the default Geant4-DNA cross sections and show better agreement with published data. In order to verify the correct implementation of the CPA100 cross section models in Geant4-DNA, simulations of the number of interactions and ranges were performed using Geant4-DNA with this new set of models, and the results were compared with corresponding results from the original CPA100 code. Good agreement is observed between the implementations, with relative differences lower than 1% regardless of the incident electron energy. Useful quantities related to the deposited energy at the scale of the cell or the organ of interest for internal dosimetry, like dose point kernels, are also calculated using these new physics models. They are compared with results obtained using the well-known Penelope Monte Carlo code. Copyright © 2016 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.

  3. Classifier fusion for VoIP attacks classification

    NASA Astrophysics Data System (ADS)

    Safarik, Jakub; Rezac, Filip

    2017-05-01

    SIP is one of the most successful protocols in the field of IP telephony communication. It establishes and manages VoIP calls. As the number of SIP implementation rises, we can expect a higher number of attacks on the communication system in the near future. This work aims at malicious SIP traffic classification. A number of various machine learning algorithms have been developed for attack classification. The paper presents a comparison of current research and the use of classifier fusion method leading to a potential decrease in classification error rate. Use of classifier combination makes a more robust solution without difficulties that may affect single algorithms. Different voting schemes, combination rules, and classifiers are discussed to improve the overall performance. All classifiers have been trained on real malicious traffic. The concept of traffic monitoring depends on the network of honeypot nodes. These honeypots run in several networks spread in different locations. Separation of honeypots allows us to gain an independent and trustworthy attack information.

  4. A Consistent System for Coding Laboratory Samples

    NASA Astrophysics Data System (ADS)

    Sih, John C.

    1996-07-01

    A formal laboratory coding system is presented to keep track of laboratory samples. Preliminary useful information regarding the sample (origin and history) is gained without consulting a research notebook. Since this system uses and retains the same research notebook page number for each new experiment (reaction), finding and distinguishing products (samples) of the same or different reactions becomes an easy task. Using this system multiple products generated from a single reaction can be identified and classified in a uniform fashion. Samples can be stored and filed according to stage and degree of purification, e.g. crude reaction mixtures, recrystallized samples, chromatographed or distilled products.

  5. Decoding the non-coding genome: elucidating genetic risk outside the coding genome.

    PubMed

    Barr, C L; Misener, V L

    2016-01-01

    Current evidence emerging from genome-wide association studies indicates that the genetic underpinnings of complex traits are likely attributable to genetic variation that changes gene expression, rather than (or in combination with) variation that changes protein-coding sequences. This is particularly compelling with respect to psychiatric disorders, as genetic changes in regulatory regions may result in differential transcriptional responses to developmental cues and environmental/psychosocial stressors. Until recently, however, the link between transcriptional regulation and psychiatric genetic risk has been understudied. Multiple obstacles have contributed to the paucity of research in this area, including challenges in identifying the positions of remote (distal from the promoter) regulatory elements (e.g. enhancers) and their target genes and the underrepresentation of neural cell types and brain tissues in epigenome projects - the availability of high-quality brain tissues for epigenetic and transcriptome profiling, particularly for the adolescent and developing brain, has been limited. Further challenges have arisen in the prediction and testing of the functional impact of DNA variation with respect to multiple aspects of transcriptional control, including regulatory-element interaction (e.g. between enhancers and promoters), transcription factor binding and DNA methylation. Further, the brain has uncommon DNA-methylation marks with unique genomic distributions not found in other tissues - current evidence suggests the involvement of non-CG methylation and 5-hydroxymethylation in neurodevelopmental processes but much remains unknown. We review here knowledge gaps as well as both technological and resource obstacles that will need to be overcome in order to elucidate the involvement of brain-relevant gene-regulatory variants in genetic risk for psychiatric disorders. © 2015 John Wiley & Sons Ltd and International Behavioural and Neural Genetics Society.

  6. Preparation of Proper Immunogen by Cloning and Stable Expression of cDNA coding for Human Hematopoietic Stem Cell Marker CD34 in NIH-3T3 Mouse Fibroblast Cell Line

    PubMed Central

    Shafaghat, Farzaneh; Abbasi-Kenarsari, Hajar; Majidi, Jafar; Movassaghpour, Ali Akbar; Shanehbandi, Dariush; Kazemi, Tohid

    2015-01-01

    Purpose: Transmembrane CD34 glycoprotein is the most important marker for identification, isolation and enumeration of hematopoietic stem cells (HSCs). We aimed in this study to clone the cDNA coding for human CD34 from KG1a cell line and stably express in mouse fibroblast cell line NIH-3T3. Such artificial cell line could be useful as proper immunogen for production of mouse monoclonal antibodies. Methods: CD34 cDNA was cloned from KG1a cell line after total RNA extraction and cDNA synthesis. Pfu DNA polymerase-amplified specific band was ligated to pGEMT-easy TA-cloning vector and sub-cloned in pCMV6-Neo expression vector. After transfection of NIH-3T3 cells using 3 μg of recombinant construct and 6 μl of JetPEI transfection reagent, stable expression was obtained by selection of cells by G418 antibiotic and confirmed by surface flow cytometry. Results: 1158 bp specific band was aligned completely to reference sequence in NCBI database corresponding to long isoform of human CD34. Transient and stable expression of human CD34 on transfected NIH-3T3 mouse fibroblast cells was achieved (25% and 95%, respectively) as shown by flow cytometry. Conclusion: Cloning and stable expression of human CD34 cDNA was successfully performed and validated by standard flow cytometric analysis. Due to murine origin of NIH-3T3 cell line, CD34-expressing NIH-3T3 cells could be useful as immunogen in production of diagnostic monoclonal antibodies against human CD34. This approach could bypass the need for purification of recombinant proteins produced in eukaryotic expression systems. PMID:25789221

  7. Palmprint authentication using multiple classifiers

    NASA Astrophysics Data System (ADS)

    Kumar, Ajay; Zhang, David

    2004-08-01

    This paper investigates the performance improvement for palmprint authentication using multiple classifiers. The proposed methods on personal authentication using palmprints can be divided into three categories; appearance- , line -, and texture-based. A combination of these approaches can be used to achieve higher performance. We propose to simultaneously extract palmprint features from PCA, Line detectors and Gabor-filters and combine their corresponding matching scores. This paper also investigates the comparative performance of simple combination rules and the hybrid fusion strategy to achieve performance improvement. Our experimental results on the database of 100 users demonstrate the usefulness of such approach over those based on individual classifiers.

  8. DNA unzipping phase diagram calculated via replica theory.

    PubMed

    Roland, C Brian; Hatch, Kristi Adamson; Prentiss, Mara; Shakhnovich, Eugene I

    2009-05-01

    We show how single-molecule unzipping experiments can provide strong evidence that the zero-force melting transition of long molecules of natural dsDNA should be classified as a phase transition of the higher-order type (continuous). Toward this end, we study a statistical-mechanics model for the fluctuating structure of a long molecule of dsDNA, and compute the equilibrium phase diagram for the experiment in which the molecule is unzipped under applied force. We consider a perfect-matching dsDNA model, in which the loops are volume-excluding chains with arbitrary loop exponent c . We include stacking interactions, hydrogen bonds, and main-chain entropy. We include sequence heterogeneity at the level of random sequences; in particular, there is no correlation in the base-pairing (bp) energy from one sequence position to the next. We present heuristic arguments to demonstrate that the low-temperature macrostate does not exhibit degenerate ergodicity breaking. We use this claim to understand the results of our replica-theoretic calculation of the equilibrium properties of the system. As a function of temperature, we obtain the minimal force at which the molecule separates completely. This critical-force curve is a line in the temperature-force phase diagram that marks the regions where the molecule exists primarily as a double helix versus the region where the molecule exists as two separate strands. We compare our random-sequence model to magnetic tweezer experiments performed on the 48 502 bp genome of bacteriophage lambda . We find good agreement with the experimental data, which is restricted to temperatures between 24 and 50 degrees C . At higher temperatures, the critical-force curve of our random-sequence model is very different for that of the homogeneous-sequence version of our model. For both sequence models, the critical force falls to zero at the melting temperature T_{c} like |T-T_{c}|;{alpha} . For the homogeneous-sequence model, alpha=1/2 almost

  9. Twisting Right to Left: A…A Mismatch in a CAG Trinucleotide Repeat Overexpansion Provokes Left-Handed Z-DNA Conformation

    PubMed Central

    2015-01-01

    Conformational polymorphism of DNA is a major causative factor behind several incurable trinucleotide repeat expansion disorders that arise from overexpansion of trinucleotide repeats located in coding/non-coding regions of specific genes. Hairpin DNA structures that are formed due to overexpansion of CAG repeat lead to Huntington’s disorder and spinocerebellar ataxias. Nonetheless, DNA hairpin stem structure that generally embraces B-form with canonical base pairs is poorly understood in the context of periodic noncanonical A…A mismatch as found in CAG repeat overexpansion. Molecular dynamics simulations on DNA hairpin stems containing A…A mismatches in a CAG repeat overexpansion show that A…A dictates local Z-form irrespective of starting glycosyl conformation, in sharp contrast to canonical DNA duplex. Transition from B-to-Z is due to the mechanistic effect that originates from its pronounced nonisostericity with flanking canonical base pairs facilitated by base extrusion, backbone and/or base flipping. Based on these structural insights we envisage that such an unusual DNA structure of the CAG hairpin stem may have a role in disease pathogenesis. As this is the first study that delineates the influence of a single A…A mismatch in reversing DNA helicity, it would further have an impact on understanding DNA mismatch repair. PMID:25876062

  10. N6-methyladenine: a conserved and dynamic DNA mark

    PubMed Central

    O’Brown, Zach Klapholz; Greer, Eric Lieberman

    2017-01-01

    Chromatin, consisting of deoxyribonucleic acid (DNA) wrapped around histone proteins, facilitates DNA compaction and allows identical DNA code to confer many different cellular phenotypes. This biological versatility is accomplished in large part by post-translational modifications to histones and chemical modifications to DNA. These modifications direct the cellular machinery to expand or compact specific chromatin regions, and mark regions of the DNA as important for cellular functions. While each of the four bases that make up DNA can be modified (Iyer et al. 2011), this chapter will focus on methylation of the 6th position on adenines (6mA), as this modification has been poorly characterized in recently evolved eukaryotes but shows promise as a new conserved layer of epigenetic regulation. 6mA was previously thought to be restricted to unicellular organisms, but recent work has revealed its presence in more recently evolved metazoa. Here, we will briefly describe the history of 6mA, examine its evolutionary conservation, and evaluate the current methods for detecting 6mA. We will discuss the enzymes that bind and regulate this mark and finally examine known and potential functions of 6mA in eukaryotes. PMID:27826841

  11. A Pol V–Mediated Silencing, Independent of RNA–Directed DNA Methylation, Applies to 5S rDNA

    PubMed Central

    Douet, Julien; Tutois, Sylvie; Tourmente, Sylvette

    2009-01-01

    The plant-specific RNA polymerases Pol IV and Pol V are essential to RNA–directed DNA methylation (RdDM), which also requires activities from RDR2 (RNA–Dependent RNA Polymerase 2), DCL3 (Dicer-Like 3), AGO4 (Argonaute), and DRM2 (Domains Rearranged Methyltransferase 2). RdDM is dedicated to the methylation of target sequences which include transposable elements, regulatory regions of several protein-coding genes, and 5S rRNA–encoding DNA (rDNA) arrays. In this paper, we have studied the expression of the 5S-210 transcript, a marker of silencing release at 5S RNA genes, to show a differential impact of RNA polymerases IV and V on 5S rDNA arrays during early development of the plant. Using a combination of molecular and cytological assays, we show that Pol IV, RDR2, DRM2, and Pol V, actors of the RdDM, are required to maintain a transcriptional silencing of 5S RNA genes at chromosomes 4 and 5. Moreover, we have shown a derepression associated to chromatin decondensation specific to the 5S array from chromosome 4 and restricted to the Pol V–loss of function. In conclusion, our results highlight a new role for Pol V on 5S rDNA, which is RdDM–independent and comes specifically at chromosome 4, in addition to the RdDM pathway. PMID:19834541

  12. Non-coding stem-bulge RNAs are required for cell proliferation and embryonic development in C. elegans

    PubMed Central

    Kowalski, Madzia P.; Baylis, Howard A.; Krude, Torsten

    2015-01-01

    ABSTRACT Stem bulge RNAs (sbRNAs) are a family of small non-coding stem-loop RNAs present in Caenorhabditis elegans and other nematodes, the function of which is unknown. Here, we report the first functional characterisation of nematode sbRNAs. We demonstrate that sbRNAs from a range of nematode species are able to reconstitute the initiation of chromosomal DNA replication in the presence of replication proteins in vitro, and that conserved nucleotide sequence motifs are essential for this function. By functionally inactivating sbRNAs with antisense morpholino oligonucleotides, we show that sbRNAs are required for S phase progression, early embryonic development and the viability of C. elegans in vivo. Thus, we demonstrate a new and essential role for sbRNAs during the early development of C. elegans. sbRNAs show limited nucleotide sequence similarity to vertebrate Y RNAs, which are also essential for the initiation of DNA replication. Our results therefore establish that the essential function of small non-coding stem-loop RNAs during DNA replication extends beyond vertebrates. PMID:25908866

  13. Computational DNA hole spectroscopy: A new tool to predict mutation hotspots, critical base pairs, and disease 'driver' mutations.

    PubMed

    Villagrán, Martha Y Suárez; Miller, John H

    2015-08-27

    We report on a new technique, computational DNA hole spectroscopy, which creates spectra of electron hole probabilities vs. nucleotide position. A hole is a site of positive charge created when an electron is removed. Peaks in the hole spectrum depict sites where holes tend to localize and potentially trigger a base pair mismatch during replication. Our studies of mitochondrial DNA reveal a correlation between L-strand hole spectrum peaks and spikes in the human mutation spectrum. Importantly, we also find that hole peak positions that do not coincide with large variant frequencies often coincide with disease-implicated mutations and/or (for coding DNA) encoded conserved amino acids. This enables combining hole spectra with variant data to identify critical base pairs and potential disease 'driver' mutations. Such integration of DNA hole and variance spectra could ultimately prove invaluable for pinpointing critical regions of the vast non-protein-coding genome. An observed asymmetry in correlations, between the spectrum of human mtDNA variations and the L- and H-strand hole spectra, is attributed to asymmetric DNA replication processes that occur for the leading and lagging strands.

  14. 6 CFR 7.12 - Violations of classified information requirements.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 6 Domestic Security 1 2010-01-01 2010-01-01 false Violations of classified information requirements. 7.12 Section 7.12 Domestic Security DEPARTMENT OF HOMELAND SECURITY, OFFICE OF THE SECRETARY CLASSIFIED NATIONAL SECURITY INFORMATION Administration § 7.12 Violations of classified information...

  15. A novel feature extraction scheme with ensemble coding for protein-protein interaction prediction.

    PubMed

    Du, Xiuquan; Cheng, Jiaxing; Zheng, Tingting; Duan, Zheng; Qian, Fulan

    2014-07-18

    Protein-protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been increasing because of the development of high-throughput technologies and computational methods, many problems are still far from being solved. In this study, a novel predictor was designed by using the Random Forest (RF) algorithm with the ensemble coding (EC) method. To reduce computational time, a feature selection method (DX) was adopted to rank the features and search the optimal feature combination. The DXEC method integrates many features and physicochemical/biochemical properties to predict PPIs. On the Gold Yeast dataset, the DXEC method achieves 67.2% overall precision, 80.74% recall, and 70.67% accuracy. On the Silver Yeast dataset, the DXEC method achieves 76.93% precision, 77.98% recall, and 77.27% accuracy. On the human dataset, the prediction accuracy reaches 80% for the DXEC-RF method. We extended the experiment to a bigger and more realistic dataset that maintains 50% recall on the Yeast All dataset and 80% recall on the Human All dataset. These results show that the DXEC method is suitable for performing PPI prediction. The prediction service of the DXEC-RF classifier is available at http://ailab.ahu.edu.cn:8087/ DXECPPI/index.jsp.

  16. Biobar-coded gold nanoparticles and DNAzyme-based dual signal amplification strategy for ultrasensitive detection of protein by electrochemiluminescence.

    PubMed

    Xia, Hui; Li, Lingling; Yin, Zhouyang; Hou, Xiandeng; Zhu, Jun-Jie

    2015-01-14

    A dual signal amplification strategy for electrochemiluminescence (ECL) aptasensor was designed based on biobar-coded gold nanoparticles (Au NPs) and DNAzyme. CdSeTe@ZnS quantum dots (QDs) were chosen as the ECL signal probes. To verify the proposed ultrasensitive ECL aptasensor for biomolecules, we detected thrombin (Tb) as a proof-of-principle analyte. The hairpin DNA designed for the recognition of protein consists of two parts: the sequences of catalytical 8-17 DNAzyme and thrombin aptamer. Only in the presence of thrombin could the hairpin DNA be opened, followed by a recycling cleavage of excess substrates by catalytic core of the DNAzyme to induce the first-step amplification. One part of the fragments was captured to open the capture DNA modified on the Au electrode, which further connected with the prepared biobar-coded Au NPs-CdSeTe@ZnS QDs to get the final dual-amplified ECL signal. The limit of detection for Tb was 0.28 fM with excellent selectivity, and this proposed method possessed good performance in real sample analysis. This design introduces the new concept of dual-signal amplification by a biobar-coded system and DNAzyme recycling into ECL determination, and it is promising to be extended to provide a highly sensitive platform for various target biomolecules.

  17. New Insights into the Lake Chad Basin Population Structure Revealed by High-Throughput Genotyping of Mitochondrial DNA Coding SNPs

    PubMed Central

    Černý, Viktor; Carracedo, Ángel

    2011-01-01

    Background Located in the Sudan belt, the Chad Basin forms a remarkable ecosystem, where several unique agricultural and pastoral techniques have been developed. Both from an archaeological and a genetic point of view, this region has been interpreted to be the center of a bidirectional corridor connecting West and East Africa, as well as a meeting point for populations coming from North Africa through the Saharan desert. Methodology/Principal Findings Samples from twelve ethnic groups from the Chad Basin (n = 542) have been high-throughput genotyped for 230 coding region mitochondrial DNA (mtDNA) Single Nucleotide Polymorphisms (mtSNPs) using Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight (MALDI-TOF) mass spectrometry. This set of mtSNPs allowed for much better phylogenetic resolution than previous studies of this geographic region, enabling new insights into its population history. Notable haplogroup (hg) heterogeneity has been observed in the Chad Basin mirroring the different demographic histories of these ethnic groups. As estimated using a Bayesian framework, nomadic populations showed negative growth which was not always correlated to their estimated effective population sizes. Nomads also showed lower diversity values than sedentary groups. Conclusions/Significance Compared to sedentary population, nomads showed signals of stronger genetic drift occurring in their ancestral populations. These populations, however, retained more haplotype diversity in their hypervariable segments I (HVS-I), but not their mtSNPs, suggesting a more ancestral ethnogenesis. Whereas the nomadic population showed a higher Mediterranean influence signaled mainly by sub-lineages of M1, R0, U6, and U5, the other populations showed a more consistent sub-Saharan pattern. Although lifestyle may have an influence on diversity patterns and hg composition, analysis of molecular variance has not identified these differences. The present study indicates that analysis of mt

  18. Investigation of the mechanism of meiotic DNA cleavage by VMA1-derived endonuclease uncovers a meiotic alteration in chromatin structure around the target site.

    PubMed

    Fukuda, Tomoyuki; Ohta, Kunihiro; Ohya, Yoshikazu

    2006-06-01

    VMA1-derived endonuclease (VDE), a homing endonuclease in Saccharomyces cerevisiae, is encoded by the mobile intein-coding sequence within the nuclear VMA1 gene. VDE recognizes and cleaves DNA at the 31-bp VDE recognition sequence (VRS) in the VMA1 gene lacking the intein-coding sequence during meiosis to insert a copy of the intein-coding sequence at the cleaved site. The mechanism underlying the meiosis specificity of VMA1 intein-coding sequence homing remains unclear. We studied various factors that might influence the cleavage activity in vivo and found that VDE binding to the VRS can be detected only when DNA cleavage by VDE takes place, implying that meiosis-specific DNA cleavage is regulated by the accessibility of VDE to its target site. As a possible candidate for the determinant of this accessibility, we analyzed chromatin structure around the VRS and revealed that local chromatin structure near the VRS is altered during meiosis. Although the meiotic chromatin alteration exhibits correlations with DNA binding and cleavage by VDE at the VMA1 locus, such a chromatin alteration is not necessarily observed when the VRS is embedded in ectopic gene loci. This suggests that nucleosome positioning or occupancy around the VRS by itself is not the sole mechanism for the regulation of meiosis-specific DNA cleavage by VDE and that other mechanisms are involved in the regulation.

  19. Investigation of the Mechanism of Meiotic DNA Cleavage by VMA1-Derived Endonuclease Uncovers a Meiotic Alteration in Chromatin Structure around the Target Site

    PubMed Central

    Fukuda, Tomoyuki; Ohta, Kunihiro; Ohya, Yoshikazu

    2006-01-01

    VMA1-derived endonuclease (VDE), a homing endonuclease in Saccharomyces cerevisiae, is encoded by the mobile intein-coding sequence within the nuclear VMA1 gene. VDE recognizes and cleaves DNA at the 31-bp VDE recognition sequence (VRS) in the VMA1 gene lacking the intein-coding sequence during meiosis to insert a copy of the intein-coding sequence at the cleaved site. The mechanism underlying the meiosis specificity of VMA1 intein-coding sequence homing remains unclear. We studied various factors that might influence the cleavage activity in vivo and found that VDE binding to the VRS can be detected only when DNA cleavage by VDE takes place, implying that meiosis-specific DNA cleavage is regulated by the accessibility of VDE to its target site. As a possible candidate for the determinant of this accessibility, we analyzed chromatin structure around the VRS and revealed that local chromatin structure near the VRS is altered during meiosis. Although the meiotic chromatin alteration exhibits correlations with DNA binding and cleavage by VDE at the VMA1 locus, such a chromatin alteration is not necessarily observed when the VRS is embedded in ectopic gene loci. This suggests that nucleosome positioning or occupancy around the VRS by itself is not the sole mechanism for the regulation of meiosis-specific DNA cleavage by VDE and that other mechanisms are involved in the regulation. PMID:16757746

  20. Simulation of the Formation of DNA Double Strand Breaks and Chromosome Aberrations in Irradiated Cells

    NASA Technical Reports Server (NTRS)

    Plante, Ianik; Ponomarev, Artem L.; Wu, Honglu; Blattnig, Steve; George, Kerry

    2014-01-01

    The formation of DNA double-strand breaks (DSBs) and chromosome aberrations is an important consequence of ionizing radiation. To simulate DNA double-strand breaks and the formation of chromosome aberrations, we have recently merged the codes RITRACKS (Relativistic Ion Tracks) and NASARTI (NASA Radiation Track Image). The program RITRACKS is a stochastic code developed to simulate detailed event-by-event radiation track structure: [1] This code is used to calculate the dose in voxels of 20 nm, in a volume containing simulated chromosomes, [2] The number of tracks in the volume is calculated for each simulation by sampling a Poisson distribution, with the distribution parameter obtained from the irradiation dose, ion type and energy. The program NASARTI generates the chromosomes present in a cell nucleus by random walks of 20 nm, corresponding to the size of the dose voxels, [3] The generated chromosomes are located within domains which may intertwine, and [4] Each segment of the random walks corresponds to approx. 2,000 DNA base pairs. NASARTI uses pre-calculated dose at each voxel to calculate the probability of DNA damage at each random walk segment. Using the location of double-strand breaks, possible rejoining between damaged segments is evaluated. This yields various types of chromosomes aberrations, including deletions, inversions, exchanges, etc. By performing the calculations using various types of radiations, it will be possible to obtain relative biological effectiveness (RBE) values for several types of chromosome aberrations.