Science.gov

Sample records for classifying coding dna

  1. DNA codes

    SciTech Connect

    Torney, D. C.

    2001-01-01

    We have begun to characterize a variety of codes, motivated by potential implementation as (quaternary) DNA n-sequences, with letters denoted A, C, G, and T. The first codes we studied are the most reminiscent of conventional group codes. For these codes, Hamming similarity was generalized so that the score for matched letters takes more than one value, depending upon which letters are matched [2]. These codes consist of n-sequences satisfying an upper bound on the similarities, summed over the letter positions, of distinct codewords. We chose similarity 2 for matches of letters A and T and 3 for matches of the letters C and G, providing a rough approximation to double-strand bond energies in DNA. An inherent novelty of DNA codes is 'reverse complementation'. The latter may be defined, as follows, not only for alphabets of size four, but, more generally, for any even-size alphabet. All that is required is a matching of the letters of the alphabet: a partition into pairs. Then, the reverse complement of a codeword is obtained by reversing the order of its letters and replacing each letter by its match. For DNA, the matching is AT/CG because these are the Watson-Crick bonding pairs. Reversal arises because two DNA sequences form a double strand with opposite relative orientations. Thus, as will be described in detail, because in vitro decoding involves the formation of double-stranded DNA from two codewords, it is reasonable to assume - for universal applicability - that the reverse complement of any codeword is also a codeword. In particular, self-reverse complementary codewords are expressly forbidden in reverse-complement codes. Thus, an appropriate distance between all pairs of codewords must, when large, effectively prohibit binding between the respective codewords, that is, the formation of a double strand. Only reverse-complement pairs of codewords should be able to bind. For most applications, a DNA code is to be bi-partitioned, such that the reverse-complementary pairs are separated
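
    The reverse-complement operation and the letter-weighted similarity described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the report's own implementation: the function names are hypothetical, and the scoring assumes that "a match" means the same letter at the same position in both codewords (2 for A or T, 3 for C or G).

```python
# Illustrative sketch only; all names are hypothetical, not taken from the cited report.
MATCH = {"A": "T", "T": "A", "C": "G", "G": "C"}  # Watson-Crick matching AT/CG
SCORE = {"A": 2, "T": 2, "C": 3, "G": 3}          # assumed score per matched letter


def reverse_complement(word: str) -> str:
    """Reverse the order of the letters and replace each letter by its match."""
    return "".join(MATCH[c] for c in reversed(word))


def similarity(u: str, v: str) -> int:
    """Sum the per-letter scores over positions where u and v carry the same letter."""
    if len(u) != len(v):
        raise ValueError("codewords must have equal length")
    return sum(SCORE[a] for a, b in zip(u, v) if a == b)


def is_self_reverse_complementary(word: str) -> bool:
    """True if the word equals its own reverse complement (forbidden in such codes)."""
    return word == reverse_complement(word)
```

    Under these assumptions, a candidate code would be checked by requiring `similarity(u, v)` to stay below a chosen bound for every pair of distinct codewords, and by rejecting any word for which `is_self_reverse_complementary` is true.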

  2. DNA: Polymer and molecular code

    NASA Astrophysics Data System (ADS)

    Shivashankar, G. V.

    1999-10-01

    The thesis work focusses upon two aspects of DNA, the polymer and the molecular code. Our approach was to bring single molecule micromanipulation methods to the study of DNA. It included a home-built optical microscope combined with an atomic force microscope and an optical tweezer. This combined approach led to a novel method to graft a single DNA molecule onto a force cantilever using the optical tweezer and local heating. With this method, a force versus extension assay of double stranded DNA was realized. The resolution was about 10 pN. To improve on this force measurement resolution, a simple light backscattering technique was developed and used to probe the DNA polymer flexibility and its fluctuations. It combined the optical tweezer to trap a DNA tethered bead and the laser backscattering to detect the bead's Brownian fluctuations. With this technique the resolution was about 0.1 pN with a millisecond access time, and the whole entropic part of the DNA force-extension was measured. With this experimental strategy, we measured the polymerization of the protein RecA on an isolated double stranded DNA. We observed the progressive decoration of RecA on the λ DNA molecule, which results in the extension of λ, due to unwinding of the double helix. The dynamics of polymerization, the resulting change in the DNA entropic elasticity and the role of ATP hydrolysis were the main parts of the study. A simple model for RecA assembly on DNA was proposed. This work presents a first step in the study of genetic recombination. Recently we have started a study of equilibrium binding which utilizes fluorescence polarization methods to probe the polymerization of RecA on single stranded DNA. In addition to the study of material properties of DNA and DNA-RecA, we have developed experiments for which the code of the DNA is central. We studied one aspect of DNA as a molecular code, using different techniques. In particular the programmatic use of template specificity makes

  3. DNA as a Binary Code: How the Physical Structure of Nucleotide Bases Carries Information

    ERIC Educational Resources Information Center

    McCallister, Gary

    2005-01-01

    The DNA triplet code also functions as a binary code. Because double-ring compounds cannot bind to double-ring compounds in the DNA code, the sequence of bases classified simply as purines or pyrimidines can encode for smaller groups of possible amino acids. This is an intuitive approach to teaching the DNA code. (Contains 6 figures.)
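
    The purine/pyrimidine classification mentioned here can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the article's own material: the function name is hypothetical, and the choice of 1 for purines (double-ring A, G) and 0 for pyrimidines (single-ring C, T) is arbitrary.

```python
# Illustrative sketch; the bit assignment (purine = 1, pyrimidine = 0) is an assumption.
PURINES = {"A", "G"}      # double-ring bases
PYRIMIDINES = {"C", "T"}  # single-ring bases


def ring_class_bits(seq: str) -> str:
    """Map each base to '1' (purine) or '0' (pyrimidine)."""
    bits = []
    for base in seq.upper():
        if base in PURINES:
            bits.append("1")
        elif base in PYRIMIDINES:
            bits.append("0")
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return "".join(bits)
```

    For example, `ring_class_bits("ATG")` gives `"101"`, so each triplet falls into one of eight binary classes.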

  5. Superimposed Code Theoretic Analysis of DNA Codes and DNA Computing

    DTIC Science & Technology

    2010-03-01

    that the hybridization that occurs between a DNA strand and its Watson-Crick complement can be used to perform mathematical computation. This research... Watson-Crick (WC) duplex, e.g., TCGCA TCGCA. Note that non-WC duplexes can form and such a formation is called a cross-hybridization. Cross... 5’GAAAGTCGCGTA3’ Watson Crick (WC) Duplexes TACGCGACTTTC Cross Hybridized (CH) Duplexes ATTTTTGCGTTA GAAAAAGAAGAA Coding Strands for Ligation

  6. Random Coding Bounds for DNA Codes Based on Fibonacci Ensembles of DNA Sequences

    DTIC Science & Technology

    2008-07-01

    ...coding bound on the rate of DNA codes is proved. To obtain the bound, we use some ensembles of DNA sequences which are generalizations of the Fibonacci sequences. Subject terms: DNA Codes, Fibonacci Ensembles, DNA Computing, Code Optimization. Dates covered: 6 Jul 08 – 11 Jul 08.

  7. IN-MACA-MCC: Integrated Multiple Attractor Cellular Automata with Modified Clonal Classifier for Human Protein Coding and Promoter Prediction.

    PubMed

    Pokkuluri, Kiran Sree; Inampudi, Ramesh Babu; Nedunuri, S S S N Usha Devi

    2014-01-01

    Protein coding and promoter region predictions are very important challenges of bioinformatics (Attwood and Teresa, 2000). The identification of these regions plays a crucial role in understanding the genes. Many novel computational and mathematical methods have been introduced, and existing methods are being refined, for predicting each of the regions separately; still, there is scope for improvement. We propose a classifier that is built with MACA (multiple attractor cellular automata) and MCC (modified clonal classifier) to predict both regions with a single classifier. The proposed classifier is trained and tested with Fickett and Tung (1992) datasets for protein coding region prediction for DNA sequences of lengths 54, 108, and 162. This classifier is trained and tested with MMCRI datasets for protein coding region prediction for DNA sequences of lengths 252 and 354. The proposed classifier is trained and tested with promoter sequences from DBTSS (Yamashita et al., 2006) dataset and nonpromoters from EID (Saxonov et al., 2000) and UTRdb (Pesole et al., 2002) datasets. The proposed model can predict both regions with an average accuracy of 90.5% for promoter and 89.6% for protein coding region predictions. The specificity and sensitivity values of promoter and protein coding region predictions are 0.89 and 0.92, respectively.

  8. Classified JPEG coding of mixed document images for printing.

    PubMed

    Ramos, M G; de Queiroz, R L

    2000-01-01

    This paper presents a modified JPEG coder that is applied to the compression of mixed documents (containing text, natural images, and graphics) for printing purposes. The modified JPEG coder proposed in this paper takes advantage of the distinct perceptually significant regions in these documents to achieve higher perceptual quality than the standard JPEG coder. The region-adaptivity is performed via classified thresholding being totally compliant with the baseline standard. A computationally efficient classification algorithm is presented, and the improved performance of the classified JPEG coder is verified.

  9. Superimposed Code Theoretic Analysis of DNA Codes and DNA Computing

    DTIC Science & Technology

    2008-01-01

    complements of one another and the DNA duplex formed is a Watson-Crick (WC) duplex. However, there are many instances when the formation of non-WC... that the user’s requirements for probe selection are met based on the Watson-Crick probe locality within a target. The second type, called

  10. Incorporating DNA Methylation Dynamics Into Epigenetic Codes

    PubMed Central

    Szulwach, Keith E.; Jin, Peng

    2014-01-01

    Summary: Genomic function is dictated by a combination of DNA sequence and the molecular mechanisms controlling access to genetic information. Access to DNA can be determined by the interpretation of covalent modifications that influence the packaging of DNA into chromatin, including DNA methylation and histone modifications. These modifications are believed to be forms of “epigenetic codes” that exist in discernable combinations that reflect cellular phenotype. Although DNA methylation is known to play important roles in gene regulation and genomic function, its contribution to the encoding of epigenetic information is just beginning to emerge. Here we discuss paradigms associated with the various components of DNA methylation/demethylation and recent advances in the understanding of its dynamic regulation in the genome, integrating these mechanisms into a framework to explain how DNA methylation could contribute to epigenetic codes. PMID:24242211

  11. Lung Cancer Classification Employing Proposed Real Coded Genetic Algorithm Based Radial Basis Function Neural Network Classifier.

    PubMed

    Selvakumari Jeya, I Jasmine; Deepa, S N

    2016-01-01

    A real coded genetic algorithm based radial basis function neural network classifier is proposed and employed to classify healthy and cancer affected lung images. The Real Coded Genetic Algorithm (RCGA) is proposed to overcome the Hamming Cliff problem encountered with the Binary Coded Genetic Algorithm (BCGA). A Radial Basis Function Neural Network (RBFNN) is chosen as the classifier model because of its Gaussian kernel function and its effective learning process, which avoids the local and global minima problem and enables faster convergence. This paper specifically focuses on tuning the weights and bias of the RBFNN classifier using the proposed RCGA. The operators used in RCGA enable the algorithm to compute weight and bias values that yield the minimum Mean Square Error (MSE). With both healthy and cancer lung images from the Lung Image Database Consortium (LIDC) database and a real-time database, the proposed RCGA based RBFNN classifier effectively classified healthy lung tissues and cancer affected lung nodules. The classification accuracy computed using the proposed approach is higher than that of classifiers proposed earlier in the literature.
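
    The Hamming Cliff problem named in this abstract arises when two adjacent integers have binary encodings that differ in many bits, so a binary coded genetic algorithm needs several simultaneous bit flips to move between neighbouring values. A minimal Python sketch of the effect (the helper name is hypothetical and is not taken from the paper):

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the binary encodings of a and b differ."""
    return bin(a ^ b).count("1")


# 7 = 0111 and 8 = 1000 are adjacent integers but differ in all four bits,
# whereas 6 = 0110 and 7 = 0111 differ in only one bit.
cliff = hamming_distance(7, 8)
smooth = hamming_distance(6, 7)
```

    A real coded representation sidesteps this by mutating the numeric value directly instead of its bit string.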

  12. Lung Cancer Classification Employing Proposed Real Coded Genetic Algorithm Based Radial Basis Function Neural Network Classifier

    PubMed Central

    Deepa, S. N.

    2016-01-01

    A real coded genetic algorithm based radial basis function neural network classifier is proposed and employed to classify healthy and cancer affected lung images. The Real Coded Genetic Algorithm (RCGA) is proposed to overcome the Hamming Cliff problem encountered with the Binary Coded Genetic Algorithm (BCGA). A Radial Basis Function Neural Network (RBFNN) is chosen as the classifier model because of its Gaussian kernel function and its effective learning process, which avoids the local and global minima problem and enables faster convergence. This paper specifically focuses on tuning the weights and bias of the RBFNN classifier using the proposed RCGA. The operators used in RCGA enable the algorithm to compute weight and bias values that yield the minimum Mean Square Error (MSE). With both healthy and cancer lung images from the Lung Image Database Consortium (LIDC) database and a real-time database, the proposed RCGA based RBFNN classifier effectively classified healthy lung tissues and cancer affected lung nodules. The classification accuracy computed using the proposed approach is higher than that of classifiers proposed earlier in the literature. PMID:28050198

  13. Security authentication with a three-dimensional optical phase code using random forest classifier: an overview

    NASA Astrophysics Data System (ADS)

    Markman, Adam; Carnicer, Artur; Javidi, Bahram

    2017-05-01

    We overview our recent work [1] on utilizing three-dimensional (3D) optical phase codes for object authentication using the random forest classifier. A simple 3D optical phase code (OPC) is generated by combining multiple diffusers and glass slides. This tag is then placed on a quick-response (QR) code, which is a barcode capable of storing information and can be scanned under non-uniform illumination conditions, rotation, and slight degradation. A coherent light source illuminates the OPC and the transmitted light is captured by a CCD to record the unique signature. Feature extraction on the signature is performed and inputted into a pre-trained random-forest classifier for authentication.

  14. Telomeres, histone code, and DNA damage response.

    PubMed

    Misri, S; Pandita, S; Kumar, R; Pandita, T K

    2008-01-01

    Genomic stability is maintained by telomeres, the end terminal structures that protect chromosomes from fusion or degradation. Shortening or loss of telomeric repeats or altered telomere chromatin structure is correlated with telomere dysfunction such as chromosome end-to-end associations that could lead to genomic instability and gene amplification. The structure at the end of telomeres is such that its DNA differs from DNA double strand breaks (DSBs) to avoid nonhomologous end-joining (NHEJ), which is accomplished by forming a unique higher order nucleoprotein structure. Telomeres are attached to the nuclear matrix and have a unique chromatin structure. Whether this special structure is maintained by specific chromatin changes is yet to be thoroughly investigated. Chromatin modifications implicated in transcriptional regulation are thought to be the result of a code on the histone proteins (histone code). This code, involving phosphorylation, acetylation, methylation, ubiquitylation, and sumoylation of histones, is believed to regulate chromatin accessibility either by disrupting chromatin contacts or by recruiting non-histone proteins to chromatin. The histone code in which distinct histone tail-protein interactions promote engagement may be the deciding factor for choosing specific DSB repair pathways. Recent evidence suggests that such mechanisms are involved in DNA damage detection and repair. Altered telomere chromatin structure has been linked to defective DNA damage response (DDR), and eukaryotic cells have evolved DDR mechanisms utilizing proficient DNA repair and cell cycle checkpoints in order to maintain genomic stability. Recent studies suggest that chromatin modifying factors play a critical role in the maintenance of genomic stability. This review will summarize the role of DNA damage repair proteins specifically ataxia-telangiectasia mutated (ATM) and its effectors and the telomere complex in maintaining genome stability.

  15. Telomeres, histone code, and DNA damage response

    PubMed Central

    Misri, S.; Pandita, S.; Kumar, R.; Pandita, T.K.

    2009-01-01

    Genomic stability is maintained by telomeres, the end terminal structures that protect chromosomes from fusion or degradation. Shortening or loss of telomeric repeats or altered telomere chromatin structure is correlated with telomere dysfunction such as chromosome end-to-end associations that could lead to genomic instability and gene amplification. The structure at the end of telomeres is such that its DNA differs from DNA double strand breaks (DSBs) to avoid nonhomologous end-joining (NHEJ), which is accomplished by forming a unique higher order nucleoprotein structure. Telomeres are attached to the nuclear matrix and have a unique chromatin structure. Whether this special structure is maintained by specific chromatin changes is yet to be thoroughly investigated. Chromatin modifications implicated in transcriptional regulation are thought to be the result of a code on the histone proteins (histone code). This code, involving phosphorylation, acetylation, methylation, ubiquitylation, and sumoylation of histones, is believed to regulate chromatin accessibility either by disrupting chromatin contacts or by recruiting non-histone proteins to chromatin. The histone code in which distinct histone tail-protein interactions promote engagement may be the deciding factor for choosing specific DSB repair pathways. Recent evidence suggests that such mechanisms are involved in DNA damage detection and repair. Altered telomere chromatin structure has been linked to defective DNA damage response (DDR), and eukaryotic cells have evolved DDR mechanisms utilizing proficient DNA repair and cell cycle checkpoints in order to maintain genomic stability. Recent studies suggest that chromatin modifying factors play a critical role in the maintenance of genomic stability. This review will summarize the role of DNA damage repair proteins specifically ataxia-telangiectasia mutated (ATM) and its effectors and the telomere complex in maintaining genome stability. PMID:19188699

  16. Indications for spine surgery: validation of an administrative coding algorithm to classify degenerative diagnoses

    PubMed Central

    Lurie, Jon D.; Tosteson, Anna N.A.; Deyo, Richard A.; Tosteson, Tor; Weinstein, James; Mirza, Sohail K.

    2014-01-01

    Study Design: Retrospective analysis of Medicare claims linked to a multi-center clinical trial. Objective: The Spine Patient Outcomes Research Trial (SPORT) provided a unique opportunity to examine the validity of a claims-based algorithm for grouping patients by surgical indication. SPORT enrolled patients for lumbar disc herniation, spinal stenosis, and degenerative spondylolisthesis. We compared the surgical indication derived from Medicare claims to that provided by SPORT surgeons, the “gold standard”. Summary of Background Data: Administrative data are frequently used to report procedure rates, surgical safety outcomes, and costs in the management of spinal surgery. However, the accuracy of using diagnosis codes to classify patients by surgical indication has not been examined. Methods: Medicare claims were linked to beneficiaries enrolled in SPORT. The sensitivity and specificity of three claims-based approaches to group patients based on surgical indications were examined: 1) using the first listed diagnosis; 2) using all diagnoses independently; and 3) using a diagnosis hierarchy based on the support for fusion surgery. Results: Medicare claims were obtained from 376 SPORT participants, including 21 with disc herniation, 183 with spinal stenosis, and 172 with degenerative spondylolisthesis. The hierarchical coding algorithm was the most accurate approach for classifying patients by surgical indication, with sensitivities of 76.2%, 88.1%, and 84.3% for disc herniation, spinal stenosis, and degenerative spondylolisthesis cohorts, respectively. The specificity was 98.3% for disc herniation, 83.2% for spinal stenosis, and 90.7% for degenerative spondylolisthesis. Misclassifications were primarily due to codes attributing more complex pathology to the case. Conclusion: Standardized approaches for using claims data to accurately group patients by surgical indications have widespread interest. We found that a hierarchical coding approach correctly classified over 90
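
    Sensitivity and specificity, the two measures reported in this abstract, compare an algorithm's labels against a gold standard. The Python sketch below shows one way to compute them for a single indication. The function name and the example labels are hypothetical and are not the study's data.

```python
def sensitivity_specificity(gold, predicted, target):
    """Return (sensitivity, specificity) for one target class.

    gold and predicted are equal-length sequences of class labels;
    gold plays the role of the surgeon-provided reference standard.
    """
    if len(gold) != len(predicted):
        raise ValueError("label sequences must have equal length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == target and p == target)
    fn = sum(1 for g, p in zip(gold, predicted) if g == target and p != target)
    tn = sum(1 for g, p in zip(gold, predicted) if g != target and p != target)
    fp = sum(1 for g, p in zip(gold, predicted) if g != target and p == target)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels: 4 herniation cases and 6 stenosis cases.
gold = ["herniation"] * 4 + ["stenosis"] * 6
pred = ["herniation"] * 3 + ["stenosis"] * 6 + ["herniation"]
sens, spec = sensitivity_specificity(gold, pred, "herniation")
```

    In this toy example three of the four true herniation cases are recovered (sensitivity 0.75) and five of the six non-herniation cases are correctly excluded (specificity about 0.83).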

  17. nRC: non-coding RNA Classifier based on structural features.

    PubMed

    Fiannaca, Antonino; La Rosa, Massimo; La Paglia, Laura; Rizzo, Riccardo; Urso, Alfonso

    2017-01-01

    Non-coding RNAs (ncRNAs) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to develop methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms in regulative processes, together with the development of high-throughput technologies, has required the help of bioinformatics tools to give biologists and clinicians a deeper comprehension of the functional roles of ncRNAs. In this work, we introduce a new ncRNA classification tool, nRC (non-coding RNA Classifier). Our approach is based on feature extraction from the ncRNA secondary structure together with a supervised classification algorithm implementing a deep learning architecture based on convolutional neural networks. We tested our approach for the classification of 13 different ncRNA classes. We obtained classification scores using the most common statistical measures. In particular, we reach an accuracy and sensitivity score of about 74%. The proposed method outperforms other similar classification methods based on secondary structure features and machine learning algorithms, including the RNAcon tool that, to date, is the reference classifier. The nRC tool is freely available as a docker image at https://hub.docker.com/r/tblab/nrc/. The source code of the nRC tool is also available at https://github.com/IcarPA-TBlab/nrc.

  18. Recognition of multiple imbalanced cancer types based on DNA microarray data using ensemble classifiers.

    PubMed

    Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin

    2013-01-01

    DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem, namely, that most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they focus merely on the binary-class problem. In this paper, we dealt with the multiclass imbalanced classification problem, as encountered in cancer DNA microarray, by using ensemble learning. We utilized a one-against-all coding strategy to transform the multiclass problem into multiple binary classes, each of them carrying out feature subspace, which is an evolving version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two different correction technologies, namely, decision threshold adjustment or random undersampling, into each training subset to alleviate the damage of class imbalance. Specifically, a support vector machine was used as the base classifier, and a novel voting rule called counter voting was presented for making a final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that unlike many traditional classification approaches, our methods are insensitive to class imbalance.
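
    Two of the steps named in this abstract, one-against-all coding and random undersampling, can be sketched in plain Python. This is a simplified illustration under stated assumptions: the function names are hypothetical, undersampling here balances the two classes exactly, and the feature subspace, support vector machine, and counter voting steps of the paper are not reproduced.

```python
import random


def one_against_all(labels):
    """Return {class: binary labels}, with 1 for that class and 0 for all others."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def random_undersample(binary_labels, seed=0):
    """Return sorted sample indices: every minority sample plus an
    equally sized random subset of the majority class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(binary_labels) if y == 1]
    neg = [i for i, y in enumerate(binary_labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


# Hypothetical skewed labels: class "b" dominates.
labels = ["a", "b", "b", "c", "b", "b"]
tasks = one_against_all(labels)
balanced_idx = random_undersample(tasks["a"])
```

    Each entry of `tasks` defines one binary problem, and `balanced_idx` selects a balanced training subset for the "a versus rest" problem.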

  19. DNA methylation profiling can classify HIV-associated lymphomas.

    PubMed

    Matsunaga, Akihiro; Hishima, Tsunekazu; Tanaka, Noriko; Yamasaki, Maria; Yoshida, Lui; Mochizuki, Makoto; Tanuma, Junko; Oka, Shinichi; Ishizaka, Yukihito; Shimura, Mari; Hagiwara, Shotaro

    2014-02-20

    HIV-positive patients have a 60-fold to 200-fold increased incidence of non-Hodgkin lymphomas, including Burkitt lymphoma, diffuse large B-cell lymphoma, and primary central nervous system lymphoma. HIV-associated lymphomas frequently have features such as extranodal involvement, decreased responses to standard chemotherapy, and high relapse rates, which indicate a poor prognosis. General pathological features do not clearly differentiate HIV-associated lymphomas from non-HIV lymphomas. To investigate the features of HIV-associated lymphomas, we performed genome-wide DNA methylation profiling of HIV and non-HIV lymphomas using Illumina GoldenGate Methylation Cancer Panel I and Illumina Infinium HumanMethylation450 BeadChip microarrays. DNA methylation profiles in HIV-associated and non-HIV lymphomas were characterized using unsupervised hierarchical clustering analyses. The analyses of promoter regions revealed unique DNA methylation profiles in HIV-associated lymphomas, suggesting profile differences compared with non-HIV lymphomas, which implies specific gene regulation in HIV-associated lymphoma involving DNA methylation. Based on HumanMethylation450 BeadChip data, 2541 target sites were selected as differing significantly in comparisons between HIV-associated and non-HIV-associated lymphomas using Wilcoxon's rank-sum test (P <0.05) and Δβ values more than 0.30. Recurrent cases of HIV-associated lymphoma had different profiles compared with nonrecurrent HIV lymphomas. DNA methylation profiling indicated that 2541 target sites differed significantly in HIV-associated lymphoma, which may partly explain the poor prognosis. Our data indicate that the methylation profiles of target genes have potential in elucidating HIV-associated lymphomagenesis and can serve as new prognostic markers.
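
    The Δβ criterion mentioned in this abstract (a difference in methylation level of more than 0.30 between groups) can be sketched as a simple filter. The Python below is a hedged illustration: the function names and example values are hypothetical, Δβ is assumed to be the absolute difference of group mean β values, and the Wilcoxon rank-sum test used in the study is omitted.

```python
from statistics import mean


def delta_beta(group_a, group_b):
    """Difference between the mean beta values of two sample groups."""
    return mean(group_a) - mean(group_b)


def select_sites(betas_a, betas_b, threshold=0.30):
    """Return sites whose absolute delta-beta exceeds the threshold.

    betas_a and betas_b map a site identifier to a list of beta values
    (methylation fractions between 0 and 1), one per sample.
    """
    return [
        site
        for site in betas_a
        if site in betas_b
        and abs(delta_beta(betas_a[site], betas_b[site])) > threshold
    ]


# Hypothetical values for two sites; not data from the study.
group_1 = {"site_1": [0.9, 0.8], "site_2": [0.5, 0.5]}
group_2 = {"site_1": [0.2, 0.3], "site_2": [0.4, 0.4]}
selected = select_sites(group_1, group_2)
```

    Only `site_1` passes here, because its group means differ by about 0.6 while those of `site_2` differ by about 0.1.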

  20. Cancer diagnostic classifiers based on quantitative DNA methylation

    PubMed Central

    2014-01-01

    Epigenetic change is part of the carcinogenic process and a deep reservoir for biomarker discovery. Reversible methylation of cytosines is noteworthy because it can be measured accurately and easily by various molecular methods and DNA methylation patterns are linked to important tumourigenic pathways. Clinically relevant methylation changes are known in common human cancers such as cervix, prostate, breast, colon, bladder, stomach and lung. Differential methylation may have a central role in the development and outcome of most if not all human malignancies. The advent of deep sequencing holds great promise for epigenomics, with bioinformatics tools ready to reveal large numbers of new targets for prognosis and therapeutic intervention. This review focuses on two selected cancers, namely cervix and prostate, which illustrate the more general themes of epigenetic diagnostics in cancer. Also discussed is differential methylation of specific human and viral DNA targets and laboratory methods for measuring methylation biomarkers. PMID:24649818

  1. Classified-edge guided depth resampling for multi-view coding

    NASA Astrophysics Data System (ADS)

    Lu, Yu; Zhou, Yang; Chen, Hua-hua

    2016-01-01

    A new depth resampling for multi-view coding is proposed in this paper. At first, the depth video is downsampled by median filtering before encoding. After decoding, the classified edges, including credible edge and probable edge from the aligned texture image and the depth image, are interpolated by the selected diagonal pair, whose intensity difference is the minimum among four diagonal pairs around edge pixel. According to different category of edge, the intensity difference is measured by either real depth or percentage depth without any parameter setting. Finally, the resampled depth video and the decoded full-resolution texture video are synthesized into virtual views for the performance evaluation. Experiments on the platform of multi-view high efficiency video coding (HEVC) demonstrate that the proposed method is superior to the contrastive methods in terms of visual quality and rate distortion (RD) performance.

  2. DNA Code Validation Using Experimental Fluorescence Measurements and Thermodynamic Calculations

    DTIC Science & Technology

    2004-03-01

    Summary: A DNA code is a collection of single-stranded DNA molecules. In DNA hybridization assays, the formation of any Watson-Crick ...combinations represent the canonical Watson-Crick pairings. To obtain the reverse complement of a strand of DNA, one must first reverse the order of the... DNA codes. Using software designed by A. Macula and V. Rykov (Macula, 2003), a set of 13 pairs, (X, WC(X)), of Watson-Crick reverse complementary

  3. Validation of an administrative coding algorithm for classifying surgical indication and operative features of spine surgery.

    PubMed

    Kazberouk, Alexander; Martin, Brook I; Stevens, Jennifer P; McGuire, Kevin J

    2015-01-15

    Retrospective review of medical records and administrative data. Validate a claims-based algorithm for classifying surgical indication and operative features in lumbar surgery. Administrative data are valuable to study rates, safety, outcomes, and costs in spine surgery. Previous research evaluates outcomes by procedure, not indications and operative features. One previous study validated a coding algorithm for classifying surgical indication. Few studies examined claims data for classifying patients by operative features. Patients undergoing lumbar decompression or fusion at a single institution in 2009 for back pain, herniated disc, stenosis, spondylolisthesis, or scoliosis were included. Sensitivity and specificity of a claims-based algorithm for indication and operative features were examined versus medical record abstraction. A total of 477 patients, including 246 (52%) undergoing fusion and 231 (48%) undergoing decompression were included in this study. Sensitivity of the claims-based coding algorithm for classifying the indication for the procedure was 71.9% for degenerative disc disease, 81.9% for disc herniation, 32.7% for spinal stenosis, 90.4% for degenerative spondylolisthesis, and 93.8% for scoliosis. Specificity was 87.9% for degenerative disc, 85.6% for disc herniation, 90.7% for spinal stenosis, 95.0% for degenerative spondylolisthesis, and 97.3% for scoliosis. Sensitivity and specificity of claims data for identifying the type of procedure for fusion cases was 97.6% and 99.1%, respectively. Sensitivity of claims data for characterizing key operative features was 81.7%, 96.4%, and 53.0% for use of instrumentation, combined (anterior and posterior) surgical approach, and 3 or more disc levels fused, respectively. Specificity was 57.1% for instrumentation, 94.5% for combined approaches, and 71.9% for 3 or more disc levels fused. Claims data accurately reflected certain diagnoses and type of procedures, but were less accurate at characterizing

  4. V(D)J recombination coding junction formation without DNA homology: processing of coding termini.

    PubMed Central

    Boubnov, N V; Wills, Z P; Weaver, D T

    1993-01-01

    Coding junction formation in V(D)J recombination generates diversity in the antigen recognition structures of immunoglobulin and T-cell receptor molecules by combining processes of deletion of terminal coding sequences and addition of nucleotides prior to joining. We have examined the role of coding end DNA composition in junction formation with plasmid substrates containing defined homopolymers flanking the recombination signal sequence elements. We found that coding junctions formed efficiently with or without terminal DNA homology. The extent of junctional deletion was conserved independent of coding ends with increased, partial, or no DNA homology. Interestingly, G/C homopolymer coding ends showed reduced deletion regardless of DNA homology. Therefore, DNA homology cannot be the primary determinant that stabilizes coding end structures for processing and joining. PMID:8413286

  5. Crossing Disciplinary Lines--Bar Codes and DNA Codes.

    ERIC Educational Resources Information Center

    Liao, Thomas T.

    1997-01-01

    Discusses strategies that enable students to learn ideas and concepts in the context of how modern communication technology is designed and operates. Describes a course that integrates the study of math, science, and technology into topics that are engaging to students. Presents an activity that introduces students to digital coding and compares…

  6. DNA codes and information: formal structures and relational causes.

    PubMed

    Sternberg, Richard V

    2008-09-01

    Recently the terms "codes" and "information" as used in the context of molecular biology have been the subject of much discussion. Here I propose that a variety of structural realism can assist us in rethinking the concepts of DNA codes and information apart from semantic criteria. Using the genetic code as a theoretical backdrop, a necessary distinction is made between codes qua symbolic representations and information qua structure that accords with data. Structural attractors are also shown to be entailed by the mapping relation that any DNA code is a part of (as the domain). In this framework, these attractors are higher-order informational structures that obviate any "DNA-centric" reductionism. In addition to the implications that are discussed, this approach validates the array of coding systems now recognized in molecular biology.

  7. Classifier assessment and feature selection for recognizing short coding sequences of human genes.

    PubMed

    Song, Kai; Zhang, Ze; Tong, Tuo-Peng; Wu, Fang

    2012-03-01

    With the ever-increasing pace of genome sequencing, there is a great need for fast and accurate computational tools to automatically identify genes in these genomes. Although great progress has been made in the development of gene-finding algorithms during the past decades, there is still room for further improvement. In particular, the issue of recognizing short exons in eukaryotes is still not solved satisfactorily. This article is devoted to assessing various linear and kernel-based classification algorithms and selecting the best combination of Z-curve features to address this issue. Eight state-of-the-art linear and kernel-based supervised pattern recognition techniques were used to identify the short (21-192 bp) coding sequences of human genes. By measuring the prediction accuracy, the tradeoff between sensitivity and specificity, and the time consumption, partial least squares (PLS) and kernel partial least squares (KPLS) algorithms were found to be the best linear and kernel-based classifiers, respectively. A surprising result was that, by making good use of the interpretability of the PLS and the Z-curve methods, a set of 93 Z-curve features proved to be the best combination. Using them, the average recognition accuracy improved by as much as 7.7% by means of KPLS when compared with what was obtained by the Fisher discriminant analysis using 189 Z-curve variables (Gao and Zhang, 2004). The code used is freely available for the following approaches (implemented in MATLAB and supported on Linux and MS Windows): (1) SVM: http://www.support-vector-machines.org/SVM_soft.html. (2) GP: http://www.gaussianprocess.org. (3) KPLS and KFDA: Taylor, J.S., and Cristianini, N. 2004. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK. (4) PLS: Wise, B.M., and Gallagher, N.B. 2011. PLS-Toolbox for use with MATLAB: ver 1.5.2. Eigenvector Technologies, Manson, WA. Supplementary Material for this article is
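
For readers unfamiliar with the Z-curve, the sketch below computes the basic cumulative Z-curve coordinates of a sequence using the standard definition: x separates purines from pyrimidines, y separates amino from keto bases, and z separates weak from strong hydrogen-bonding bases. This is only the underlying transform. The 93 and 189 feature sets discussed in the abstract are not reproduced here, and the function name `z_curve` is an assumption made for illustration.

```python
def z_curve(seq):
    """Return cumulative Z-curve points (x_n, y_n, z_n) for a DNA sequence.

    x_n = (A_n + G_n) - (C_n + T_n)   purine vs. pyrimidine
    y_n = (A_n + C_n) - (G_n + T_n)   amino vs. keto
    z_n = (A_n + T_n) - (C_n + G_n)   weak vs. strong hydrogen bonding
    where A_n, C_n, G_n, T_n count each base in the first n positions.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base not in counts:
            continue  # skip ambiguous symbols such as N
        counts[base] += 1
        a, c, g, t = (counts[b] for b in "ACGT")
        points.append(((a + g) - (c + t),
                       (a + c) - (g + t),
                       (a + t) - (c + g)))
    return points


print(z_curve("ATGC"))
```

Each point depends only on running base counts, so the transform is linear in sequence length and can be computed in a single pass.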

  8. BioCode: Two biologically compatible Algorithms for embedding data in non-coding and coding regions of DNA

    PubMed Central

    2013-01-01

    Background: In recent times, the application of deoxyribonucleic acid (DNA) has diversified with the emergence of fields such as DNA computing and DNA data embedding. DNA data embedding, also known as DNA watermarking or DNA steganography, aims to develop robust algorithms for encoding non-genetic information in DNA. Inherently DNA is a digital medium whereby the nucleotide bases act as digital symbols, a fact which underpins all bioinformatics techniques, and which also makes trivial information encoding using DNA straightforward. However, the situation is more complex in methods which aim at embedding information in the genomes of living organisms. DNA is susceptible to mutations, which act as a noisy channel from the point of view of information encoded using DNA. This means that the DNA data embedding field is closely related to digital communications. Moreover it is a particularly unique digital communications area, because important biological constraints must be observed by all methods. Many DNA data embedding algorithms have been presented to date, all of which operate in one of two regions: non-coding DNA (ncDNA) or protein-coding DNA (pcDNA). Results: This paper proposes two novel DNA data embedding algorithms jointly called BioCode, which operate in ncDNA and pcDNA, respectively, and which comply fully with stricter biological restrictions. Existing methods comply with some elementary biological constraints, such as preserving protein translation in pcDNA. However there exist further biological restrictions which no DNA data embedding methods to date account for. Observing these constraints is key to increasing the biocompatibility and in turn, the robustness of information encoded in DNA. Conclusion: The algorithms encode information in near optimal ways from a coding point of view, as we demonstrate by means of theoretical and empirical (in silico) analyses. Also, they are shown to encode information in a robust way, such that mutations have isolated
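
The abstract notes that simple information encoding in DNA is straightforward because the four bases act as digital symbols. A minimal sketch of that unconstrained case maps two bits to each nucleotide. This is not the BioCode algorithm, which additionally respects biological constraints; the bit-to-base table and function names below are arbitrary choices made for illustration.

```python
# Arbitrary illustrative mapping: two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Encode bytes as a DNA string, four nucleotides per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Invert encode(); expects a length that is a multiple of four."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"Hi"
dna = encode(message)
print(dna)
print(decode(dna))
```

A mapping like this ignores mutations and codon usage entirely, which is the gap that constraint-aware embedding methods aim to close.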

  9. BioCode: two biologically compatible Algorithms for embedding data in non-coding and coding regions of DNA.

    PubMed

    Haughton, David; Balado, Félix

    2013-04-09

    In recent times, the application of deoxyribonucleic acid (DNA) has diversified with the emergence of fields such as DNA computing and DNA data embedding. DNA data embedding, also known as DNA watermarking or DNA steganography, aims to develop robust algorithms for encoding non-genetic information in DNA. Inherently DNA is a digital medium whereby the nucleotide bases act as digital symbols, a fact which underpins all bioinformatics techniques, and which also makes trivial information encoding using DNA straightforward. However, the situation is more complex in methods which aim at embedding information in the genomes of living organisms. DNA is susceptible to mutations, which act as a noisy channel from the point of view of information encoded using DNA. This means that the DNA data embedding field is closely related to digital communications. Moreover it is a particularly unique digital communications area, because important biological constraints must be observed by all methods. Many DNA data embedding algorithms have been presented to date, all of which operate in one of two regions: non-coding DNA (ncDNA) or protein-coding DNA (pcDNA). This paper proposes two novel DNA data embedding algorithms jointly called BioCode, which operate in ncDNA and pcDNA, respectively, and which comply fully with stricter biological restrictions. Existing methods comply with some elementary biological constraints, such as preserving protein translation in pcDNA. However there exist further biological restrictions which no DNA data embedding methods to date account for. Observing these constraints is key to increasing the biocompatibility and in turn, the robustness of information encoded in DNA. The algorithms encode information in near optimal ways from a coding point of view, as we demonstrate by means of theoretical and empirical (in silico) analyses. Also, they are shown to encode information in a robust way, such that mutations have isolated effects. Furthermore, the

  10. Chloroplast DNA codes for transfer RNA.

    PubMed Central

    McCrea, J M; Hershberger, C L

    1976-01-01

    Transfer RNA's were isolated from Euglena gracilis. Chloroplast cistrons for tRNA were quantitated by hybridizing tRNA to ct DNA. Species of tRNA hybridizing to ct DNA were partially purified by hybridization-chromatography. The tRNA's hybridizing to ct DNA and nuclear DNA appear to be different. Total cellular tRNA was hybridized to ct DNA to an equivalent of approximately 25 cistrons. The total cellular tRNA was also separated into 2 fractions by chromatography on dihydroxyboryl substituted amino ethyl cellulose. Fraction I hybridized to both nuclear and ct DNA. Hybridizations to ct DNA indicated approximately 18 cistrons. Fraction II-tRNA hybridized only to ct DNA, saturating at a level of approximately 7 cistrons. The tRNA from isolated chloroplasts hybridized to both chloroplast and nuclear DNA. The level of hybridization to ct DNA indicated approximately 18 cistrons. Fraction II-type tRNA could not be detected in the isolated chloroplasts. PMID:823529

  11. DNA Barcoding through Quaternary LDPC Codes

    PubMed Central

    Tapia, Elizabeth; Spetale, Flavio; Krsticevic, Flavia; Angelone, Laura; Bulacio, Pilar

    2015-01-01

    For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10^-2 per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10^-9 at the expense of a rate of read losses just in the order of 10^-6. PMID:26492348
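
The trade-off between read losses and sample misidentification can be illustrated with a plain minimum-Hamming-distance demultiplexer. This is a deliberately simpler stand-in and not the LDPC decoding used in the paper; the barcodes, sample names, and the `max_mismatches` threshold are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(read, barcodes, max_mismatches=1):
    """Assign a read to the uniquely closest barcode.

    Returns the sample name, or None when the read is discarded
    (a read loss) because the best match is too distant or tied.
    """
    scored = sorted((hamming(read, bc), sample) for sample, bc in barcodes.items())
    best_dist, best_sample = scored[0]
    tied = len(scored) > 1 and scored[1][0] == best_dist
    if best_dist > max_mismatches or tied:
        return None
    return best_sample


# Hypothetical 6-nt barcodes, pairwise Hamming distance 6.
barcodes = {"s1": "ACGTAC", "s2": "TGCATG", "s3": "GATCGA"}
print(demultiplex("ACGTAA", barcodes))  # one mismatch from s1
print(demultiplex("TTTTTT", barcodes))  # too far from every barcode
```

Raising `max_mismatches` recovers more reads but increases the chance of assigning a read to the wrong sample, which is the balance the abstract quantifies.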

  12. DNA Barcoding through Quaternary LDPC Codes.

    PubMed

    Tapia, Elizabeth; Spetale, Flavio; Krsticevic, Flavia; Angelone, Laura; Bulacio, Pilar

    2015-01-01

    For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10(-2) per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10(-9) at the expense of a rate of read losses just in the order of 10(-6).

  13. Ancient DNA sequence revealed by error-correcting codes.

    PubMed

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-07-10

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.

  14. Ancient DNA sequence revealed by error-correcting codes

    PubMed Central

    Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  15. DNA barcode goes two-dimensions: DNA QR code web server.

    PubMed

    Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.

  16. Protection of the genome and central protein-coding sequences by non-coding DNA against DNA damage from radiation.

    PubMed

    Qiu, Guo-Hua

    2015-01-01

    Non-coding DNA comprises a very large proportion of the total genomic content in higher organisms, but its function remains largely unclear. Non-coding DNA sequences constitute the majority of peripheral heterochromatin, which has been hypothesized to be the genome's 'bodyguard' against DNA damage from chemicals and radiation for almost four decades. The bodyguard protective function of peripheral heterochromatin in genome defense has been strengthened by the results from numerous recent studies, which are summarized in this review. These data have suggested that cells and/or organisms with a higher level of heterochromatin and more non-coding DNA sequences, including longer telomeric DNA and rDNAs, exhibit a lower frequency of DNA damage, higher radioresistance and longer lifespan after IR exposure. In addition, the majority of heterochromatin is peripherally located in the three-dimensional structure of genome organization. Therefore, the peripheral heterochromatin with non-coding DNA could play a protective role in genome defense against DNA damage from ionizing radiation by both absorbing the radicals from water radiolysis in the cytosol and reducing the energy of IR. However, the bodyguard protection by heterochromatin has been challenged by the observation that DNA damage is less frequently detected in peripheral heterochromatin than in euchromatin, which is inconsistent with the expectation and simulation results. Previous studies have also shown that the DNA damage in peripheral heterochromatin is rarely repaired and moves more quickly, broadly and outwardly to approach the nuclear pore complex (NPC). Additionally, it has been shown that extrachromosomal circular DNAs (eccDNAs) are formed in the nucleus, highly detectable in the cytoplasm (particularly under stress conditions) and shuttle between the nucleus and the cytoplasm. Based on these studies, this review speculates that the sites of DNA damage in peripheral heterochromatin could occur more

  17. Advances in SCA and RF-DNA Fingerprinting Through Enhanced Linear Regression Attacks and Application of Random Forest Classifiers

    DTIC Science & Technology

    2014-09-18

    ADVANCES IN SCA AND RF-DNA FINGERPRINTING THROUGH ENHANCED LINEAR REGRESSION ATTACKS AND APPLICATION OF RANDOM FOREST CLASSIFIERS DISSERTATION Hiren...SCA AND RF-DNA FINGERPRINTING THROUGH ENHANCED LINEAR REGRESSION ATTACKS AND APPLICATION OF RANDOM FOREST CLASSIFIERS DISSERTATION Presented to the...APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED AFIT-ENG-DS-14-S-03 ADVANCES IN SCA AND RF-DNA FINGERPRINTING THROUGH ENHANCED LINEAR REGRESSION ATTACKS

  18. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

  20. Classifying Force Spectroscopy of DNA Pulling Measurements Using Supervised and Unsupervised Machine Learning Methods.

    PubMed

    Karatay, Durmus U; Zhang, Jie; Harrison, Jeffrey S; Ginger, David S

    2016-04-25

    Dynamic force spectroscopy (DFS) measurements on biomolecules typically require classifying thousands of repeated force spectra prior to data analysis. Here, we study classification of atomic force microscope-based DFS measurements using machine-learning algorithms in order to automate selection of successful force curves. Notably, we collect a data set that has a testable positive signal using photoswitch-modified DNA before and after illumination with UV (365 nm) light. We generate a feature set consisting of six properties of force-distance curves to train supervised models and use principal component analysis (PCA) for an unsupervised model. For supervised classification, we train random forest models for binary and multiclass classification of force-distance curves. Random forest models predict successful pulls with an accuracy of 94% and classify them into five classes with an accuracy of 90%. The unsupervised method using Gaussian mixture models (GMM) reaches an accuracy of approximately 80% for binary classification.

  1. Nonextensive statistical approach to non-coding human DNA

    NASA Astrophysics Data System (ADS)

    Oikonomou, Th.; Provata, A.; Tirnakli, U.

    2008-04-01

    We use q-exponential distributions, which maximize the nonextensive entropy $S_q$ (defined as $S_q \equiv (1 - \sum_i p_i^q)/(q - 1)$), to study the size distributions of non-coding DNA (including introns and intergenic regions) in all human chromosomes. We show that the value of the exponent q describing the non-coding size distributions is similar for all chromosomes and lies in the range $2 \le q \le 2.3$, with the exception of chromosomes X and Y.
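
As a numerical companion to the definition above, the sketch below evaluates the nonextensive entropy $S_q$ and the q-exponential function $e_q(x) = [1 + (1 - q)x]^{1/(1 - q)}$, which reduces to the ordinary exponential as q approaches 1. The function names are assumptions made for illustration, and no fitting to chromosome data is attempted.

```python
import math


def tsallis_entropy(p, q):
    """S_q = (1 - sum_i p_i^q) / (q - 1); Shannon entropy in the limit q = 1."""
    if q == 1:
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    return (1 - sum(pi ** q for pi in p)) / (q - 1)


def q_exponential(x, q):
    """e_q(x) = [1 + (1 - q) x]^(1 / (1 - q)); equals exp(x) when q = 1."""
    if q == 1:
        return math.exp(x)
    base = 1 + (1 - q) * x
    if base <= 0:
        raise ValueError("q-exponential is not defined here for 1 + (1 - q) x <= 0")
    return base ** (1 / (1 - q))


print(tsallis_entropy([0.5, 0.5], q=2))
print(q_exponential(-1.0, q=2))
```

For q greater than 1 and negative arguments, the q-exponential decays as a power law rather than exponentially, which is why it suits heavy-tailed size distributions.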

  2. Within- and Cross-Participant Classifiers Reveal Different Neural Coding of Information

    PubMed Central

    Clithero, John A.; Smith, David V.; Carter, R. McKell; Huettel, Scott A.

    2010-01-01

    Analyzing distributed patterns of brain activation using multivariate pattern analysis (MVPA) has become a popular approach for using functional magnetic resonance imaging (fMRI) data to predict mental states. While the majority of studies currently build separate classifiers for each participant in the sample, in principle a single classifier can be derived from and tested on data from all participants. These two approaches, within- and cross-participant classification, rely on potentially different sources of variability and thus may provide distinct information about brain function. Here, we used both approaches to identify brain regions that contain information about passively-received monetary rewards (i.e., images of currency that influenced participant payment) and social rewards (i.e., images of human faces). Our within-participant analyses implicated regions in the ventral visual processing stream – including fusiform gyrus and primary visual cortex – and ventromedial prefrontal cortex (VMPFC). Two key results indicate these regions may contain statistically discriminable patterns that contain different informational representations. First, cross-participant analyses implicated additional brain regions, including striatum and anterior insula. The cross-participant analyses also revealed systematic changes in predictive power across brain regions, with the pattern of change consistent with the functional properties of regions. Second, individual differences in classifier performance in VMPFC were related to individual differences in preferences between our two reward modalities. We interpret these results as reflecting a distinction between patterns reflecting participant-specific functional organization and those indicating aspects of brain organization that generalize across individuals. PMID:20347995

  3. Coding-complete sequencing classifies parrot bornavirus 5 into a novel virus species.

    PubMed

    Marton, Szilvia; Bányai, Krisztián; Gál, János; Ihász, Katalin; Kugler, Renáta; Lengyel, György; Jakab, Ferenc; Bakonyi, Tamás; Farkas, Szilvia L

    2015-11-01

    In this study, we determined the sequence of the coding region of an avian bornavirus detected in a blue-and-yellow macaw (Ara ararauna) with pathological/histopathological changes characteristic of proventricular dilatation disease. The genomic organization of the macaw bornavirus is similar to that of other bornaviruses, and its nucleotide sequence is nearly identical to the available partial parrot bornavirus 5 (PaBV-5) sequences. Phylogenetic analysis showed that these strains formed a monophyletic group distinct from other mammalian and avian bornaviruses and in calculations performed with matrix protein coding sequences, the PaBV-5 and PaBV-6 genotypes formed a common cluster, suggesting that according to the recently accepted classification system for bornaviruses, these two genotypes may belong to a new species, provisionally named Psittaciform 2 bornavirus.

  4. Quantitative Profiling of Peptides from RNAs classified as non-coding

    PubMed Central

    Prabakaran, Sudhakaran; Hemberg, Martin; Chauhan, Ruchi; Winter, Dominic; Tweedie-Cullen, Ry Y.; Dittrich, Christian; Hong, Elizabeth; Gunawardena, Jeremy; Steen, Hanno; Kreiman, Gabriel; Steen, Judith A.

    2014-01-01

    Only a small fraction of the mammalian genome codes for messenger RNAs destined to be translated into proteins, and it is generally assumed that a large portion of transcribed sequences - including introns and several classes of non-coding RNAs (ncRNAs) do not give rise to peptide products. A systematic examination of translation and physiological regulation of ncRNAs has not been conducted. Here, we use computational methods to identify the products of non-canonical translation in mouse neurons by analyzing unannotated transcripts in combination with proteomic data. This study supports the existence of non-canonical translation products from both intragenic and extragenic genomic regions, including peptides derived from anti-sense transcripts and introns. Moreover, the studied novel translation products exhibit temporal regulation similar to that of proteins known to be involved in neuronal activity processes. These observations highlight a potentially large and complex set of biologically regulated translational events from transcripts formerly thought to lack coding potential. PMID:25403355

  5. Diversity and Recombination of Dispersed Ribosomal DNA and Protein Coding Genes in Microsporidia

    PubMed Central

    Ironside, Joseph Edward

    2013-01-01

    Microsporidian strains are usually classified on the basis of their ribosomal DNA (rDNA) sequences. Although rDNA occurs as multiple copies, in most non-microsporidian species copies within a genome occur as tandem arrays and are homogenised by concerted evolution. In contrast, microsporidian rDNA units are dispersed throughout the genome in some species, and on this basis are predicted to undergo reduced concerted evolution. Furthermore many microsporidian species appear to be asexual and should therefore exhibit reduced genetic diversity due to a lack of recombination. Here, DNA sequences are compared between microsporidia with different life cycles in order to determine the effects of concerted evolution and sexual reproduction upon the diversity of rDNA and protein coding genes. Comparisons of cloned rDNA sequences between microsporidia of the genus Nosema with different life cycles provide evidence of intragenomic variability coupled with strong purifying selection. This suggests a birth and death process of evolution. However, some concerted evolution is suggested by clustering of rDNA sequences within species. Variability of protein-coding sequences indicates that considerable intergenomic variation also occurs between microsporidian cells within a single host. Patterns of variation in microsporidian DNA sequences indicate that additional diversity is generated by intragenomic and/or intergenomic recombination between sequence variants. The discovery of intragenomic variability coupled with strong purifying selection in microsporidian rRNA sequences supports the hypothesis that concerted evolution is reduced when copies of a gene are dispersed rather than repeated tandemly. The presence of intragenomic variability also renders the use of rDNA sequences for barcoding microsporidia questionable. Evidence of recombination in the single-copy genes of putatively asexual microsporidia suggests that these species may undergo cryptic sexual reproduction, a

  6. Diversity and recombination of dispersed ribosomal DNA and protein coding genes in microsporidia.

    PubMed

    Ironside, Joseph Edward

    2013-01-01

    Microsporidian strains are usually classified on the basis of their ribosomal DNA (rDNA) sequences. Although rDNA occurs as multiple copies, in most non-microsporidian species copies within a genome occur as tandem arrays and are homogenised by concerted evolution. In contrast, microsporidian rDNA units are dispersed throughout the genome in some species, and on this basis are predicted to undergo reduced concerted evolution. Furthermore many microsporidian species appear to be asexual and should therefore exhibit reduced genetic diversity due to a lack of recombination. Here, DNA sequences are compared between microsporidia with different life cycles in order to determine the effects of concerted evolution and sexual reproduction upon the diversity of rDNA and protein coding genes. Comparisons of cloned rDNA sequences between microsporidia of the genus Nosema with different life cycles provide evidence of intragenomic variability coupled with strong purifying selection. This suggests a birth and death process of evolution. However, some concerted evolution is suggested by clustering of rDNA sequences within species. Variability of protein-coding sequences indicates that considerable intergenomic variation also occurs between microsporidian cells within a single host. Patterns of variation in microsporidian DNA sequences indicate that additional diversity is generated by intragenomic and/or intergenomic recombination between sequence variants. The discovery of intragenomic variability coupled with strong purifying selection in microsporidian rRNA sequences supports the hypothesis that concerted evolution is reduced when copies of a gene are dispersed rather than repeated tandemly. The presence of intragenomic variability also renders the use of rDNA sequences for barcoding microsporidia questionable. Evidence of recombination in the single-copy genes of putatively asexual microsporidia suggests that these species may undergo cryptic sexual reproduction, a

  7. Free Energy Gap and Statistical Thermodynamic Fidelity of DNA Codes

    DTIC Science & Technology

    2007-10-01

    reverse-complement unless otherwise stated. For strand x, let Nx denote its complement. A (perfect) Watson-Crick duplex is the joining of complement...is possible for complementary sequences to form a non-perfectly aligned duplex, we will call any x W Nx duplex a Watson-Crick (WC) duplex. Two... Contract number FA8750-07

  8. Structural Code for DNA Recognition Revealed in Crystal Structures of Papillomavirus E2-DNA Targets

    NASA Astrophysics Data System (ADS)

    Rozenberg, Haim; Rabinovich, Dov; Frolow, Felix; Hegde, Rashmi S.; Shakked, Zippora

    1998-12-01

    Transcriptional regulation in papillomaviruses depends on sequence-specific binding of the regulatory protein E2 to several sites in the viral genome. Crystal structures of bovine papillomavirus E2 DNA targets reveal a conformational variant of B-DNA characterized by a roll-induced writhe and helical repeat of 10.5 bp per turn. A comparison between the free and the protein-bound DNA demonstrates that the intrinsic structure of the DNA regions contacted directly by the protein and the deformability of the DNA region that is not contacted by the protein are critical for sequence-specific protein/DNA recognition and hence for gene-regulatory signals in the viral system. We show that the selection of dinucleotide or longer segments with appropriate conformational characteristics, when positioned at correct intervals along the DNA helix, can constitute a structural code for DNA recognition by regulatory proteins. This structural code facilitates the formation of a complementary protein-DNA interface that can be further specified by hydrogen bonds and nonpolar interactions between the protein amino acids and the DNA bases.

  9. Superimposed Code Theoretic Analysis of Deoxyribonucleic Acid (DNA) Codes and DNA Computing

    DTIC Science & Technology

    2010-01-01

    hybridization that occurs between a DNA strand and its Watson-Crick complement can be used to perform mathematical computation. This research addresses how the...are 5′→3′ and strands with strikethrough are 3′→5′. A dsDNA duplex formed between a strand and its reverse complement is called a Watson-Crick (WC... [Figure: Watson-Crick (WC) duplexes; 5′-TACGCGACTTTC-3′ paired with 5′-GAAAGTCGCGTA-3′, and ATCAAACGATGC paired with GCATCGTTTGAT]
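
    The reverse-complement relation behind a Watson-Crick duplex can be illustrated with a minimal sketch. The function names below are hypothetical and are not taken from the report; the strands are the ones quoted in the snippet above, read 5′→3′.

```python
# Watson-Crick pairing for the DNA alphabet: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the reverse complement of a strand written 5'->3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def is_wc_pair(x: str, y: str) -> bool:
    """True if y (written 5'->3') is the reverse complement of x."""
    return reverse_complement(x) == y.upper()


print(reverse_complement("TACGCGACTTTC"))          # GAAAGTCGCGTA
print(is_wc_pair("ATCAAACGATGC", "GCATCGTTTGAT"))  # True
```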

  10. Extra-coding RNAs regulate neuronal DNA methylation dynamics

    PubMed Central

    Savell, Katherine E.; Gallus, Nancy V. N.; Simon, Rhiana C.; Brown, Jordan A.; Revanna, Jasmin S.; Osborn, Mary Katherine; Song, Esther Y.; O'Malley, John J.; Stackhouse, Christian T.; Norvil, Allison; Gowher, Humaira; Sweatt, J. David; Day, Jeremy J.

    2016-01-01

    Epigenetic mechanisms such as DNA methylation are essential regulators of the function and information storage capacity of neurons. DNA methylation is highly dynamic in the developing and adult brain, and is actively regulated by neuronal activity and behavioural experiences. However, it is presently unclear how methylation status at individual genes is targeted for modification. Here, we report that extra-coding RNAs (ecRNAs) interact with DNA methyltransferases and regulate neuronal DNA methylation. Expression of ecRNA species is associated with gene promoter hypomethylation, is altered by neuronal activity, and is overrepresented at genes involved in neuronal function. Knockdown of the Fos ecRNA locus results in gene hypermethylation and mRNA silencing, and hippocampal expression of Fos ecRNA is required for long-term fear memory formation in rats. These results suggest that ecRNAs are fundamental regulators of DNA methylation patterns in neuronal systems, and reveal a promising avenue for therapeutic targeting in neuropsychiatric disease states. PMID:27384705

  11. Hiding message into DNA sequence through DNA coding and chaotic maps.

    PubMed

    Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

    2014-09-01

    The paper proposes an improved reversible substitution method for hiding data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance robustness and enlarge the hiding capacity: the secret message is encoded by DNA coding, encrypted with a pseudo-random sequence, assigned relative hiding locations generated by a piecewise linear chaotic map, and embedded (encoded and encrypted) into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than the competing methods with respect to robustness and capacity.
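
    Two of the steps named in this abstract, DNA coding of the message and generation of hiding locations with a piecewise linear chaotic map (PWLCM), can be sketched as follows. This is only an illustration under stated assumptions: the 2-bit rule (00→A, 01→C, 10→G, 11→T), the parameter values, and every function name are hypothetical, and the encryption and complementary-rule embedding steps of the paper are not reproduced.

```python
# Assumed 2-bit coding rule; the paper's actual rule is not given in the abstract.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_dna(message: bytes) -> str:
    """Encode bytes as a DNA string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in message)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_dna(seq: str) -> bytes:
    """Invert encode_dna."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def pwlcm(x: float, p: float) -> float:
    """One step of the standard piecewise linear chaotic map, 0 < p < 0.5."""
    if x >= 0.5:
        x = 1.0 - x  # the map is symmetric about x = 0.5
    return x / p if x < p else (x - p) / (0.5 - p)


def hiding_locations(x0: float, p: float, count: int, cover_len: int) -> list[int]:
    """Derive `count` distinct positions in a cover sequence from the map orbit."""
    if count > cover_len:
        raise ValueError("more locations requested than cover positions")
    locations, seen, x = [], set(), x0
    for _ in range(100 * cover_len):  # cap iterations in case the orbit degenerates
        x = pwlcm(x, p)
        pos = int(x * cover_len) % cover_len
        if pos not in seen:
            seen.add(pos)
            locations.append(pos)
            if len(locations) == count:
                return locations
    raise RuntimeError("orbit did not yield enough distinct locations")


print(encode_dna(b"Hi"))  # CAGACGGC
print(hiding_locations(x0=0.37, p=0.23, count=5, cover_len=50))
```

    In a scheme of this kind the pair (x0, p) would act as part of the secret key, since the same values regenerate the same locations on the receiving side.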

  12. Multifractal detrended cross-correlation analysis of coding and non-coding DNA sequences through chaos-game representation

    NASA Astrophysics Data System (ADS)

    Pal, Mayukha; Satish, B.; Srinivas, K.; Rao, P. Madhusudana; Manimaran, P.

    2015-10-01

    We propose a new approach combining the chaos game representation and the two dimensional multifractal detrended cross correlation analysis methods to examine multifractal behavior in power law cross correlation between any pair of nucleotide sequences of unequal lengths. In this work, we analyzed the characteristic behavior of coding and non-coding DNA sequences of eight prokaryotes. The results show the presence of strong multifractal nature between coding and non-coding sequences of all data sets. We found that this integrative approach helps us to consider complete DNA sequences for characterization, and further it may be useful for classification, clustering, identification of class affiliation of nucleotide sequences etc. with high precision.
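
    The chaos game representation (CGR) step can be sketched briefly. The corner assignment below (A, C, G, T at the corners of the unit square) is a commonly used convention and an assumption here, as are the function names; the multifractal cross-correlation analysis itself is not shown. Counting points on a fixed grid is one way sequences of unequal length can be mapped to arrays of equal size.

```python
# Assumed corner convention for the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> list[tuple[float, float]]:
    """Each point is the midpoint between the previous point and the base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_grid(seq: str, k: int) -> list[list[int]]:
    """Count CGR points in a 2^k x 2^k grid, independent of sequence length."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    for x, y in cgr_points(seq):
        row = min(int(y * n), n - 1)
        col = min(int(x * n), n - 1)
        grid[row][col] += 1
    return grid


print(cgr_points("AG"))  # [(0.25, 0.25), (0.625, 0.625)]
```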

  13. Imperfect DNA mirror repeats in E. coli TnsA and other protein-coding DNA.

    PubMed

    Lang, Dorothy M

    2005-09-01

    DNA imperfect mirror repeats (DNA-IMRs) are ubiquitous in protein-coding DNA. However, they overlap and often have different centers of symmetry, making it difficult to evaluate their relationship to each other and to specific DNA and protein motifs and structures. This paper describes a systematic method of determining a hierarchy for DNA-IMRs and evaluates their relationship to protein structural elements (PSEs)--helices, turns and beta-sheets. DNA-IMRs are identified by two different methods--DNA-IMRs terminated by reverse dinucleotides (rd-IMRs) and DNA-IMRs terminated by a single (mono) matching nucleotide (m-IMRs). Both rd-IMRs and m-IMRs are evaluated in 17 proteins, and illustrated in detail for TnsA. For each of the proteins, Fisher's exact test (FET) is used to measure the coincidence between the terminal dinucleotides of rd-IMRs and the terminal amino acids of individual PSEs. A significant correlation over a span of about 3 nt was found for each protein. The correlation is robust and for most genes, all rd-IMRs16 nt contain approximately 88% of the potential functional motifs. The protein translations of the longest rd- and m-IMRs span sequences important to the protein's structure and function. In all 17 proteins studied, the population of rd-IMRs is substantially smaller than the expected number and the population of m-IMRs greater than the expected number, indicating strong selective pressures. The association of rd-IMRs with PSEs restricts their spatial distribution, and therefore, their number. The greater than predicted number of m-IMRs indicates that DNA symmetry exists throughout the entire protein-coding region and may stabilize the sequence.

  14. Coding DNA repeated throughout intergenic regions of the Arabidopsis thaliana genome: Evolutionary footprints of RNA silencing

    USDA-ARS's Scientific Manuscript database

    Pyknons are non-random sequence patterns significantly repeated throughout non-coding genomic DNA that also appear at least once among genes. They are interesting because they portend an unforeseen connection between coding and non-coding DNA. Pyknons have only been discovered in the human genome,...

  15. Evolutionary analysis of DNA-protein-coding regions based on a genetic code cube metric.

    PubMed

    Sanchez, Robersy

    2014-01-01

    Accurate estimation of the evolutionary distance between DNA or protein sequences is the cornerstone of current phylogenetic analysis based on distance methods. Herein, it is demonstrated that the Manhattan distance (dw), weighted by the evolutionary importance of the nucleotide bases in the codon, is a naturally derived metric in the standard genetic code cube inserted into the three-dimensional Euclidean space. Based on the application of distance dw, a novel evolutionary model is proposed. This model includes insertion/deletion mutations that are very important for cancer studies, but usually discarded in classical evolutionary models. In this study, the new evolutionary model was applied to the phylogenetic analysis of the DNA protein-coding regions of 13 mammal mitochondrial genomes and of four cancer genetic-susceptibility genes (ATM, BRCA1, BRCA2 and p53) from nine mammals. The opossum (a marsupial) was used as an out-group species for both sets of sequences. The new evolutionary model yielded the correct topology, while the current models failed to separate the evolutionarily distant species of mouse and opossum.

  16. Non-extensive trends in the size distribution of coding and non-coding DNA sequences in the human genome

    NASA Astrophysics Data System (ADS)

    Oikonomou, Th.; Provata, A.

    2006-03-01

    We study the primary DNA structure of four of the most completely sequenced human chromosomes (including chromosome 19, which is the densest in coding sequence), using non-extensive statistics. We show that the exponents governing the spatial decay of the coding size distributions vary between 5.2 ≤ r ≤ 5.7 for the short scales and 1.45 ≤ q ≤ 1.50 for the large scales. In contrast, the exponents governing the spatial decay of the non-coding size distributions in these four chromosomes take the values 2.4 ≤ r ≤ 3.2 for the short scales and 1.50 ≤ q ≤ 1.72 for the large scales. These results, in particular the values of the tail exponent q, indicate the existence of correlations in the coding and non-coding size distributions, with a tendency for higher correlations in the non-coding DNA.

  17. In search of coding and non-coding regions of DNA sequences based on balanced estimation of diffusion entropy.

    PubMed

    Zhang, Jin; Zhang, Wenqing; Yang, Huijie

    2016-01-01

    Identification of coding regions in DNA sequences remains challenging. Various methods have been proposed, but these are limited by species-dependence and the need for adequate training sets. The elements in DNA coding regions are known to be distributed in a quasi-random way, while those in non-coding regions have typical similar structures. For short sequences, these statistical characteristics cannot be extracted correctly and cannot even be detected. This paper introduces a new way to solve the problem: balanced estimation of diffusion entropy (BEDE).

  18. A study of oligonucleotide occurrence distributions in DNA coding segments.

    PubMed

    Castrignanò, T; Colosimo, A; Morante, S; Parisi, V; Rossi, G C

    1997-02-21

    In this paper we present a general strategy designed to study the occurrence frequency distributions of oligonucleotides in DNA coding segments and to deal with the problem of detecting possible patterns of genomic compositional inhomogeneities and non-uniformities. Identifying specific tendencies or peculiar deviations in the distributions of the effective occurrence frequencies of oligonucleotides, with respect to what can be a priori expected, is of the greatest importance in biology. Differences between expected and actual distributions may in fact suggest or confirm the existence of specific biological mechanisms related to them. Similarly, a marked deviation in the occurrence frequency of an oligonucleotide may suggest that it belongs to the class of so-called "DNA signal (target) sequences". The approach we have elaborated is innovative in various aspects. Firstly, the analysis of the genomic data is carried out in the light of the observation that the distribution of the four nucleotides along the coding regions of the genome is biased by the existence of a well-defined "reading frame". Secondly, the "experimental" numbers found by counting the occurrences of the various oligonucleotide sequences are appropriately corrected for the many kinds of mistakes and redundancies present in the available genetic databases. A methodologically significant further improvement of our approach over the existing searching strategies is represented by the fact that, in order to decide whether or not the (corrected) "experimental" value of the occurrence frequency of a given oligonucleotide is within statistical expectations, a measure of the strength of the selective pressure, having acted on it in the course of the evolution, is assigned to the sequence, in a way that takes into account both the value of the "experimental" occurrence frequency of the sequence and the magnitude of the probability that this number might be the result of statistical fluctuations. If the

  19. Comparison of the Predictive Accuracy of DNA Array-Based Multigene Classifiers across cDNA Arrays and Affymetrix GeneChips

    PubMed Central

    Stec, James; Wang, Jing; Coombes, Kevin; Ayers, Mark; Hoersch, Sebastian; Gold, David L.; Ross, Jeffrey S; Hess, Kenneth R.; Tirrell, Stephen; Linette, Gerald; Hortobagyi, Gabriel N.; Symmans, W. Fraser; Pusztai, Lajos

    2005-01-01

    We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had a Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by Basic Local Alignment Search Tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45% and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed an average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed a misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene. PMID:16049308

  20. Selection of DNA binding sites for zinc fingers using rationally randomized DNA reveals coded interactions.

    PubMed Central

    Choo, Y; Klug, A

    1994-01-01

    In the preceding paper [Choo, Y. & Klug, A. (1994) Proc. Natl. Acad. Sci. USA 91, 11163-11167], we showed how selections from a library of zinc fingers displayed on phage yielded fingers able to bind to a number of DNA triplets. Here, we describe a technique to deal efficiently with the converse problem--namely, the selection of a DNA binding site for a given zinc finger. This is done by screening against libraries of DNA triplet binding sites randomized in two positions but having one base fixed in the third position. The technique is applied here to determine the specificity of fingers previously selected by phage display. We find that some of these fingers are able to specify a unique base in each position of the cognate triplet. This is further illustrated by examples of fingers which can discriminate between closely related triplets as measured by their respective equilibrium dissociation constants. Comparing the amino acid sequences of fingers which specify a particular base in a triplet, we infer that in most instances, sequence-specific binding of zinc fingers to DNA can be achieved by using a small set of amino acid-nucleotide base contacts amenable to a code. Images PMID:7972028

  1. An Integrated Prognostic Classifier for Stage I Lung Adenocarcinoma based on mRNA, microRNA and DNA Methylation Biomarkers

    PubMed Central

    Robles, Ana I.; Arai, Eri; Mathé, Ewy A.; Okayama, Hirokazu; Schetter, Aaron J.; Brown, Derek; Petersen, David; Bowman, Elise D.; Noro, Rintaro; Welsh, Judith A.; Edelman, Daniel C.; Stevenson, Holly S.; Wang, Yonghong; Tsuchiya, Naoto; Kohno, Takashi; Skaug, Vidar; Mollerup, Steen; Haugen, Aage; Meltzer, Paul S.; Yokota, Jun; Kanai, Yae

    2015-01-01

    Introduction: Up to 30% of Stage I lung cancer patients suffer recurrence within 5 years of curative surgery. We sought to improve existing protein-coding gene and microRNA expression prognostic classifiers by incorporating epigenetic biomarkers. Methods: Genome-wide screening of DNA methylation and pyrosequencing analysis of HOXA9 promoter methylation were performed in two independently collected cohorts of Stage I lung adenocarcinoma. The prognostic value of HOXA9 promoter methylation alone and in combination with mRNA and miRNA biomarkers was assessed by Cox regression and Kaplan-Meier survival analysis in both cohorts. Results: Promoters of genes marked by Polycomb in Embryonic Stem Cells were methylated de novo in tumors and identified patients with poor prognosis. The HOXA9 locus was methylated de novo in Stage I tumors (P < 0.0005). High HOXA9 promoter methylation was associated with worse cancer-specific survival (Hazard Ratio [HR], 2.6; P = 0.02) and recurrence-free survival (HR, 3.0; P = 0.01), and identified high-risk patients in stratified analysis of Stage IA and IB. Four protein-coding genes (XPO1, BRCA1, HIF1α, DLC1), miR-21 expression and HOXA9 promoter methylation were each independently associated with outcome (HR, 2.8; P = 0.002; HR, 2.3; P = 0.01; and HR, 2.4; P = 0.005, respectively), and, when combined, identified high-risk, therapy-naïve, Stage I patients (HR, 10.2; P = 3×10⁻⁵). All associations were confirmed in two independently collected cohorts. Conclusion: A prognostic classifier comprising three types of genomic and epigenomic data may help guide the postoperative management of Stage I lung cancer patients at high risk of recurrence. PMID:26134223

  2. A DNA methylation classifier of cervical precancer based on human papillomavirus and human genes.

    PubMed

    Brentnall, Adam R; Vasiljević, Nataša; Scibior-Bentkowska, Dorota; Cadman, Louise; Austin, Janet; Szarewski, Anne; Cuzick, Jack; Lorincz, Attila T

    2014-09-15

    Testing for high-risk (hr) types of human papillomavirus (HPV) is highly sensitive as a screening test of high-grade cervical intraepithelial neoplastic (CIN2/3) disease, the precursor of cervical cancer. However, it has a relatively low specificity. Our objective was to develop a prediction rule with a higher specificity, using combinations of human and HPV DNA methylation. Exfoliated cervical specimens from colposcopy-referral cohorts in London were analyzed for DNA methylation levels by pyrosequencing in the L1 and L2 regions of HPV16, HPV18, HPV31 and human genes EPB41L3, DPYS and MAL. Samples from 1,493 hrHPV-positive women were assessed and of these 556 were found to have CIN2/3 at biopsy; 556 tested positive for HPV16 (323 CIN2/3), 201 for HPV18 (73 CIN2/3) and 202 for HPV31 (98 CIN2/3). The prediction rule included EPB41L3 and HPV and had area under curve 0.80 (95% CI 0.78-0.82). For 90% sensitivity, specificity was 36% (33-40) and positive predictive value (PPV) was 46% (43-48). By HPV type, 90% sensitivity corresponded to the following specificities and PPV, respectively: HPV16, 38% (32-45) and 67% (63-71); HPV18, 53% (45-62) and 52% (45-59); HPV31, 39% (31-49) and 58% (51-65); HPV16, 18 or 31, 44% (40-49) and 62% (59-65) and other hrHPV 17% (14-21) and 21% (18-24). We conclude that a methylation assay in hrHPV-positive women might improve PPV with minimal sensitivity loss. © 2014 The Authors. UICC.
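
    The overall positive predictive value quoted in this abstract can be roughly cross-checked from the cohort counts it reports. The sketch below assumes the usual relation between PPV, sensitivity, specificity and the number of diseased and non-diseased subjects; the function name is illustrative and the authors' exact calculation is not given here.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              n_disease: int, n_total: int) -> float:
    """PPV = true positives / (true positives + false positives)."""
    true_pos = sensitivity * n_disease
    false_pos = (1.0 - specificity) * (n_total - n_disease)
    return true_pos / (true_pos + false_pos)


# Figures from the abstract: 1,493 hrHPV-positive women, 556 with CIN2/3,
# 90% sensitivity and 36% specificity.
ppv = positive_predictive_value(0.90, 0.36, n_disease=556, n_total=1493)
print(f"{ppv:.3f}")
```

    With these rounded inputs the result is about 0.45, in line with the reported 46% once rounding of the sensitivity and specificity is taken into account.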

  3. What Information is Stored in DNA: Does it Contain Digital Error Correcting Codes?

    NASA Astrophysics Data System (ADS)

    Liebovitch, Larry

    1998-03-01

    The longest term correlations in living systems are the information stored in DNA which reflects the evolutionary history of an organism. The 4 bases (A,T,G,C) encode sequences of amino acids as well as locations of binding sites for proteins that regulate DNA. The fidelity of this important information is maintained by ANALOG error check mechanisms. When a single strand of DNA is replicated the complementary base is inserted in the new strand. Sometimes the wrong base is inserted that sticks out disrupting the phosphate backbone. The new base is not yet methylated, so repair enzymes, that slide along the DNA, can tear out the wrong base and replace it with the right one. The bases in DNA form a sequence of 4 different symbols and so the information is encoded in a DIGITAL form. All the digital codes in our society (ISBN book numbers, UPC product codes, bank account numbers, airline ticket numbers) use error checking code, where some digits are functions of other digits to maintain the fidelity of transmitted information. Does DNA also utilize a DIGITAL error checking code to maintain the fidelity of its information and increase the accuracy of replication? That is, are some bases in DNA functions of other bases upstream or downstream? This raises the interesting mathematical problem: How does one determine whether some symbols in a sequence of symbols are a function of other symbols? It also bears on the issue of determining algorithmic complexity: What is the function that generates the shortest algorithm for reproducing the symbol sequence? The error checking codes most used in our technology are linear block codes. We developed an efficient method to test for the presence of such codes in DNA. We coded the 4 bases as (0,1,2,3) and used Gaussian elimination, modified for modulus 4, to test if some bases are linear combinations of other bases. We used this method to analyze the base sequence in the genes from the lac operon and cytochrome C. We did not find
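
    The question posed here, whether some bases are linear combinations of others modulo 4, can be illustrated with a small sketch. Note that this uses a brute-force search over coefficient vectors rather than the modified Gaussian elimination the abstract describes, and it only tests whether the last symbol of each fixed-length block is a linear function of the preceding ones. The base-to-number assignment and all names are assumptions made for illustration.

```python
from itertools import product

# Assumed assignment of bases to 0..3; the abstract does not say which base gets which value.
BASE_VALUE = {"A": 0, "T": 1, "G": 2, "C": 3}


def to_blocks(seq: str, n: int) -> list[list[int]]:
    """Split a sequence into consecutive, non-overlapping blocks of n symbols."""
    values = [BASE_VALUE[base] for base in seq.upper()]
    return [values[i:i + n] for i in range(0, len(values) - n + 1, n)]


def find_parity_rules(seq: str, n: int) -> list[tuple[int, ...]]:
    """Return every coefficient vector c (mod 4) such that, in all blocks,
    the last symbol equals sum(c[i] * block[i]) mod 4 over the first n-1 symbols."""
    blocks = to_blocks(seq, n)
    if not blocks:
        return []
    rules = []
    for coeffs in product(range(4), repeat=n - 1):
        if all(sum(c * s for c, s in zip(coeffs, blk)) % 4 == blk[-1]
               for blk in blocks):
            rules.append(coeffs)
    return rules


# A synthetic sequence built so that the third base = (first + 2 * second) mod 4.
print(find_parity_rules("ATGTCCGGGCAC", 3))  # [(1, 2)]
# A sequence where identical inputs lead to different third bases has no such rule.
print(find_parity_rules("ATGATC", 3))        # []
```

    The search grows as 4^(n-1), so it is only practical for short blocks; an elimination-based method scales far better.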

  4. Non-coding RNAs: an emerging player in DNA damage response.

    PubMed

    Zhang, Chunzhi; Peng, Guang

    2015-01-01

    Non-coding RNAs play a crucial role in maintaining genomic stability which is essential for cell survival and preventing tumorigenesis. Through an extensive crosstalk between non-coding RNAs and the canonical DNA damage response (DDR) signaling pathway, DDR-induced expression of non-coding RNAs can provide a regulatory mechanism to accurately control the expression of DNA damage responsive genes in a spatio-temporal manner. Mechanistically, DNA damage alters expression of a variety of non-coding RNAs at multiple levels including transcriptional regulation, post-transcriptional regulation, and RNA degradation. In parallel, non-coding RNAs can directly regulate cellular processes involved in DDR by altering expression of their target genes, with a particular emphasis on miRNAs and lncRNAs. MiRNAs are required for almost every aspect of cellular responses to DNA damage, including sensing DNA damage, transducing damage signals, repairing damaged DNA, activating cell cycle checkpoints, and inducing apoptosis. As for lncRNAs, they control transcription of DDR-relevant genes by four different regulatory models: signal, decoy, guide, and scaffold. In addition, we also highlight potential clinical applications of non-coding RNAs as biomarkers and therapeutic targets for anti-cancer treatments using DNA-damaging agents including radiation and chemotherapy. Although tremendous advances have been made to elucidate the role of non-coding RNAs in genome maintenance, many key questions remain to be answered, including mechanistically how the non-coding RNA pathway and the DNA damage response pathway are coordinated in response to genotoxic stress. Copyright © 2014 Elsevier B.V. All rights reserved.

  5. Sequences encoding identical peptides for the analysis and manipulation of coding DNA

    PubMed Central

    Sánchez, Joaquín

    2013-01-01

    The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported; the study of such sequences could directly reveal properties of coding DNA that are independent of peptide sequences. For practical purposes SEIP might also be manipulated for, e.g., heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to find out if they could be used to create new coding sequences. We hence attempted replacement of human by E. coli codons via dicodon exchange but found that full replacement was not possible; this indicated robust species-specific dicodon tendencies. To test another form of codon replacement we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and we then re-constructed the GFP coding DNA with human tetra-peptide-coding sequences. Results provide proof-of-principle that SEIP may be used to reveal differences in the properties of coding DNA and to reconstruct in pieces a protein coding DNA with sequences from a different organism; the latter might be exploited in heterologous protein expression. PMID:23861567
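
    The idea of collecting DNA segments from two organisms that encode the same peptide can be sketched as below. This is a minimal illustration, not the authors' extraction pipeline: it assumes in-frame coding sequences, the standard genetic code, and a fixed peptide length k, and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into complete codons."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def peptide_index(cds: str, k: int) -> dict[str, set[str]]:
    """Map each k-residue peptide to the set of DNA segments encoding it."""
    cs = codons(cds)
    index: dict[str, set[str]] = {}
    for i in range(len(cs) - k + 1):
        window = cs[i:i + k]
        peptide = "".join(CODON_TABLE[c] for c in window)
        index.setdefault(peptide, set()).add("".join(window))
    return index


def shared_peptides(cds_a: str, cds_b: str, k: int) -> dict[str, tuple[set[str], set[str]]]:
    """Peptides of length k found in both sequences, with their encoding DNA."""
    ia, ib = peptide_index(cds_a, k), peptide_index(cds_b, k)
    return {pep: (ia[pep], ib[pep]) for pep in ia.keys() & ib.keys()}


# Two toy sequences that both encode M-A-K-E with different codons.
print(shared_peptides("ATGGCTAAAGAA", "ATGGCCAAGGAG", 4))
```

    Comparing the two DNA sets returned for each shared peptide is the kind of step at which codon-usage or dicodon differences between species would become visible.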

  6. Sequences encoding identical peptides for the analysis and manipulation of coding DNA.

    PubMed

    Sánchez, Joaquín

    2013-01-01

    The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported; the study of such sequences could directly reveal properties of coding DNA that are independent of peptide sequences. For practical purposes SEIP might also be manipulated for, e.g., heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to find out if they could be used to create new coding sequences. We hence attempted replacement of human by E. coli codons via dicodon exchange but found that full replacement was not possible; this indicated robust species-specific dicodon tendencies. To test another form of codon replacement we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and we then re-constructed the GFP coding DNA with human tetra-peptide-coding sequences. Results provide proof-of-principle that SEIP may be used to reveal differences in the properties of coding DNA and to reconstruct in pieces a protein coding DNA with sequences from a different organism; the latter might be exploited in heterologous protein expression.

  7. Palindromic repetitive DNA elements with coding potential in Methanocaldococcus jannaschii.

    PubMed

    Suyama, Mikita; Lathe, Warren C; Bork, Peer

    2005-10-10

    We have identified 141 novel palindromic repetitive elements in the genome of euryarchaeon Methanocaldococcus jannaschii. The total length of these elements is 14.3kb, which corresponds to 0.9% of the total genomic sequence and 6.3% of all extragenic regions. The elements can be divided into three groups (MJRE1-3) based on the sequence similarity. The low sequence identity within each of the groups suggests rather old origin of these elements in M. jannaschii. Three MJRE2 elements were located within the protein coding regions without disrupting the coding potential of the host genes, indicating that insertion of repeats might be a widespread mechanism to enhance sequence diversity in coding regions.

  8. Fact or fiction: updates on how protein-coding genes might emerge de novo from previously non-coding DNA.

    PubMed

    Schmitz, Jonathan F; Bornberg-Bauer, Erich

    2017-01-01

    Over the last few years, there has been an increasing amount of evidence for the de novo emergence of protein-coding genes, i.e. out of non-coding DNA. Here, we review the current literature and summarize the state of the field. We focus specifically on open questions and challenges in the study of de novo protein-coding genes such as the identification and verification of de novo-emerged genes. The greatest obstacle to date is the lack of high-quality genomic data with very short divergence times which could help precisely pin down the location of origin of a de novo gene. We conclude that, while there is plenty of evidence from a genetics perspective, there is a lack of functional studies of bona fide de novo genes and almost no knowledge about protein structures and how they come about during the emergence of de novo protein-coding genes. We suggest that future studies should concentrate on the functional and structural characterization of de novo protein-coding genes as well as the detailed study of the emergence of functional de novo protein-coding genes.

  9. Fact or fiction: updates on how protein-coding genes might emerge de novo from previously non-coding DNA

    PubMed Central

    Schmitz, Jonathan F; Bornberg-Bauer, Erich

    2017-01-01

    Over the last few years, there has been an increasing amount of evidence for the de novo emergence of protein-coding genes, i.e. out of non-coding DNA. Here, we review the current literature and summarize the state of the field. We focus specifically on open questions and challenges in the study of de novo protein-coding genes such as the identification and verification of de novo-emerged genes. The greatest obstacle to date is the lack of high-quality genomic data with very short divergence times which could help precisely pin down the location of origin of a de novo gene. We conclude that, while there is plenty of evidence from a genetics perspective, there is a lack of functional studies of bona fide de novo genes and almost no knowledge about protein structures and how they come about during the emergence of de novo protein-coding genes. We suggest that future studies should concentrate on the functional and structural characterization of de novo protein-coding genes as well as the detailed study of the emergence of functional de novo protein-coding genes. PMID:28163910

  10. Cloning and expression of cDNA coding for bouganin.

    PubMed

    den Hartog, Marcel T; Lubelli, Chiara; Boon, Louis; Heerkens, Sijmie; Ortiz Buijsse, Antonio P; de Boer, Mark; Stirpe, Fiorenzo

    2002-03-01

    Bouganin is a ribosome-inactivating protein that recently was isolated from Bougainvillea spectabilis Willd. In this work, the cloning and expression of the cDNA encoding for bouganin is described. From the cDNA, the amino-acid sequence was deduced, which correlated with the primary sequence data obtained by amino-acid sequencing on the native protein. Bouganin is synthesized as a pro-peptide consisting of 305 amino acids, the first 26 of which act as a leader signal while the 29 C-terminal amino acids are cleaved during processing of the molecule. The mature protein consists of 250 amino acids. Using the cDNA sequence encoding the mature protein of 250 amino acids, a recombinant protein was expressed, purified and characterized. The recombinant molecule had similar activity in a cell-free protein synthesis assay and had comparable toxicity on living cells as compared to the isolated native bouganin.

  11. RNA-DNA sequence differences spell genetic code ambiguities

    PubMed Central

    Nielsen, Michael L.

    2011-01-01

    A recent paper in Science by Li et al. (2011) reports widespread sequence differences in the human transcriptome between RNAs and their encoding genes, termed RNA-DNA differences (RDDs). The findings could add a new layer of complexity to gene expression, but the study has been criticized. PMID:22567189

  12. Comparison of Two Output-Coding Strategies for Multi-Class Tumor Classification Using Gene Expression Data and Latent Variable Model as Binary Classifier

    PubMed Central

    Joseph, Sandeep J.; Robbins, Kelly R.; Zhang, Wensheng; Rekaya, Romdhane

    2010-01-01

    Multi-class cancer classification based on microarray data is described. A generalized output-coding scheme based on One Versus One (OVO) combined with a Latent Variable Model (LVM) is used. Results from the proposed OVO output-coding strategy are compared with the results obtained from the generalized One Versus All (OVA) method, and the efficiency of both for multi-class tumor classification has been studied. This comparative study was done using two microarray gene expression datasets: the Global Cancer Map (GCM) dataset and a brain cancer (BC) dataset. Primary feature selection was based on fold change and penalized t-statistics. Evaluation was conducted with varying feature numbers. The OVO coding strategy worked quite well with the BC data, while OVO and OVA results seemed to be similar for the GCM data. The selection of output coding methods for combining binary classifiers for multi-class tumor classification depends on the number of tumor types considered, the discrepancies between the tumor samples used for training, and the heterogeneity of expression within the cancer subtypes used as training data. PMID:20458360
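
    The two output-coding strategies compared above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: a nearest-centroid rule stands in for the latent variable model (an assumption), and the toy data and all function names are hypothetical.

```python
from itertools import combinations
from math import dist


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_binary(X, y):
    """Stand-in binary classifier (nearest centroid), NOT the latent variable model.

    y holds 0/1 labels; the returned score is > 0 when a sample looks like class 1.
    """
    c0 = centroid([x for x, t in zip(X, y) if t == 0])
    c1 = centroid([x for x, t in zip(X, y) if t == 1])
    return lambda z: dist(z, c0) - dist(z, c1)


def predict_ova(X, y, z):
    """One Versus All: one classifier per class; the highest score wins."""
    classes = sorted(set(y))
    scores = {c: fit_binary(X, [int(t == c) for t in y])(z) for c in classes}
    return max(classes, key=scores.get)


def predict_ovo(X, y, z):
    """One Versus One: one classifier per class pair; the most votes wins."""
    classes = sorted(set(y))
    votes = dict.fromkeys(classes, 0)
    for a, b in combinations(classes, 2):
        pairs = [(x, int(t == b)) for x, t in zip(X, y) if t in (a, b)]
        score = fit_binary([x for x, _ in pairs], [t for _, t in pairs])(z)
        votes[b if score > 0 else a] += 1
    return max(classes, key=votes.get)


# Toy three-class "expression" data, made up for illustration only.
X = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1), (0, 10), (1, 10), (0, 11)]
y = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
print(predict_ova(X, y, (9, 1)), predict_ovo(X, y, (9, 1)))
```

    For K classes, OVA trains K binary classifiers on the full training set, whereas OVO trains K(K-1)/2 classifiers, each on the samples of two classes only; this is one reason the two schemes can behave differently as the number of tumor types grows.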

  13. TOWARDS A PROBABILISTIC RECOGNITION CODE FOR PROTEIN-DNA INTERACTIONS

    SciTech Connect

    P. BENOS; ET AL

    2000-09-01

    We are investigating the rules that govern protein-DNA interactions, using a statistical-mechanics-based formalism that is related to the Boltzmann Machine of the neural net literature. Our approach is data-driven: probabilistic algorithms are used to model protein-DNA interactions, given SELEX and phage data as input. Under the "one-to-one" model for interactions (i.e. one amino acid contacts one base), we can successfully identify the wild-type binding sites of the EGR and MIG protein families. The predictions of our method are as good as or better than those of methods existing in the literature, and our methodology offers the potential to capitalize in quantitative detail on more data as it becomes available.

  14. A novel Lie algebra of the genetic code over the Galois field of four DNA bases.

    PubMed

    Sánchez, Robersy; Grau, Ricardo; Morgado, Eberto

    2006-07-01

    Starting from the order of the four DNA bases in the Boolean lattice, a novel Lie algebra of the genetic code is proposed. Here, the main partitions of the genetic code table were obtained as equivalence classes of quotient spaces of the genetic code vector space over the Galois field of the four DNA bases. The new algebraic structure shows strong connections among algebraic relationships, codon assignments and physicochemical properties of amino acids. Moreover, a distance defined between codons carries a physicochemical meaning. It was also noticed that the distance between wild-type and mutant codons tends to be small in mutational variants of four genes: human phenylalanine hydroxylase, human beta-globin, HIV-1 protease and HIV-1 reverse transcriptase. These results strongly suggest that deterministic rules must have been involved in the origin of the genetic code.

  15. Statistical analysis of nucleotide runs in coding and noncoding DNA sequences.

    PubMed

    Sprizhitsky YuA; Nechipurenko YuD; Alexandrov, A A; Volkenstein, M V

    1988-10-01

    A statistical analysis of the occurrence of particular nucleotide runs in DNA sequences of different species has been carried out. There are considerable differences in run distributions among DNA sequences of procaryotes, invertebrates and vertebrates. There is an abundance of short runs (1-2 nucleotides long) in the coding sequences and a deficiency of such runs in the noncoding regions. However, some interesting exceptions to this rule exist for the run distribution of adenine in procaryotes and for the arrangement of purine-pyrimidine runs in eucaryotes. The similarity in the distributions of such runs in the coding and noncoding regions may be due to some structural features of the DNA molecule as a whole. Runs of guanine (or cytosine) of three to six nucleotides occur predominantly in noncoding DNA regions in eucaryotes, especially in vertebrates.
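
    The kind of run statistics described above can be tabulated as follows. This is only a sketch under simple assumptions (uppercase A/C/G/T input, maximal runs of identical letters); the function names are hypothetical, and the original study's exact statistical treatment is not reproduced.

```python
from collections import Counter
from itertools import groupby


def run_length_distribution(seq):
    """Count maximal runs of identical letters, keyed by (letter, run_length)."""
    return Counter((base, len(list(group))) for base, group in groupby(seq.upper()))


def purine_pyrimidine(seq):
    """Recode A/G as R (purine) and C/T as Y (pyrimidine) to study R/Y runs."""
    return seq.upper().translate(str.maketrans("AGCT", "RRYY"))


seq = "AAACCGTTTT"
print(run_length_distribution(seq))
print(run_length_distribution(purine_pyrimidine(seq)))
```

    Comparing these counters between annotated coding and noncoding segments, after normalizing by segment length, gives the run distributions that the abstract contrasts.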

  16. Differential DNA methylation profiles of coding and non-coding genes define hippocampal sclerosis in human temporal lobe epilepsy

    PubMed Central

    Miller-Delaney, Suzanne F.C.; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C.; Bray, Isabella M.; Reynolds, James P.; Gwinn, Ryder; Stallings, Raymond L.

    2015-01-01

    Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated C-phosphate-G islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) when compared to control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among the differentially methylated. Finally, a panel of 13, methylation-sensitive microRNA were identified in temporal lobe epilepsy including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and the differential methylation of long non-coding RNA documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain. PMID

  17. [Cloning and insertion mutagenesis of DNA fragment coding for the luminescent system of Photobacterium leiognathi].

    PubMed

    Ptitsyn, L R; Gurevich, V B; Barsanova, T G; Shenderov, A N; Khaĭkinson, M Ia

    1988-10-01

    Fragments of DNA, obtained from the luminescent bacterium Photobacterium leiognathi and inserted into the plasmid pBR322, were found to code for the luminescence expressed in E. coli cells. The genetic functions necessary for light production in E. coli are localized on a DNA fragment of about 7 kbp. The insertion mutagenesis was used to define the luminescence functions encoded by the hybrid plasmid.

  18. Coding and noncoding plastid DNA in palm systematics.

    PubMed

    Asmussen, C B; Chase, M W

    2001-06-01

    Plastid DNA sequences evolve slowly in palms but show that the family is monophyletic and highly divergent relative to other major monocot clades. It is therefore difficult to place the root within the palms because faster evolving, length-variable sequences cannot be aligned with outgroup monocots, and length-conserved regions have been thought to give too few characters to resolve basal nodes. To solve this problem, we combined 94 ingroup and 24 outgroup sequences from the length-conserved rbcL gene with ingroup and alignable outgroup sequences from noncoding rps16 intron and trnL-trnF regions. The separate rps16 intron and trnL-trnF region contained about the same number of variable sites (autapomorphies not included) as rbcL, but gave higher retention indices and more clades with bootstrap support. In general, the strict consensus tree based on combined rbcL, rps16 intron, and trnL-trnF data showed more resolution towards the base of the palm family than previous hypotheses of relationships of the Arecaceae. An important result was the position of subfamily Calamoideae as sister to the rest of the palms, but this received <50% bootstrap support. Another result of systematic significance was the indication that subfamily Phytelephantoideae is related to two tribes from subfamily Ceroxyloideae, Cyclospatheae and Ceroxyleae.

  19. DNA methylation patterns of protein-coding genes and long non-coding RNAs in males with schizophrenia.

    PubMed

    Liao, Qi; Wang, Yunliang; Cheng, Jia; Dai, Dongjun; Zhou, Xingyu; Zhang, Yuzheng; Li, Jinfeng; Yin, Honglei; Gao, Shugui; Duan, Shiwei

    2015-11-01

    Schizophrenia (SCZ) is one of the most complex mental illnesses affecting ~1% of the population worldwide. SCZ pathogenesis is considered to be a result of genetic as well as epigenetic alterations. Previous studies have aimed to identify the causative genes of SCZ. However, DNA methylation of long non-coding RNAs (lncRNAs) involved in SCZ has not been fully elucidated. In the present study, a comprehensive genome-wide analysis of DNA methylation was conducted using samples from two male patients with paranoid and undifferentiated SCZ, respectively. Methyl-CpG binding domain protein-enriched genome sequencing was used. In the two patients with paranoid and undifferentiated SCZ, 1,397 and 1,437 peaks were identified, respectively. Bioinformatic analysis demonstrated that peaks were enriched in protein-coding genes, which exhibited nervous system and brain functions. A number of these peaks in gene promoter regions may affect gene expression and, therefore, influence SCZ-associated pathways. Furthermore, 7 and 20 lncRNAs, respectively, in the Refseq database were hypermethylated. According to the lncRNA dataset in the NONCODE database, ~30% of intergenic peaks overlapped with novel lncRNA loci. The results of the present study demonstrated that aberrant hypermethylation of lncRNA genes may be an important epigenetic factor associated with SCZ. However, further studies using larger sample sizes are required.

  20. Setting standards for DNA banks: toward a model code of conduct.

    PubMed

    McEwen, J E; Reilly, P R

    1996-01-01

    As genomic research proliferates, DNA banking will become more common. In research, samples will be banked largely in an effort to find and clone genes that predispose to disease. Commercially oriented banks, those that offer services to families, may also become more common. These entities will hold sensitive information. DNA banking is not yet regulated. We argue here that new laws are not needed at this time to regulate DNA banking. We suggest an approach that relies on a professional code of conduct and draws on principles of disclosure inherent to the process used in obtaining informed consent. In addition to suggesting 12 specific recommendations for the code of conduct, we suggest that items should be included in depositor's agreements. We offer a rationale for our suggestions.

  1. Sigma: multiple alignment of weakly-conserved non-coding DNA sequence.

    PubMed

    Siddharthan, Rahul

    2006-03-16

    Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem acquires importance with the increasing number of published sequences of closely-related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically-related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy earlier used by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA. Comparative tests of Sigma with five earlier algorithms on synthetic data generated to mimic real data show excellent performance, with Sigma balancing high "sensitivity" (more bases aligned) with effective filtering of "incorrect" alignments. With real data, while "correctness" can't be directly quantified for the alignment, running the PhyloGibbs motif finder on pre-aligned sequence suggests that Sigma's alignments are superior. By taking into account the peculiarities of non-coding DNA, Sigma fills a gap in the toolbox of bioinformatics.
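
    The basic building block mentioned above, a best gapless local alignment between two sequences, can be illustrated as follows. This sketch is not Sigma's algorithm: it uses a plain match/mismatch score (an assumption) instead of Sigma's significance score with a background model, and it does not enforce consistency with earlier alignments. The function name and parameters are hypothetical.

```python
def best_gapless_local_alignment(a, b, match=1, mismatch=-1):
    """Highest-scoring ungapped segment pair between sequences a and b.

    Returns (score, start_in_a, start_in_b, length). Each diagonal of the
    comparison matrix is scanned with a maximum-subarray (Kadane) pass.
    """
    best = (0, 0, 0, 0)
    for offset in range(-(len(b) - 1), len(a)):  # diagonal where i - j == offset
        i, j = max(offset, 0), max(-offset, 0)
        run_score, run_start = 0, (i, j)
        while i < len(a) and j < len(b):
            if run_score == 0:
                run_start = (i, j)  # a fresh segment starts here
            run_score += match if a[i] == b[j] else mismatch
            if run_score <= 0:
                run_score = 0  # drop a segment that no longer pays off
            elif run_score > best[0]:
                best = (run_score, run_start[0], run_start[1], i - run_start[0] + 1)
            i += 1
            j += 1
    return best


print(best_gapless_local_alignment("TTACGTACGG", "CCACGTACAA"))
```

    The scan costs time proportional to the product of the two sequence lengths; a practical tool would add a significance threshold so that short chance matches are filtered out.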

  2. Differential DNA methylation profiles of coding and non-coding genes define hippocampal sclerosis in human temporal lobe epilepsy.

    PubMed

    Miller-Delaney, Suzanne F C; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C; Bray, Isabella M; Reynolds, James P; Gwinn, Ryder; Stallings, Raymond L; Henshall, David C

    2015-03-01

    Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated C-phosphate-G islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) when compared to control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among the differentially methylated. Finally, a panel of 13, methylation-sensitive microRNA were identified in temporal lobe epilepsy including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and the differential methylation of long non-coding RNA documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain. © The

  3. A molecular bar-coded DNA repair resource for pooled toxicogenomic screens.

    PubMed

    Rooney, John P; Patil, Ashish; Zappala, Maria R; Conklin, Douglas S; Cunningham, Richard P; Begley, Thomas J

    2008-11-01

    DNA damage from exogenous and endogenous sources can promote mutations and cell death. Fortunately, cells contain DNA repair and damage signaling pathways to reduce the mutagenic and cytotoxic effects of DNA damage. The identification of specific DNA repair proteins and the coordination of DNA repair pathways after damage has been a central theme to the field of genetic toxicology and we have developed a tool for use in this area. We have produced 99 molecular bar-coded Escherichia coli gene-deletion mutants specific to DNA repair and damage signaling pathways, and each bar-coded mutant can be tracked in pooled format using bar-code specific microarrays. Our design adapted bar-codes developed for the Saccharomyces cerevisiae gene-deletion project, which allowed us to utilize an available microarray product for pooled gene-exposure studies. Microarray-based screens were used for en masse identification of individual mutants sensitive to methyl methanesulfonate (MMS). As expected, gene-deletion mutants specific to direct, base excision, and recombinational DNA repair pathways were identified as MMS-sensitive in our pooled assay, thus validating our resource. We have demonstrated that molecular bar-codes designed for S. cerevisiae are transferable to E. coli, and that they can be used with pre-existing microarrays to perform competitive growth experiments. Further, when comparing microarray to traditional plate-based screens both overlapping and distinct results were obtained, which is a novel technical finding, with discrepancies between the two approaches explained by differences in output measurements (DNA content versus cell mass). The microarray-based classification of Δtag and ΔdinG cells as depleted after MMS exposure, contrary to plate-based methods, led to the discovery that Δtag and ΔdinG cells show a filamentation phenotype after MMS exposure, thus accounting for the discrepancy. A novel biological finding is the observation that while

  4. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Technical Reports Server (NTRS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
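
    An n-tuple Zipf analysis of the kind named above ranks all overlapping n-tuples by frequency and examines how count falls with rank. The following is a minimal sketch, assuming overlapping windows and an ordinary least-squares fit in log-log coordinates; the function names are hypothetical and the authors' exact fitting procedure is not reproduced.

```python
from collections import Counter
from math import log


def ntuple_counts(seq, n):
    """Count overlapping n-tuples (n-grams) in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_table(seq, n):
    """Return (rank, n-tuple, count) rows sorted by decreasing count."""
    ranked = ntuple_counts(seq, n).most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_exponent(seq, n):
    """Least-squares slope of log(count) against log(rank).

    A power law appears as a straight line in these coordinates.
    Requires at least two distinct n-tuples in the sequence.
    """
    rows = zipf_table(seq, n)
    xs = [log(rank) for rank, _, _ in rows]
    ys = [log(count) for _, _, count in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


print(zipf_table("ACGACG", 3))
```

    On real data the fit would be restricted to the range of ranks where the log-log plot is approximately linear, since a single global slope can hide the logarithmic regime reported for coding regions.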

  5. Correcting sequencing errors in DNA coding regions using a dynamic programming approach.

    PubMed

    Xu, Y; Mural, R J; Uberbacher, E C

    1995-04-01

    This paper presents an algorithm for detecting and 'correcting' sequencing errors that occur in DNA coding regions. The types of sequencing errors addressed are insertions and deletions (indels) of DNA bases. The goal is to provide a capability which makes single-pass or low-redundancy sequence data more informative, reducing the need for high-redundancy sequencing for gene identification and characterization purposes. This would permit improved sequencing efficiency and reduce genome sequencing costs. The algorithm detects sequencing errors by discovering changes in the statistically preferred reading frame within a putative coding region and then inserts a number of 'neutral' bases at a perceived reading frame transition point to make the putative exon candidate frame consistent. We have implemented the algorithm as a front-end subsystem of the GRAIL DNA sequence analysis system to construct a version which is very error tolerant, and we also intend to use this as a testbed for further development of sequencing error-correction technology. Preliminary test results have shown the usefulness of this algorithm and also exhibited some of its weaknesses, providing possible directions for further improvement. On a test set consisting of 68 human DNA sequences with 1% randomly generated indels in coding regions, the algorithm detected and corrected 76% of the indels. The average distance between the position of an indel and the predicted one was 9.4 bases. With this subsystem in place, GRAIL correctly predicted 89% of the coding messages with 10% false messages on the 'corrected' sequences, compared to 69% correctly predicted coding messages and 11% falsely predicted messages on the 'corrupted' sequences using the standard GRAIL II method (version 1.2).(ABSTRACT TRUNCATED AT 250 WORDS)

  6. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Technical Reports Server (NTRS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.

  7. Correcting sequencing errors in DNA coding regions using a dynamic programming approach

    SciTech Connect

    Xu, Y.; Mural, R.J.; Uberbacher, E.C.

    1994-12-01

    This paper presents an algorithm for detecting and "correcting" sequencing errors that occur in DNA coding regions. The types of sequencing error addressed include insertions and deletions (indels) of DNA bases. The goal is to provide a capability which makes single-pass or low-redundancy sequence data more informative, reducing the need for high-redundancy sequencing for gene identification and characterization purposes. The algorithm detects sequencing errors by discovering changes in the statistically preferred reading frame within a putative coding region and then inserts a number of "neutral" bases at a perceived reading frame transition point to make the putative exon candidate frame consistent. The authors have implemented the algorithm as a front-end subsystem of the GRAIL DNA sequence analysis system to construct a version which is very error tolerant and also intend to use this as a testbed for further development of sequencing error-correction technology. On a test set consisting of 68 human DNA sequences with 1% randomly generated indels in coding regions, the algorithm detected and corrected 76% of the indels. The average distance between the position of an indel and the predicted one was 9.4 bases. With this subsystem in place, GRAIL correctly predicted 89% of the coding messages with 10% false message on the "corrected" sequences, compared to 69% correctly predicted coding messages and 11% falsely predicted messages on the "corrupted" sequences using standard GRAIL II method. The method uses a dynamic programming algorithm, and runs in time and space linear in the size of the input sequence.
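
    The idea of locating reading-frame transitions by dynamic programming can be illustrated with a three-state, Viterbi-style recursion. This is a sketch only and not the GRAIL subsystem: the per-position frame scores are a hypothetical input (for example, some coding-preference statistic), the switch penalty is an assumed parameter, and the insertion of neutral bases at the detected transition points is not shown.

```python
def best_frame_path(frame_scores, switch_penalty):
    """Find the frame assignment maximizing total score minus switch penalties.

    frame_scores[i][f] is a hypothetical coding-preference score for reading
    frame f (0, 1 or 2) at position i. Returns (frame per position, positions
    where the preferred frame changes, i.e. candidate indel sites).
    """
    n = len(frame_scores)
    best = list(frame_scores[0])  # best total score of a path ending in each frame
    back = []                     # back[i - 1][f]: frame chosen at position i - 1
    for i in range(1, n):
        new_best, choice = [], []
        for f in range(3):
            # Best predecessor frame g, paying the penalty if g differs from f.
            prev = max(range(3), key=lambda g: best[g] - (switch_penalty if g != f else 0))
            cost = switch_penalty if prev != f else 0
            new_best.append(best[prev] - cost + frame_scores[i][f])
            choice.append(prev)
        best = new_best
        back.append(choice)
    # Trace back from the best final frame.
    frame = max(range(3), key=lambda f: best[f])
    path = [frame]
    for choice in reversed(back):
        frame = choice[frame]
        path.append(frame)
    path.reverse()
    transitions = [i for i in range(1, n) if path[i] != path[i - 1]]
    return path, transitions


scores = [[1, 0, 0]] * 5 + [[0, 1, 0]] * 5  # frame 0 preferred, then frame 1
print(best_frame_path(scores, switch_penalty=2))
```

    With a constant number of states, the recursion uses time and space proportional to the sequence length, which matches the linear behavior stated in the abstract. A larger penalty suppresses spurious frame changes at the cost of missing weakly supported ones.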

  8. Specificity-Determining DNA Triplet Code for Positioning of Human Preinitiation Complex

    NASA Astrophysics Data System (ADS)

    Goldshtein, Matan; Lukatsky, David B.

    2017-05-01

    The notion that transcription factors bind DNA only through specific, consensus binding sites has been recently questioned. In a pioneering study by Pugh and Venters, no specific consensus motif for the positioning of the human pre-initiation complex (PIC) was identified. Here, we reveal that a nonconsensus, statistical DNA triplet code provides specificity for the positioning of the human PIC. In particular, we reveal a highly non-random, statistical pattern of repetitive nucleotide triplets that correlates with the genome-wide binding preferences of PIC measured by ChIP-exo. We analyze the triplet enrichment and depletion near the transcription start site (TSS) and identify triplets that have the strongest effect on PIC-DNA nonconsensus binding. Our results constitute a proof-of-concept for a new design principle for protein-DNA recognition in the human genome, which can lead to a better mechanistic understanding of transcriptional regulation.

  9. A Conserved Structural Signature of the Homeobox Coding DNA in HOX genes.

    PubMed

    Fongang, Bernard; Kong, Fanping; Negi, Surendra; Braun, Werner; Kudlicki, Andrzej

    2016-10-14

    The homeobox encodes a DNA-binding domain found in transcription factors regulating key developmental processes. The most notable examples of homeobox-containing genes are the Hox genes, arranged on chromosomes in the same order as their expression domains along the body axis. The mechanisms responsible for the synchronous regulation of Hox genes and the molecular function of their colinearity remain unknown. Here we report the discovery of a conserved structural signature of the 180-base pair DNA fragment comprising the homeobox. We demonstrate that the homeobox DNA has a characteristic 3-base-pair periodicity in the hydroxyl radical cleavage pattern. This periodic pattern is significant in most of the 39 mammalian Hox genes and in other homeobox-containing transcription factors. The signature is present in segmented bilaterian animals as evolutionarily distant as humans and flies. It remains conserved despite the fact that it would be disrupted by synonymous mutations, which raises the possibility of evolutionary selective pressure acting on the structure of the coding DNA. The homeobox coding DNA may therefore have a secondary function, possibly as a regulatory element. The existence of such an element may have important consequences for understanding how these genes are regulated.
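
    A 3-base-pair periodicity in a per-position signal, such as a cleavage-intensity profile, can be quantified by the strength of its Fourier component at period 3. The sketch below assumes the profile is already available as a list of numbers (a hypothetical input) and does not reproduce the significance test used in the study; the function name is illustrative.

```python
import cmath


def period_power(signal, period):
    """Normalized squared magnitude of the Fourier component at a given period.

    The mean is removed first so that a constant offset does not contribute.
    """
    n = len(signal)
    mean = sum(signal) / n
    component = sum(
        (x - mean) * cmath.exp(-2j * cmath.pi * k / period)
        for k, x in enumerate(signal)
    )
    return abs(component) ** 2 / n


# Toy 180-position profile with a perfect period of 3 (made up for illustration).
profile = [1.0, 0.0, 0.0] * 60
print(period_power(profile, 3), period_power(profile, 4))
```

    In practice the period-3 power would be compared against the power at neighboring periods, or against shuffled profiles, to judge whether the peak stands out.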

  10. Specificity-Determining DNA Triplet Code for Positioning of Human Preinitiation Complex.

    PubMed

    Goldshtein, Matan; Lukatsky, David B

    2017-05-23

    The notion that transcription factors bind DNA only through specific, consensus binding sites has been recently questioned. No specific consensus motif for the positioning of the human preinitiation complex (PIC) has been identified. Here, we reveal that a nonconsensus, statistical DNA triplet code provides specificity for the positioning of the human PIC. In particular, we reveal a highly nonrandom, statistical pattern of repetitive nucleotide triplets that correlates with the genomewide binding preferences of PIC measured by ChIP-exo. We analyze the triplet enrichment and depletion near the transcription start site and identify triplets that have the strongest effect on PIC-DNA nonconsensus binding. Using statistical mechanics, a random-binder model without fitting parameters, with genomic DNA sequence being the only input, we further validate that the nonconsensus nucleotide triplet code constitutes a key signature providing PIC binding specificity in the human genome. Our results constitute a proof-of-concept for, to our knowledge, a new design principle for protein-DNA recognition in the human genome, which can lead to a better mechanistic understanding of transcriptional regulation. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  11. A Conserved Structural Signature of the Homeobox Coding DNA in HOX genes

    PubMed Central

    Fongang, Bernard; Kong, Fanping; Negi, Surendra; Braun, Werner; Kudlicki, Andrzej

    2016-01-01

    The homeobox encodes a DNA-binding domain found in transcription factors regulating key developmental processes. The most notable examples of homeobox-containing genes are the Hox genes, arranged on chromosomes in the same order as their expression domains along the body axis. The mechanisms responsible for the synchronous regulation of Hox genes and the molecular function of their colinearity remain unknown. Here we report the discovery of a conserved structural signature of the 180-base pair DNA fragment comprising the homeobox. We demonstrate that the homeobox DNA has a characteristic 3-base-pair periodicity in the hydroxyl radical cleavage pattern. This periodic pattern is significant in most of the 39 mammalian Hox genes and in other homeobox-containing transcription factors. The signature is present in segmented bilaterian animals as evolutionarily distant as humans and flies. It remains conserved despite the fact that it would be disrupted by synonymous mutations, which raises the possibility of evolutionary selective pressure acting on the structure of the coding DNA. The homeobox coding DNA may therefore have a secondary function, possibly as a regulatory element. The existence of such an element may have important consequences for understanding how these genes are regulated. PMID:27739488

  12. Differentiating the Protein Coding and Noncoding RNA Segments of DNA Using Shannon Entropy

    NASA Astrophysics Data System (ADS)

    Mazaheri, P.; Shirazi, A. H.; Saeedi, N.; Reza Jafari, G.; Sahimi, Muhammad

    The complexity of DNA sequences is evaluated in order to differentiate between protein-coding and noncoding RNA segments. The method is based on computing the Shannon entropy of the sequences. By comparing the entropy of the original sequence with that of its shuffled counterpart, we identify the source of the difference between the two segments and their relative contributions to the sequence. To demonstrate the method, the DNA sequences of the bacteria Clostridium difficile 630 (G + C = 29.1%) and Bdellovibrio bacteriovorus (G + C = 50.6%) are analyzed, which are representatives of bacteria with unbalanced and balanced nucleotide content, respectively. It is shown that in both bacteria, regardless of nucleotide content, ΔrS (the relative difference of the two entropies) is significantly greater in protein-coding regions than in noncoding RNA segments.
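
    A comparison between the entropy of a sequence and that of its shuffled version can be sketched as below. Several choices here are assumptions, not taken from the abstract: entropy is computed over overlapping k-mers (shuffling leaves single-letter frequencies unchanged, so k must exceed 1 for the comparison to be informative), the relative difference is normalized by the shuffled entropy, and all names are hypothetical.

```python
import random
from collections import Counter
from math import log2


def kmer_entropy(seq, k):
    """Shannon entropy (in bits) of the overlapping k-mer distribution."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def relative_entropy_difference(seq, k=3, seed=0):
    """(S_shuffled - S_original) / S_shuffled for one random shuffle.

    Assumes the shuffled entropy is non-zero, i.e. the sequence is not a
    single repeated letter.
    """
    letters = list(seq)
    random.Random(seed).shuffle(letters)
    s_original = kmer_entropy(seq, k)
    s_shuffled = kmer_entropy("".join(letters), k)
    return (s_shuffled - s_original) / s_shuffled


print(kmer_entropy("ACGT", 1))
print(relative_entropy_difference("ACG" * 100))
```

    Averaging over many shuffles rather than a single seeded one would give a more stable estimate; the single shuffle is kept here only to make the example short and reproducible.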

  13. Long non-coding RNA PARTICLE bridges histone and DNA methylation.

    PubMed

    O'Leary, Valerie Bríd; Hain, Sarah; Maugg, Doris; Smida, Jan; Azimzadeh, Omid; Tapio, Soile; Ovsepian, Saak Victor; Atkinson, Michael John

    2017-05-11

    PARTICLE (Gene PARTICL; 'Promoter of MAT2A-Antisense RadiaTion Induced Circulating LncRNA') expression is transiently elevated following low dose irradiation typically encountered in the workplace and from natural sources. This long non-coding RNA recruits epigenetic silencers for cis-acting repression of its neighbouring Methionine adenosyltransferase 2A gene. It now emerges that PARTICLE operates as a trans-acting mediator of DNA and histone lysine methylation. Chromatin immunoprecipitation sequencing (ChIP-seq) and immunological evidence established elevated PARTICLE expression linked to increased histone 3 lysine 27 trimethylation. Live-imaging of dbroccoli-PARTICLE revealing its dynamic association with DNA methyltransferase 1 was confirmed by flow cytometry, immunoprecipitation and direct competitive binding interaction through electrophoretic mobility shift assay. Acting as a regulatory docking platform, the long non-coding RNA PARTICLE serves to interlink epigenetic modification machineries and represents a compelling innovative component necessary for gene silencing on a global scale.

  14. Free Energy Gap and Statistical Thermodynamic Fidelity of DNA Codes (Postprint)

    DTIC Science & Technology

    2007-01-01

    reverse-complement unless otherwise stated. For strand x, let Nx denote its complement. A (perfect) Watson-Crick duplex is the joining of complement...is possible for complementary sequences to form a non-perfectly aligned duplex, we will call any x W Nx duplex a Watson-Crick (WC) duplex. Two...

  15. DNA methylation of miRNA coding sequences putatively associated with childhood obesity.

    PubMed

    Mansego, M L; Garcia-Lacarte, M; Milagro, F I; Marti, A; Martinez, J A

    2017-02-01

    Epigenetic mechanisms may be involved in obesity onset and its consequences. The aim of the present study was to evaluate whether DNA methylation status in microRNA (miRNA) coding regions is associated with childhood obesity. DNA isolated from white blood cells of 24 children (identification sample: 12 obese and 12 non-obese) from the Grupo Navarro de Obesidad Infantil study was hybridized in a 450 K methylation microarray. Several CpGs whose DNA methylation levels were statistically different between obese and non-obese were validated by MassArray® in 95 children (validation sample) from the same study. Microarray analysis identified 16 differentially methylated CpGs between both groups (6 hypermethylated and 10 hypomethylated). DNA methylation levels in miR-1203, miR-412 and miR-216A coding regions significantly correlated with body mass index standard deviation score (BMI-SDS) and explained up to 40% of the variation of BMI-SDS. The network analysis identified 19 well-defined obesity-relevant biological pathways from the KEGG database. MassArray® validation identified three regions located in or near miR-1203, miR-412 and miR-216A coding regions differentially methylated between obese and non-obese children. The current work identified three CpG sites located in coding regions of three miRNAs (miR-1203, miR-412 and miR-216A) that were differentially methylated between obese and non-obese children, suggesting a role of miRNA epigenetic regulation in childhood obesity. © 2016 World Obesity Federation.

  16. Junk DNA and the long non-coding RNA twist in cancer genetics

    PubMed Central

    Ling, Hui; Vincent, Kimberly; Pichler, Martin; Fodde, Riccardo; Berindan-Neagoe, Ioana; Slack, Frank J.; Calin, George A

    2015-01-01

    The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions, and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNA in human diseases in general and cancer in particular remain largely unknown. Data from the literature suggest that lncRNA, often via interaction with proteins, functions at specific genomic loci or uses its own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function, and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single nucleotide polymorphisms (SNPs) located within lncRNAs may be functionally associated with the individual’s susceptibility to cancer. PMID:25619839

  17. HyDEn: a hybrid steganocryptographic approach for data encryption using randomized error-correcting DNA codes.

    PubMed

    Tulpan, Dan; Regoui, Chaouki; Durand, Guillaume; Belliveau, Luc; Léger, Serge

    2013-01-01

    This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and the cyclic permutations applied on the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach.

  18. HyDEn: A Hybrid Steganocryptographic Approach for Data Encryption Using Randomized Error-Correcting DNA Codes

    PubMed Central

    Regoui, Chaouki; Durand, Guillaume; Belliveau, Luc; Léger, Serge

    2013-01-01

    This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and the cyclic permutations applied on the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach. PMID:23984392

  19. A molecular code dictates sequence-specific DNA recognition by homeodomains.

    PubMed Central

    Damante, G; Pellizzari, L; Esposito, G; Fogolari, F; Viglino, P; Fabbro, D; Tell, G; Formisano, S; Di Lauro, R

    1996-01-01

    Most homeodomains bind to DNA sequences containing the motif 5'-TAAT-3'. The homeodomain of thyroid transcription factor 1 (TTF-1HD) binds to sequences containing a 5'-CAAG-3' core motif, delineating a new mechanism for differential DNA recognition by homeodomains. We investigated the molecular basis of the DNA binding specificity of TTF-1HD by both structural and functional approaches. As already suggested by the three-dimensional structure of TTF-1HD, the DNA binding specificities of the TTF-1, Antennapedia and Engrailed homeodomains, either wild-type or mutants, indicated that the amino acid residue in position 54 is involved in the recognition of the nucleotide at the 3' end of the core motif 5'-NAAN-3'. The nucleotide at the 5' position of this core sequence is recognized by the amino acids located in positions 6, 7 and 8 of the TTF-1 and Antennapedia homeodomains. These data, together with previous suggestions on the role of amino acids in position 50, indicate that the DNA binding specificity of homeodomains can be determined by a combinatorial molecular code. We also show that some specific combinations of the key amino acid residues involved in DNA recognition do not follow a simple, additive rule. PMID:8890172

  20. A novel DNA sequence similarity calculation based on simplified pulse-coupled neural network and Huffman coding

    NASA Astrophysics Data System (ADS)

    Jin, Xin; Nie, Rencan; Zhou, Dongming; Yao, Shaowen; Chen, Yanyan; Yu, Jiefu; Wang, Quan

    2016-11-01

    A novel method for the calculation of DNA sequence similarity is proposed based on a simplified pulse-coupled neural network (S-PCNN) and Huffman coding. In this study, we propose a coding method based on Huffman coding, where the triplet code is used as a code bit to transform a DNA sequence into a numerical sequence. The proposed method uses the firing characteristics of S-PCNN neurons to extract features from the DNA sequence, and it can handle DNA sequences of different lengths. First, the DNA primary sequence is encoded using the Huffman coding method; the S-PCNN is then used to extract the oscillation time sequence (OTS) of the encoded DNA sequence. Relevant features are obtained at the same time, and finally the similarities or dissimilarities of the DNA sequences are determined by Euclidean distance. To verify the accuracy of this method, different data sets were used for testing. The experimental results show that the proposed method is effective.
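
    The Huffman step alone can be sketched as below. This is only an illustration under stated assumptions: the abstract does not say how triplets are counted or how code words become numbers, so the non-overlapping triplet split, the function names, and the toy sequence are hypothetical, and the S-PCNN stage is not shown.

```python
import heapq
from collections import Counter
from itertools import count


def triplets(seq: str) -> list[str]:
    """Split seq into non-overlapping triplets, dropping any trailing remainder."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def huffman_code(symbols: list[str]) -> dict[str, str]:
    """Build a binary Huffman code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in c1.items()}
        merged.update({s: "1" + bits for s, bits in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


# Toy sequence (hypothetical): ATG is frequent, GGG and TTT are rare.
seq = "ATGATGATGATGCCCCCCGGGTTT"
code = huffman_code(triplets(seq))
encoded = [code[t] for t in triplets(seq)]
print(code)
print(encoded)
```

    Frequent triplets receive short code words and rare ones long code words, which is the property a Huffman-based numerical encoding of a sequence would rely on.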

  1. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

    NASA Technical Reports Server (NTRS)

    Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
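
    A first-order DFA of a nucleotide sequence can be sketched as follows. This is an illustrative sketch rather than the authors' procedure: the purine/pyrimidine step rule (+1 for A or G, -1 for C or T), the window sizes, and the function names are assumptions made here, and the test input is a synthetic random sequence, not GenBank data.

```python
import math
import random


def dna_walk(seq: str) -> list[float]:
    """Mean-subtracted cumulative walk: +1 for purines (A, G), -1 for pyrimidines."""
    steps = [1.0 if base in "AG" else -1.0 for base in seq]
    mean = sum(steps) / len(steps)
    walk, total = [], 0.0
    for step in steps:
        total += step - mean
        walk.append(total)
    return walk


def fluctuation(walk: list[float], n: int) -> float:
    """RMS residual after removing a least-squares line from each window of length n."""
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sq_sum, used = 0.0, 0
    for start in range(0, len(walk) - n + 1, n):
        win = walk[start:start + n]
        y_mean = sum(win) / n
        slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(win)) / sxx
        sq_sum += sum((y - (y_mean + slope * (x - x_mean))) ** 2
                      for x, y in enumerate(win))
        used += n
    return math.sqrt(sq_sum / used)


def dfa_exponent(seq: str, sizes=(8, 16, 32, 64, 128)) -> float:
    """Least-squares slope of log F(n) against log n."""
    walk = dna_walk(seq)
    lx = [math.log(n) for n in sizes]
    ly = [math.log(fluctuation(walk, n)) for n in sizes]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    return (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))


# Synthetic uncorrelated sequence used only as a sanity check.
rng = random.Random(1)
random_seq = "".join(rng.choice("ACGT") for _ in range(4096))
print(dfa_exponent(random_seq))
```

    For an uncorrelated sequence the exponent should come out near 0.5; values clearly above 0.5 indicate long-range correlation. Under the commonly quoted relation alpha = (1 + beta) / 2, a spectral exponent beta = 0 corresponds to alpha = 0.5.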

  3. Classifying Motion.

    ERIC Educational Resources Information Center

    Duzen, Carl; And Others

    1992-01-01

    Presents a series of activities that utilizes a leveling device to classify constant and accelerated motion. Applies this classification system to uniform circular motion and motion produced by gravitational force. (MDH)

  5. Bio-bar-code functionalized magnetic nanoparticle label for ultrasensitive flow injection chemiluminescence detection of DNA hybridization.

    PubMed

    Bi, Sai; Zhou, Hong; Zhang, Shusheng

    2009-10-07

    A signal amplification strategy based on bio-bar-code functionalized magnetic nanoparticles as labels holds promise to improve the sensitivity and detection limit of the detection of DNA hybridization and single-nucleotide polymorphisms by flow injection chemiluminescence assays.

  6. Characterization of the cDNA and gene coding for the biotin synthase of Arabidopsis thaliana.

    PubMed Central

    Weaver, L M; Yu, F; Wurtele, E S; Nikolau, B J

    1996-01-01

    Biotin, an essential cofactor, is synthesized de novo only by plants and some microbes. An Arabidopsis thaliana expressed sequence tag that shows sequence similarity to the carboxyl end of biotin synthase from Escherichia coli was used to isolate a near-full-length cDNA. This cDNA was shown to code for the Arabidopsis biotin synthase by its ability to complement a bioB mutant of E. coli. Site-specific mutagenesis indicates that residue threonine-173, which is highly conserved in biotin synthases, is important for catalytic competence of the enzyme. The primary sequence of the Arabidopsis biotin synthase is most similar to biotin synthases from E. coli, Serratia marcescens, and Saccharomyces cerevisiae (about 50% sequence identity) and more distantly related to the Bacillus sphaericus enzyme (33% sequence identity). The primary sequence of the amino terminus of the Arabidopsis biotin synthase may represent an organelle-targeting transit peptide. The single Arabidopsis gene coding for biotin synthase, BIO2, was isolated and sequenced. The biotin synthase coding sequence is interrupted by five introns. The gene sequence upstream of the translation start site has several unusual features, including imperfect palindromes and polypyrimidine sequences, which may function in the transcriptional regulation of the BIO2 gene. PMID:8819873

  7. Context-dependent DNA recognition code for C2H2 zinc-finger transcription factors

    PubMed Central

    Liu, Jiajian; Stormo, Gary D.

    2008-01-01

    Motivation: Modeling and identifying the DNA-protein recognition code is one of the most challenging problems in computational biology. Several quantitative methods have been developed to model DNA-protein interactions with specific focus on the C2H2 zinc-finger proteins, the largest transcription factor family in eukaryotic genomes. In many cases, they performed well, but overall the predictive accuracy of these methods is still limited. One of the major reasons is that all these methods used weight matrix models to represent DNA-protein interactions, assuming all base-amino acid contacts contribute independently to the total free energy of binding. Results: We present a context-dependent model for DNA–zinc-finger protein interactions that allows us to identify inter-positional dependencies in the DNA recognition code for C2H2 zinc-finger proteins. The degree of non-independence was detected by comparing the linear perceptron model with the non-linear neural net (NN) model for their predictions of DNA–zinc-finger protein interactions. This dependency is supported by the complex base-amino acid contacts observed in DNA–zinc-finger interactions from structural analyses. Using extensive published qualitative and quantitative experimental data, we demonstrated that the context-dependent model developed in this study can significantly improve predictions of DNA binding profiles and free energies of binding for both individual zinc fingers and proteins with multiple zinc fingers, compared with previous positional-independent models. This approach can be extended to other protein families with complex base-amino acid residue interactions, which would help to further understand transcriptional regulation in eukaryotic genomes. Availability: The software is implemented as C programs and is available by request. http://ural.wustl.edu/softwares.html Contact: stormo@ural.wustl.edu PMID:18586699
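
    The independence assumption behind a weight matrix, and what a context-dependent term changes, can be illustrated with a toy example. Everything here is hypothetical: the matrix values, the single pairwise coupling term, and the function names are invented for illustration, and a pairwise term is a simplified stand-in for the neural net model the abstract describes.

```python
# Hypothetical per-position energy contributions (arbitrary units) for a 3-bp site.
WEIGHTS = [
    {"A": 0.0, "C": 1.2, "G": 0.4, "T": 1.5},
    {"A": 1.0, "C": 0.0, "G": 1.3, "T": 0.8},
    {"A": 0.9, "C": 1.1, "G": 0.0, "T": 1.4},
]

# Hypothetical coupling: an extra term applied only when position 0 is C
# and position 1 is A. Keys are (pos_i, pos_j, base_i, base_j).
PAIR_TERMS = {(0, 1, "C", "A"): -0.7}


def additive_score(site: str) -> float:
    """Weight-matrix score: each position contributes independently."""
    return sum(column[base] for column, base in zip(WEIGHTS, site))


def context_score(site: str) -> float:
    """Additive score plus pairwise terms that depend on two positions at once."""
    score = additive_score(site)
    for (i, j, base_i, base_j), extra in PAIR_TERMS.items():
        if site[i] == base_i and site[j] == base_j:
            score += extra
    return score


# Under independence, changing position 0 from A to C costs the same
# regardless of what sits at position 1.
cost_with_a = additive_score("CAG") - additive_score("AAG")
cost_with_c = additive_score("CCG") - additive_score("ACG")
print(cost_with_a, cost_with_c)

# With the coupling term, the same substitution now depends on its neighbour.
print(context_score("CAG") - context_score("AAG"))
print(context_score("CCG") - context_score("ACG"))
```

    The first pair of printed costs is identical, which is exactly what a weight matrix assumes; the second pair differs, which is the kind of inter-positional dependency a purely additive model cannot represent.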

  8. DNA methylation patterns of protein coding genes and long noncoding RNAs in female schizophrenic patients.

    PubMed

    Liao, Qi; Wang, Yunliang; Cheng, Jia; Dai, Dongjun; Zhou, Xingyu; Zhang, Yuzheng; Gao, Shugui; Duan, Shiwei

    2015-02-01

    Schizophrenia (SCZ) is a complex mental disorder to which both genetic and epigenetic factors contribute. Long noncoding RNAs (lncRNAs) were recently found to play an important regulatory role in mental disorders. However, little is known about the DNA methylation of lncRNAs, although numerous SCZ studies have been performed on genetic polymorphisms or epigenetic marks in protein coding genes. We present a comprehensive genome-wide DNA methylation study of both protein coding genes and lncRNAs in female patients with paranoid and undifferentiated SCZ. Using methyl-CpG binding domain (MBD) protein-enriched genome sequencing (MBD-seq), 8,163 and 764 peaks were identified in paranoid and undifferentiated SCZ, respectively (p < 1 × 10^-5). Gene ontology analysis showed that the hypermethylated regions were enriched in genes related to the neuron system and brain for both paranoid and undifferentiated SCZ (p < 0.05). Among these peaks, 121 were located in gene promoter regions that might affect gene expression and influence SCZ-related pathways. Interestingly, DNA methylation of 136 and 23 known lncRNAs in the Refseq database was identified in paranoid and undifferentiated SCZ, respectively. In addition, ∼20% of intergenic peaks annotated based on Refseq genes overlapped with lncRNAs in the UCSC and gencode databases. To make the results accessible to biological researchers, we created an online database to display and visualize the information on DNA methylation peaks in both types of SCZ (http://www.bioinfo.org/scz/scz.htm). Our results showed that the aberrant DNA methylation of lncRNAs might be another important epigenetic factor for SCZ.

  9. DNA strand breaks induced by electrons simulated with Nanodosimetry Monte Carlo Simulation Code: NASIC.

    PubMed

    Li, Junli; Li, Chunyan; Qiu, Rui; Yan, Congchong; Xie, Wenzhang; Wu, Zhen; Zeng, Zhi; Tung, Chuanjong

    2015-09-01

    Monte Carlo simulation is a powerful tool for investigating the details of radiation-induced biological damage at the molecular level. In this paper, a Monte Carlo code called NASIC (Nanodosimetry Monte Carlo Simulation Code) was developed. It includes a physical module, a pre-chemical module, a chemical module, a geometric module and a DNA damage module. The physical module can simulate physical tracks of low-energy electrons in liquid water event-by-event. More than one set of inelastic cross sections was calculated by applying the dielectric function method of Emfietzoglou's optical-data treatments, with different optical data sets and dispersion models. In the pre-chemical module, the ionised and excited water molecules undergo dissociation processes. In the chemical module, the produced radiolytic chemical species diffuse and react. In the geometric module, an atomic model of 46 chromatin fibres in a spherical nucleus of a human lymphocyte was established. In the DNA damage module, the direct damage induced by the energy depositions of the electrons and the indirect damage induced by the radiolytic chemical species were calculated. The parameters should be adjusted to bring the simulation results into agreement with the experimental results. In this paper, the influence of the inelastic cross sections and the vibrational excitation reaction on the parameters and on the DNA strand break yields was studied. Further work on NASIC is underway. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  10. DANIO-CODE: Toward an Encyclopedia of DNA Elements in Zebrafish.

    PubMed

    Tan, Haihan; Onichtchouk, Daria; Winata, Cecilia

    2016-02-01

    The zebrafish has emerged as a model organism for genomics studies. The symposium "Toward an encyclopedia of DNA elements in zebrafish" held in London in December 2014, was coorganized by Ferenc Müller and Fiona Wardle. This meeting is a follow-up of a similar previous workshop held 2 years earlier and represents a push toward the formalization of a community effort to annotate functional elements in the zebrafish genome. The meeting brought together zebrafish researchers, bioinformaticians, as well as members of established consortia, to exchange scientific findings and experience, as well as to discuss the initial steps toward the formation of a DANIO-CODE consortium. In this study, we provide the latest updates on the current progress of the consortium's efforts, opening up a broad invitation to researchers to join in and contribute to DANIO-CODE.

  11. DANIO-CODE: Toward an Encyclopedia of DNA Elements in Zebrafish

    PubMed Central

    2016-01-01

    The zebrafish has emerged as a model organism for genomics studies. The symposium “Toward an encyclopedia of DNA elements in zebrafish” held in London in December 2014, was coorganized by Ferenc Müller and Fiona Wardle. This meeting is a follow-up of a similar previous workshop held 2 years earlier and represents a push toward the formalization of a community effort to annotate functional elements in the zebrafish genome. The meeting brought together zebrafish researchers, bioinformaticians, as well as members of established consortia, to exchange scientific findings and experience, as well as to discuss the initial steps toward the formation of a DANIO-CODE consortium. In this study, we provide the latest updates on the current progress of the consortium's efforts, opening up a broad invitation to researchers to join in and contribute to DANIO-CODE. PMID:26671609

  12. Particle classifier

    SciTech Connect

    Etkin, B.

    1987-04-14

    This patent describes a classifier for particulate material comprising a housing having an inlet to receive a classifying air flow flowing in a given direction, collection means downstream of the inlet to receive material classified by the air flow, and material introduction means intermediate the inlet and the collection means to introduce particles entrained in a secondary air stream into the housing in a direction other than the given direction. The material introduction means includes a material outlet aperture in a wall of the housing extending generally perpendicular to the given direction, conveying means to convey material and the secondary air stream to the material outlet and diverting means to divert the secondary air stream to a direction generally parallel to the classifying air flow flowing in the given direction. The diverting means includes a surface extending downstream from the outlet and adjacent thereto and being dimensioned to divert the secondary air stream by a Coanda effect generally parallel to the given direction and thereby segregate the secondary air stream from the particles and permit continued movement of the particles along predictable trajectories.

  13. Classifying Microorganisms.

    ERIC Educational Resources Information Center

    Baker, William P.; Leyva, Kathryn J.; Lang, Michael; Goodmanis, Ben

    2002-01-01

    Focuses on an activity in which students sample air at school and generate ideas about how to classify the microorganisms they observe. The results are used to compare air quality among schools via the Internet. Supports the development of scientific inquiry and technology skills. (DDR)

  15. Quartz crystal microbalance detection of DNA single-base mutation based on monobase-coded cadmium tellurium nanoprobe.

    PubMed

    Zhang, Yuqin; Lin, Fanbo; Zhang, Youyu; Li, Haitao; Zeng, Yue; Tang, Hao; Yao, Shouzhuo

    2011-01-01

    A new method for the detection of point mutations in DNA, based on monobase-coded cadmium tellurium nanoprobes and the quartz crystal microbalance (QCM) technique, is reported. A QCM sensor for point mutations in a DNA strand (single-base adenine, thymine, cytosine or guanine mutations, denoted A, T, C and G, respectively) was fabricated by immobilizing magnetic beads modified with single-base mutation DNA onto the electrode surface, using an external magnetic field near the electrode. The DNA-modified magnetic beads were obtained from the biotin-avidin affinity reaction of biotinylated DNA and streptavidin-functionalized core/shell Fe3O4/Au magnetic nanoparticles, followed by a DNA hybridization reaction. Single-base coded CdTe nanoprobes (A-CdTe, T-CdTe, C-CdTe and G-CdTe, respectively) were used as the detection probes. The mutation site in DNA was distinguished by detecting the decrease in the resonance frequency of the piezoelectric quartz crystal when the coded nanoprobe was added to the test system. The proposed detection strategy for point mutations in DNA proved to be sensitive, simple, repeatable and low-cost; consequently, it has great potential for single nucleotide polymorphism (SNP) detection.

  16. A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

    PubMed

    Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong

    2012-01-01

    Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.

  17. Isolation of cDNA clones coding for the beta subunit of human beta-hexosaminidase.

    PubMed Central

    O'Dowd, B F; Quan, F; Willard, H F; Lamhonwah, A M; Korneluk, R G; Lowden, J A; Gravel, R A; Mahuran, D J

    1985-01-01

    The major forms of beta-hexosaminidase (2-acetamido-2-deoxy-beta-D-glucoside acetamidodeoxyglucohydrolase, EC 3.2.1.30) occur as multimers of alpha and beta chains--hexosaminidase A (alpha beta a beta b) and hexosaminidase B 2(beta a beta b). To facilitate the investigation of beta-chain biosynthesis and the nature of mutation in Sandhoff disease, a human hexosaminidase beta-chain cDNA clone was isolated. Hexosaminidase B (10 mg) was treated with CNBr, five peptide fragments were isolated by reverse-phase HPLC, and their amino acid sequences were determined. One of these contained a string of six amino acids from which an oligonucleotide probe was defined. The simian virus 40-transformed human fibroblast cDNA library of Okayama and Berg was screened by colony hybridization with the radiolabeled probe. Thirteen probe-binding clones were selected out of 50,000 clones screened. Four of these designated pHex were shown to be identical at their 3' ends by restriction enzyme mapping, differing only in their 5' extensions (1.4-1.7 kilobases). The nucleotide sequence of a 174-base-pair segment contained the deduced amino acid sequence of two of the five CNBr peptides, indicating that the pHex clones encode the beta subunit of hexosaminidase. In addition, pHex cDNA was found homologous to multiple bands in digests of genomic human DNA totaling 43 kilobases (kb), all of which were mapped to chromosome 5 in somatic cell hybrids, as expected of the HEXB gene. The pHex cDNA also hybridized to a 2.2-kilobase RNA that apparently codes for the pre-beta-polypeptide of hexosaminidase. This RNA species was absent in the fibroblasts of one of three patients with Sandhoff disease examined. We anticipate that these clones will be of value to diagnosis and carrier detection of Sandhoff disease in affected families. PMID:2579389

  18. Coding region SNP analysis to enhance dog mtDNA discrimination power in forensic casework.

    PubMed

    Verscheure, Sophie; Backeljau, Thierry; Desmyter, Stijn

    2015-01-01

    The high population frequencies of three control region haplotypes contribute to the low discrimination power of the dog mtDNA control region. It also diminishes the evidential power of a match with one of these haplotypes in forensic casework. A mitochondrial genome study of 214 Belgian dogs suggested 26 polymorphic coding region sites that successfully resolved dogs with the three most frequent control region haplotypes. In this study, three SNP assays were developed to determine the identity of the 26 informative sites. The control region of 132 newly sampled dogs was sequenced and added to the study of 214 dogs. The assays were applied to 58 dogs of the haplotypes of interest, which confirmed their suitability for enhancing dog mtDNA discrimination power. In the Belgian population study of 346 dogs, the set of 26 sites divided the dogs into 25 clusters of mtGenome sequences with substantially lower population frequency estimates than their control region sequences. In case of a match with one of the three control region haplotypes, using these three SNP assays in conjunction with control region sequencing would augment the exclusion probability of dog mtDNA analysis from 92.9% to 97.0%. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
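
    The exclusion probability quoted in this abstract can be illustrated with one common estimator: one minus the random match probability, where the match probability is the sum of squared haplotype frequencies. The abstract does not say which estimator the authors used, so the function name, the estimator choice, and the toy haplotype labels below are all assumptions made for illustration, not data from the study.

```python
from collections import Counter


def exclusion_probability(haplotypes):
    """Return 1 - sum(p_i^2) over the observed haplotype labels.

    Assumption: this simple plug-in estimator stands in for whatever
    estimator the study actually used.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one sampled haplotype")
    match_probability = sum((c / n) ** 2 for c in counts.values())
    return 1.0 - match_probability


# Hypothetical toy sample: two frequent control-region haplotypes ...
control_region_only = ["A"] * 6 + ["B"] * 4
# ... which extra coding-region SNPs split into finer clusters.
with_coding_snps = ["A1"] * 3 + ["A2"] * 3 + ["B1"] * 2 + ["B2"] * 2

print(exclusion_probability(control_region_only))
print(exclusion_probability(with_coding_snps))
```

    Splitting frequent haplotypes into rarer clusters lowers the sum of squared frequencies, which is why the extra coding-region sites raise the exclusion probability.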

  19. Functional Intersection of ATM and DNA-Dependent Protein Kinase Catalytic Subunit in Coding End Joining during V(D)J Recombination

    PubMed Central

    Lee, Baeck-Seung; Gapud, Eric J.; Zhang, Shichuan; Dorsett, Yair; Bredemeyer, Andrea; George, Rosmy; Callen, Elsa; Daniel, Jeremy A.; Osipovich, Oleg; Oltz, Eugene M.; Bassing, Craig H.; Nussenzweig, Andre; Lees-Miller, Susan; Hammel, Michal; Chen, Benjamin P. C.

    2013-01-01

    V(D)J recombination is initiated by the RAG endonuclease, which introduces DNA double-strand breaks (DSBs) at the border between two recombining gene segments, generating two hairpin-sealed coding ends and two blunt signal ends. ATM and DNA-dependent protein kinase catalytic subunit (DNA-PKcs) are serine-threonine kinases that orchestrate the cellular responses to DNA DSBs. During V(D)J recombination, ATM and DNA-PKcs have unique functions in the repair of coding DNA ends. ATM deficiency leads to instability of postcleavage complexes and the loss of coding ends from these complexes. DNA-PKcs deficiency leads to a nearly complete block in coding join formation, as DNA-PKcs is required to activate Artemis, the endonuclease that opens hairpin-sealed coding ends. In contrast to loss of DNA-PKcs protein, here we show that inhibition of DNA-PKcs kinase activity has no effect on coding join formation when ATM is present and its kinase activity is intact. The ability of ATM to compensate for DNA-PKcs kinase activity depends on the integrity of three threonines in DNA-PKcs that are phosphorylation targets of ATM, suggesting that ATM can modulate DNA-PKcs activity through direct phosphorylation of DNA-PKcs. Mutation of these threonine residues to alanine (DNA-PKcs3A) renders DNA-PKcs dependent on its intrinsic kinase activity during coding end joining, at a step downstream of opening hairpin-sealed coding ends. Thus, DNA-PKcs has critical functions in coding end joining beyond promoting Artemis endonuclease activity, and these functions can be regulated redundantly by the kinase activity of either ATM or DNA-PKcs. PMID:23836881

  20. Toward a code for the interactions of zinc fingers with DNA: selection of randomized fingers displayed on phage.

    PubMed Central

    Choo, Y; Klug, A

    1994-01-01

    We have used two selection techniques to study sequence-specific DNA recognition by the zinc finger, a small, modular DNA-binding minidomain. We have chosen zinc fingers because they bind as independent modules and so can be linked together in a peptide designed to bind a predetermined DNA site. In this paper, we describe how a library of zinc fingers displayed on the surface of bacteriophage enables selection of fingers capable of binding to given DNA triplets. The amino acid sequences of selected fingers which bind the same triplet are compared to examine how sequence-specific DNA recognition occurs. Our results can be rationalized in terms of coded interactions between zinc fingers and DNA, involving base contacts from a few alpha-helical positions. In the paper following this one, we describe a complementary technique which confirms the identity of amino acids capable of DNA sequence discrimination from these positions. PMID:7972027

  1. Low mitochondrial DNA variation among American alligators and a novel non-coding region in crocodilians.

    PubMed

    Glenn, Travis C; Staton, Joseph L; Vu, Alex T; Davis, Lisa M; Bremer, Jaime R Alvarado; Rhodes, Walter E; Brisbin, I Lehr; Sawyer, Roger H

    2002-12-15

    We analyzed 1317-1823 base pairs (bp) of mitochondrial DNA sequence beginning in the 5' end of cytochrome b (cyt b) and ending in the central domain of the control region for 25 American alligators (Alligator mississippiensis) and compared these to a homologous sequence from a Chinese alligator (A. sinensis). Both species share a non-coding spacer between cyt b and tRNA(Thr). Chinese alligator cyt b differs from that of the American alligator by 17.5% at the nucleotide level and 13.8% for inferred amino acids, which is consistent with their presumed ancient divergence. Only two cyt b haplotypes were detected among the 25 American alligators (693-1199 bp surveyed), with one haplotype shared among 24 individuals. One alligator from Mississippi differed from all other alligators by a single silent substitution. The control region contained only slightly more variation among the 25 American alligators, with two variable positions (624 bp surveyed), yielding three haplotypes with 22, two, and one individuals in each of these groups. Previous genetic studies examining allozymes and the proportion of variable microsatellite DNA loci also found low levels of genetic diversity in American alligators. However, in contrast with allozymes, microsatellites, and morphology, the mtDNA data shows no evidence of differentiation among populations from the extremes of the species range. These results suggest that American alligators underwent a severe population bottleneck in the late Pleistocene, resulting in nearly homogenous mtDNA among all American alligators today. Copyright 2002 Wiley-Liss, Inc.

  2. A molecular genetic analysis of Eragrostis tef (Zucc.) Trotter: non-coding regions of chloroplast DNA, 18S rDNA and the transcription factor VP1.

    PubMed

    Espelund, M; Bekele, E; Holst-Jensen, A; Jakobsen, K S; Nordal, I

    2000-01-01

    The non-coding chloroplast DNA sequences of the trnL (UAA) intron and the trnL-trnF (GAA) intergenic spacer (IGS), the coding sequences of nuclear 18S rDNA, and the transcription factor Vp1 of the cereal tef (Eragrostis tef (Zucc.) Trotter) were studied. No intraspecific variation was found among the 6 studied tef varieties. However, the study showed that Eragrostis tef has a number of unique traits compared to other grasses. Phylogenetic analysis of the chloroplast DNA gave three grass clades, joining Eragrostis with sorghum and maize in one. In the analysis of the 18S rDNA sequences, the three grass species were joined in a monophyletic trichotomy in the cladogram, in which maize is the most divergent, rice the least and tef intermediate. The Vp1 is highly conserved. The Vp1 phylogeny showed that the tef Vp1-sequence is the hitherto most divergent Vp1-sequence reported from a grass.

  3. DENV gene of bacteriophage T4 codes for both pyrimidine dimer-DNA glycosylase and apyrimidinic endonuclease activities

    SciTech Connect

    McMillan, S.; Edenberg, H.J.; Radany, E.H.; Friedberg, R.C.; Friedberg, E.C.

    1981-10-01

    Recent studies have shown that purified preparations of phage T4 UV DNA-incising activity (T4 UV endonuclease or endonuclease V of phage T4) contain a pyrimidine dimer-DNA glycosylase activity that catalyzes hydrolysis of the 5' glycosyl bond of dimerized pyrimidines in UV-irradiated DNA. Such enzyme preparations have also been shown to catalyze the hydrolysis of phosphodiester bonds in UV-irradiated DNA at a neutral pH, presumably reflecting the action of an apurinic/apyrimidinic endonuclease at the apyrimidinic sites created by the pyrimidine dimer-DNA glycosylase. In this study we found that preparations of T4 UV DNA-incising activity contained apurinic/apyrimidinic endonuclease activity that nicked depurinated form I simian virus 40 DNA. Apurinic/apyrimidinic endonuclease activity was also found in extracts of Escherichia coli infected with T4 denV+ phage. Extracts of cells infected with T4 denV mutants contained significantly lower levels of apurinic/apyrimidinic endonuclease activity; these levels were no greater than the levels present in extracts of uninfected cells. Furthermore, the addition of DNA containing UV-irradiated DNA and T4 enzyme resulted in competition for pyrimidine dimer-DNA glycosylase activity against the UV-irradiated DNA. On the basis of these results, we concluded that apurinic/apyrimidinic endonuclease activity is encoded by the denV gene of phage T4, the same gene that codes for pyrimidine dimer-DNA glycosylase activity.

  4. Comparison of Geant4-DNA simulation of S-values with other Monte Carlo codes

    NASA Astrophysics Data System (ADS)

    André, T.; Morini, F.; Karamitros, M.; Delorme, R.; Le Loirec, C.; Campos, L.; Champion, C.; Groetz, J.-E.; Fromm, M.; Bordage, M.-C.; Perrot, Y.; Barberet, Ph.; Bernal, M. A.; Brown, J. M. C.; Deleuze, M. S.; Francis, Z.; Ivanchenko, V.; Mascialino, B.; Zacharatou, C.; Bardiès, M.; Incerti, S.

    2014-01-01

    Monte Carlo simulations of S-values have been carried out with the Geant4-DNA extension of the Geant4 toolkit. The S-values have been simulated for monoenergetic electrons with energies ranging from 0.1 keV up to 20 keV, in liquid water spheres (for four radii, chosen between 10 nm and 1 μm), and for electrons emitted by five isotopes of iodine (131, 132, 133, 134 and 135), in liquid water spheres of varying radius (from 15 μm up to 250 μm). The results have been compared to those obtained from other Monte Carlo codes and from other published data. The use of the Kolmogorov-Smirnov test has allowed confirming the statistical compatibility of all simulation results.
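
    The Kolmogorov-Smirnov comparison mentioned in this abstract can be sketched with the two-sample statistic, which is the largest gap between two empirical cumulative distribution functions. The abstract does not describe how the test was set up, so the function name and the toy numbers below are assumptions, and the sketch stops at the statistic without computing a p-value.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # Both empirical CDFs are step functions that only change at data
    # points, so it is enough to evaluate the gap at every observed value.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical values standing in for results from two simulation codes.
code_one = [1.0, 2.0, 3.0, 4.0]
code_two = [3.0, 4.0, 5.0, 6.0]
print(ks_statistic(code_one, code_two))
```

    A statistic near zero means the two sets of results are distributed alike; a value of 1 means they do not overlap at all.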

  5. Temporal and spatial trends in prey composition of wahoo Acanthocybium solandri: a diet analysis from the central North Pacific Ocean using visual and DNA bar-coding techniques.

    PubMed

    Oyafuso, Z S; Toonen, R J; Franklin, E C

    2016-04-01

    A diet analysis was conducted on 444 wahoo Acanthocybium solandri caught in the central North Pacific Ocean longline fishery and a nearshore troll fishery surrounding the Hawaiian Islands from June to December 2014. In addition to traditional observational methods of stomach contents, a DNA bar-coding approach was integrated into the analysis by sequencing the cytochrome c oxidase subunit 1 (COI) region of the mtDNA genome to taxonomically identify individual prey items that could not be classified visually to species. For nearshore-caught A. solandri, juvenile pre-settlement reef fish species from various families dominated the prey composition during the summer months, followed primarily by Carangidae in autumn months. Gempylidae, Echeneidae and Scombridae were dominant prey taxa from the offshore fishery. Molidae was a common prey family found in stomachs collected north-east of the Hawaiian Archipelago while tetraodontiform reef fishes, known to have extended pelagic stages, were prominent prey items south-west of the Hawaiian Islands. The diet composition of A. solandri was indicative of an adaptive feeder and thus revealed dominant geographic and seasonal abundances of certain taxa from various ecosystems in the marine environment. The addition of molecular bar-coding to the traditional visual method of prey identifications allowed for a more comprehensive range of the prey field of A. solandri to be identified and should be used as a standard component in future diet studies.

  6. Molecular cloning of the cDNA coding for the (R)-(+)-mandelonitrile lyase of Prunus amygdalus: temporal and spatial expression patterns in flowers and mature seeds.

    PubMed

    Suelves, M; Puigdomènech, P

    1998-10-01

    A gene highly expressed in the floral organs of almond (Prunus amygdalus Batsch), and coding for the cyanogenic enzyme (R)-(+)-mandelonitrile lyase (EC 4.1.2.10), has been identified and the full-length cDNA sequenced. The temporal expression pattern in maturing seeds and during floral development was analyzed by RNA blot, and the highest mRNA levels were detected in floral tissues. The spatial mRNA accumulation pattern in almond flower buds was also analyzed by in-situ hybridization. The mRNA levels were compared during seed maturation and floral development in fruit and floral samples from cultivars classified as homozygous or heterozygous for the sweet-almond trait or homozygous for the bitter trait. No correlation was found between these characteristics and levels of mandelonitrile lyase mRNA, suggesting that the presence of this protein is not the limiting factor in the production of hydrogen cyanide.

  7. Humans and chimpanzees differ in their cellular response to DNA damage and non-coding sequence elements of DNA repair-associated genes.

    PubMed

    Weis, E; Galetzka, D; Herlyn, H; Schneider, E; Haaf, T

    2008-01-01

    Compared to humans, chimpanzees appear to be less susceptible to many types of cancer. Because DNA repair defects lead to accumulation of gene and chromosomal mutations, species differences in DNA repair are one plausible explanation. Here we analyzed the repair kinetics of human and chimpanzee cells after cisplatin treatment and irradiation. Dot blots for the quantification of single-stranded (ss) DNA repair intermediates revealed a biphasic response of human and chimpanzee lymphoblasts to cisplatin-induced damage. The early phase of DNA repair was identical in both species with a peak of ssDNA intermediates at 1 h after DNA damage induction. However, the late phase differed between species. Human cells showed a second peak of ssDNA intermediates at 6 h, chimpanzee cells at 5 h. One of four analyzed DNA repair-associated genes, UBE2A, was differentially expressed in human and chimpanzee cells at 5 h after cisplatin treatment. Immunofluorescent staining of gammaH2AX foci demonstrated equally high numbers of DNA strand breaks in human and chimpanzee cells at 30 min after irradiation and equally low numbers at 2 h. However, at 1 h chimpanzee cells had significantly fewer DNA breaks than human cells. Comparative sequence analyses of approximately 100 DNA repair-associated genes in human and chimpanzee revealed 13% and 32% of genes, respectively, with evidence for accelerated evolution in promoter regions and introns. This contrasts strikingly with the 3% of DNA repair-associated genes with positive selection in the coding sequence. Compared to the rhesus macaque as an outgroup, chimpanzees have a higher accelerated evolution in non-coding sequences than humans. The TRF1-interacting, ankyrin-related ADP-ribose polymerase (TNKS) gene showed an accelerated intraspecific evolution among humans. Our results are consistent with the view that chimpanzee cells repair different types of DNA damage faster than human cells, whereas the overall repair capacity is similar in

  8. cDNA sequence of human transforming gene hst and identification of the coding sequence required for transforming activity

    SciTech Connect

    Taira, M.; Yoshida, T.; Miyagawa, K.; Sakamoto, H.; Terada, M.; Sugimura, T.

    1987-05-01

    The hst gene was originally identified as a transforming gene in DNAs from human stomach cancers and from a noncancerous portion of stomach mucosa by DNA-mediated transfection assay using NIH3T3 cells. cDNA clones of hst were isolated from the cDNA library constructed from poly(A)+ RNA of a secondary transformant induced by the DNA from a stomach cancer. The sequence analysis of the hst cDNA revealed the presence of two open reading frames. When this cDNA was inserted into an expression vector containing the simian virus 40 promoter, it efficiently induced the transformation of NIH3T3 cells upon transfection. It was found that one of the reading frames, which coded for 206 amino acids, was responsible for the transforming activity.

  9. Mechanism of ultraviolet-induced mutagenesis: the coding properties of ultraviolet-irradiated poly(dC) replicated by E. coli DNA polymerase I.

    PubMed Central

    Lecomte, P; Boiteux, S; Doubleday, O

    1981-01-01

    We have identified three lesions other than cyclobutane dimers which alter the properties of UV-irradiated poly(dC) as a template for E. coli DNA polymerase I, and have characterised these lesions with respect to their coding properties, rates of formation and decay, and their sensitivity to uracil DNA glycosylase. Our results lead us to conclude that these lesions are (1) cytosine hydrates, which code for cytosine and to a lesser extent thymine, (2) uracil hydrates, which code for adenine and are not sensitive to uracil DNA glycosylase, and (3) uracils, which code for adenine and are removed by uracil DNA glycosylase. PMID:7024915

  10. Detection of coding microsatellite frameshift mutations in DNA mismatch repair-deficient mouse intestinal tumors.

    PubMed

    Woerner, Stefan M; Tosti, Elena; Yuan, Yan P; Kloor, Matthias; Bork, Peer; Edelmann, Winfried; Gebert, Johannes

    2015-11-01

    Different DNA mismatch repair (MMR)-deficient mouse strains have been developed as models for the inherited cancer-predisposing Lynch syndrome. It is completely unresolved whether coding mononucleotide repeat (cMNR) gene mutations in these mice can contribute to intestinal tumorigenesis and whether MMR-deficient mice are a suitable molecular model of human microsatellite instability (MSI)-associated intestinal tumorigenesis. A proof-of-principle study was performed to identify mouse cMNR-harboring genes affected by insertion/deletion mutations in MSI murine intestinal tumors. Bioinformatic algorithms were developed to establish a database of mouse cMNR-harboring genes. A panel of five mouse noncoding mononucleotide markers was used for MSI classification of intestinal matched normal/tumor tissues from MMR-deficient (Mlh1(-/-), Msh2(-/-), Msh2(LoxP/LoxP)) mice. cMNR frameshift mutations of candidate genes were determined by DNA fragment analysis. Murine MSI intestinal tumors but not normal tissues from MMR-deficient mice showed cMNR frameshift mutations in six candidate genes (Elavl3, Tmem107, Glis2, Sdccag1, Senp6, Rfc3). cMNRs of mouse Rfc3 and Elavl3 are conserved in type and length in their human orthologs that are known to be mutated in human MSI colorectal, endometrial and gastric cancer. We provide evidence for the utility of a mononucleotide marker panel for detection of MSI in murine tumors, the existence of cMNR instability in MSI murine tumors, the utility of mouse subspecies DNA for identification of polymorphic repeats, and repeat conservation among some orthologous human/mouse genes, two of them showing instability in human and mouse MSI intestinal tumors. MMR-deficient mice hence are a useful molecular model system for analyzing MSI intestinal carcinogenesis.
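
    The database-building step in this abstract involves locating mononucleotide runs inside coding sequences. A minimal sketch of such a scan follows; the abstract does not give the authors' algorithm or their length cut-off, so the function names, the default minimum run length of 6, and the example sequence are all hypothetical.

```python
import re


def find_mononucleotide_repeats(coding_sequence, min_length=6):
    """Return (start, base, length) for each run of one base >= min_length.

    Assumption: min_length=6 is an illustrative threshold only.
    """
    if min_length < 2:
        raise ValueError("min_length must be at least 2")
    # One captured base followed by at least (min_length - 1) copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    sequence = coding_sequence.upper()
    return [
        (match.start(), match.group(1), match.end() - match.start())
        for match in pattern.finditer(sequence)
    ]


def is_frameshift(reference_length, observed_length):
    """An insertion/deletion shifts the frame unless it is a multiple of 3."""
    return (observed_length - reference_length) % 3 != 0


# Hypothetical coding sequence with an A7 and a T8 tract.
example = "ATG" + "A" * 7 + "CG" + "T" * 8 + "GGC"
print(find_mononucleotide_repeats(example))
print(is_frameshift(reference_length=8, observed_length=7))
```

    Positions are zero-based offsets into the sequence, and a one-base contraction of a repeat is reported as a frameshift because it is not a multiple of three.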

  11. Joining mutants of RAG1 and RAG2 that demonstrate impaired interactions with the coding-end DNA.

    PubMed

    Nagawa, Fumikiyo; Hirose, Satoshi; Nishizumi, Hirofumi; Nishihara, Tadashi; Sakano, Hitoshi

    2004-09-10

    In V(D)J joining of antigen receptor genes, two recombination signal sequences (RSSs), 12- and 23-RSSs, form a complex with the protein products of recombination activating genes, RAG1 and RAG2. DNaseI footprinting demonstrates that the interaction of RAG proteins with substrate RSS DNA is not just limited to the signal region but involves the coding sequence as well. Joining mutants of RAG1 and RAG2 demonstrate impaired interactions with the coding region in both pre- and postcleavage type complexes. A possible role of this RAG coding region interaction is discussed in the context of V(D)J recombination.

  12. Utility of a combined current procedural terminology and International Classification of Diseases, Ninth Revision, Clinical Modification code algorithm in classifying cervical spine surgery for degenerative changes.

    PubMed

    Wang, Marjorie C; Laud, Purushottam W; Macias, Melissa; Nattinger, Ann B

    2011-10-15

    Retrospective study. To evaluate the sensitivity and specificity of a combined Current Procedural Terminology (CPT) and International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) algorithm in defining cervical spine surgery in comparison to patient operative reports in the medical record. Epidemiological studies of spine surgery often use ICD-9-CM billing codes in administrative databases to study trends and outcome of surgery. However, ICD-9-CM codes do not clearly identify specific surgical factors that may be related to outcome, such as instrumentation or number of levels treated. Previous studies have not investigated the sensitivity and specificity of a combined CPT and ICD-9-CM code algorithm for defining cervical spine surgical procedures. We performed a retrospective study comparing the sensitivity and specificity of a combined CPT and ICD-9-CM code algorithm to the operative note, the gold standard, in a single academic center. We also compared the accuracy of our combined algorithm with our published ICD-9-CM-only algorithm. The combined algorithm has high sensitivity and specificity for defining cervical spine surgery, specific surgical procedures such as discectomy and fusion, and surgical approach. Compared to the ICD-9-CM-only algorithm, the combined algorithm significantly improves identification of discectomy, laminectomy, and fusion procedures and allows identification of specific procedures such as laminaplasty and instrumentation with high sensitivity and specificity. Identification of reoperations has low sensitivity and specificity, but identification of number of levels instrumented, fused, and decompressed has high specificity. The use of our combined CPT and ICD-9-CM algorithm to identify cervical spine surgery was highly sensitive and specific. For categories such as surgical approach, accuracy of our combined algorithm was similar to that of our ICD-9-CM-only algorithm. However, the combined algorithm

  13. Analysis of cDNA coding MHC class II beta chain of the chimpanzee (Pan troglodytes).

    PubMed

    Hatta, Yuki; Kanai, Tomoko; Matsumoto, Yoshitsugu; Kyuwa, Shigeru; Hayasaka, Ikuo; Yoshikawa, Yasuhiro

    2002-04-01

    The chimpanzee (Pan troglodytes, Patr) is the closest zoological living relative of humans and shares approximately 98.6% genetic homology to human beings. Although major histocompatibility complex (MHC) plays a critical role in T cell-mediated immune responses in vertebrates, the information on Patr MHC remains at a relatively poor level. Therefore, we attempted to isolate Patr MHC class II genes and determine their nucleotide sequences. The cDNAs encoding Patr MHC class II DP, DQ and DR beta chains were isolated from the cDNA library of a chimpanzee B lymphocyte cell line Bch261. As a result of screening, the clone 6-3-1 as a representative of Patr DP clone, clone 30-1 as a Patr DQ clone, and clones 4-7-1 and 55-1 having different sequences as Patr DR clones were detected. The clone 6-3-1 consisted of 1,062 nucleotides including an open reading frame (ORF) of 777 bp. In the same way, clone 30-1 consisted of 1,172 nucleotides including ORF of 786 bp, clones 4-7-1 and 55-1 consisted of 1,163 nucleotides including ORF of 801 bp. Except for five nucleotide changes, clones 4-7-1 and 55-1 were the same sequence. By comparison with the nucleotide sequences already reported on chimpanzee MHC class II beta 1 genes, clones 6-3-1, 30-1, 4-7-1 and 55-1 were classified as PatrDPB1*16, PatrDQB1*0302, PatrDRB1*0201 and PatrDRB1*0204, respectively. This is the first report to describe complete cDNA sequences of Patr DP and DQ molecules. The nucleotide sequence data of Patr MHC class II genes obtained in this study will be useful for the genotyping of Patr MHC class II genes in individual chimpanzees.

  14. MALDI-TOF MS analysis of ribosomal proteins coded in S10 and spc operons rapidly classified the Sphingomonadaceae as alkylphenol polyethoxylate-degrading bacteria from the environment.

    PubMed

    Hotta, Yudai; Sato, Hiroaki; Hosoda, Akifumi; Tamura, Hiroto

    2012-05-01

    Matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) using ribosomal subunit proteins coded in the S10-spc-alpha operon as biomarkers was applied for the classification of the Sphingomonadaceae from the environment. To construct a ribosomal protein database, S10-spc-alpha operon of type strains of the Sphingomonadaceae and their related alkylphenol polyethoxylate (APEO(n))-degrading bacteria were sequenced using specific primers designed based on nucleotide sequences of genome-sequenced strains. The observed MALDI mass spectra of intact cells were compared with the theoretical mass of the constructed ribosomal protein database. The nine selected biomarkers coded in the S10-spc-alpha operon, L18, L22, L24, L29, L30, S08, S14, S17, and S19, could successfully distinguish the Sphingopyxis terrae NBRC 15098(T) and APEO(n)-degrading bacteria strain BSN20, despite only one base difference in the 16S rRNA gene sequence. This method, named the S10-GERMS (S10-spc-alpha operon gene-encoded ribosomal protein mass spectrum) method, is a significantly useful tool for bacterial discrimination of the Sphingomonadaceae at the strain level and can detect and monitor the main APEO(n)-degrading bacteria in the environment. © 2012 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  15. The dnaN gene codes for the beta subunit of DNA polymerase III holoenzyme of Escherichia coli.

    PubMed

    Burgers, P M; Kornberg, A; Sakakibara, Y

    1981-09-01

    An Escherichia coli mutant, dnaN59, stops DNA synthesis promptly upon a shift to a high temperature; the wild-type dnaN gene carried in a transducing phage encodes a polypeptide of about 41,000 daltons [Sakakibara, Y. & Mizukami, T. (1980) Mol. Gen. Genet. 178, 541-553; Yuasa, S. & Sakakibara, Y. (1980) Mol. Gen. Genet. 180, 267-273]. We now find that the product of dnaN gene is the beta subunit of DNA polymerase III holoenzyme, the principal DNA synthetic multipolypeptide complex in E. coli. The conclusion is based on the following observations: (i) Extracts from dnaN59 cells were defective in phage phi X174 and G4 DNA synthesis after the mutant cells had been exposed to the increased temperature. (ii) The enzymatic defect was overcome by addition of purified beta subunit but not by other subunits of DNA polymerase III holoenzyme or by other replication proteins required for phi X174 DNA synthesis. (iii) Partially purified beta subunit from the dnaN mutant, unlike that from the wild type, was inactive in reconstituting the holoenzyme when mixed with the other purified subunits. (iv) Increased dosage of the dnaN gene provided by a plasmid carrying the gene raised cellular levels of the beta subunit 5- to 6-fold.

  16. The dnaN gene codes for the beta subunit of DNA polymerase III holoenzyme of Escherichia coli.

    PubMed Central

    Burgers, P M; Kornberg, A; Sakakibara, Y

    1981-01-01

    An Escherichia coli mutant, dnaN59, stops DNA synthesis promptly upon a shift to a high temperature; the wild-type dnaN gene carried in a transducing phage encodes a polypeptide of about 41,000 daltons [Sakakibara, Y. & Mizukami, T. (1980) Mol. Gen. Genet. 178, 541-553; Yuasa, S. & Sakakibara, Y. (1980) Mol. Gen. Genet. 180, 267-273]. We now find that the product of dnaN gene is the beta subunit of DNA polymerase III holoenzyme, the principal DNA synthetic multipolypeptide complex in E. coli. The conclusion is based on the following observations: (i) Extracts from dnaN59 cells were defective in phage phi X174 and G4 DNA synthesis after the mutant cells had been exposed to the increased temperature. (ii) The enzymatic defect was overcome by addition of purified beta subunit but not by other subunits of DNA polymerase III holoenzyme or by other replication proteins required for phi X174 DNA synthesis. (iii) Partially purified beta subunit from the dnaN mutant, unlike that from the wild type, was inactive in reconstituting the holoenzyme when mixed with the other purified subunits. (iv) Increased dosage of the dnaN gene provided by a plasmid carrying the gene raised cellular levels of the beta subunit 5- to 6-fold. PMID:6458041

  17. Widespread selection across coding and noncoding DNA in the pea aphid genome.

    PubMed

    Bickel, Ryan D; Dunham, Joseph P; Brisson, Jennifer A

    2013-06-21

    Genome-wide patterns of diversity and selection are critical measures for understanding how evolution has shaped the genome. Yet, these population genomic estimates are available for only a limited number of model organisms. Here we focus on the population genomics of the pea aphid (Acyrthosiphon pisum). The pea aphid is an emerging model system that exhibits a range of intriguing biological traits not present in classic model systems. We performed low-coverage genome resequencing of 21 clonal pea aphid lines collected from alfalfa host plants in North America to characterize genome-wide patterns of diversity and selection. We observed an excess of low-frequency polymorphisms throughout coding and noncoding DNA, which we suggest is the result of a founding event and subsequent population expansion in North America. Most gene regions showed lower levels of Tajima's D than synonymous sites, suggesting that the majority of the genome is not evolving neutrally but rather exhibits significant constraint. Furthermore, we used the pea aphid's unique manner of X-chromosome inheritance to assign genomic scaffolds to either autosomes or the X chromosome. Comparing autosomal vs. X-linked sequence variation, we discovered that autosomal genes show an excess of low frequency variants indicating that purifying selection acts more efficiently on the X chromosome. Overall, our results provide a critical first step in characterizing the genetic diversity and evolutionary pressures on an aphid genome.
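
    Tajima's D, the statistic compared across site classes in this abstract, contrasts the average number of pairwise differences with the number of segregating sites; an excess of low-frequency variants pushes it negative. The sketch below applies Tajima's standard 1989 formula to fully called, aligned haplotypes. The study itself used low-coverage resequencing and may well have relied on a different estimation procedure, so the function name and the toy sequences are assumptions for illustration only.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for aligned, equal-length haplotype strings.

    Returns None when there are no segregating sites (D is undefined).
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of alignment columns with more than one allele.
    segregating = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if segregating == 0:
        return None

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * segregating + e2 * segregating * (segregating - 1)
    return (pi - segregating / a1) / sqrt(variance)


# Hypothetical toy alignments.
singletons_only = ["AAA", "TAA", "ATA", "AAT"]      # rare variants only
intermediate_frequency = ["AA", "AA", "TT", "TT"]   # balanced variants
print(tajimas_d(singletons_only))
print(tajimas_d(intermediate_frequency))
```

    The alignment made only of singletons gives a negative value and the balanced one a positive value, which matches the usual reading of the statistic.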

  18. DNA-guided establishment of nucleosome patterns within coding regions of a eukaryotic genome

    PubMed Central

    Beh, Leslie Y.; Müller, Manuel M.; Muir, Tom W.; Kaplan, Noam; Landweber, Laura F.

    2015-01-01

    A conserved hallmark of eukaryotic chromatin architecture is the distinctive array of well-positioned nucleosomes downstream from transcription start sites (TSS). Recent studies indicate that trans-acting factors establish this stereotypical array. Here, we present the first genome-wide in vitro and in vivo nucleosome maps for the ciliate Tetrahymena thermophila. In contrast with previous studies in yeast, we find that the stereotypical nucleosome array is preserved in the in vitro reconstituted map, which is governed only by the DNA sequence preferences of nucleosomes. Remarkably, this average in vitro pattern arises from the presence of subsets of nucleosomes, rather than the whole array, in individual Tetrahymena genes. Variation in GC content contributes to the positioning of these sequence-directed nucleosomes and affects codon usage and amino acid composition in genes. Given that the AT-rich Tetrahymena genome is intrinsically unfavorable for nucleosome formation, we propose that these “seed” nucleosomes—together with trans-acting factors—may facilitate the establishment of nucleosome arrays within genes in vivo, while minimizing changes to the underlying coding sequences. PMID:26330564

  19. Large-scale motif discovery using DNA Gray code and equiprobable oligomers

    PubMed Central

    Ichinose, Natsuhiro; Yada, Tetsushi; Gotoh, Osamu

    2012-01-01

    Motivation: How to find motifs from genome-scale functional sequences, such as all the promoters in a genome, is a challenging problem. Word-based methods count the occurrences of oligomers to detect excessively represented ones. This approach is known to be fast and accurate compared with other methods. However, two problems have hampered the application of such methods to large-scale data. One is the computational cost necessary for clustering similar oligomers, and the other is the bias in the frequency of fixed-length oligomers, which complicates the detection of significant words. Results: We introduce a method that uses a DNA Gray code and equiprobable oligomers, which solve the clustering problem and the oligomer bias, respectively. Our method can analyze 18 000 sequences, each ~1 kbp long, in 30 s. We also show that the accuracy of our method is superior to that of a leading method, especially for large-scale data and small fractions of motif-containing sequences. Availability: The online and stand-alone versions of the application, named Hegma, are available at our website: http://www.genome.ist.i.kyoto-u.ac.jp/~ichinose/hegma/ Contact: ichinose@i.kyoto-u.ac.jp; o.gotoh@i.kyoto-u.ac.jp PMID:22057160
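
    The abstract does not spell out how the DNA Gray code is built. As an illustration only, the sketch below generates a generic reflected quaternary Gray code over A, C, G, T, in which consecutive oligomers differ at exactly one position; this ordering is an assumption and may differ from the one used in Hegma, and the function name is hypothetical.

```python
def dna_gray_code(k, alphabet="ACGT"):
    """List all len(alphabet)**k oligomers so that neighbours differ in one letter."""
    if k == 0:
        return [""]
    shorter = dna_gray_code(k - 1, alphabet)
    words = []
    for i, letter in enumerate(alphabet):
        # Reverse every other block so the suffix is unchanged across block boundaries.
        block = shorter if i % 2 == 0 else shorter[::-1]
        words.extend(letter + w for w in block)
    return words


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


trimers = dna_gray_code(3)
```

    Because neighbouring words are one substitution apart, similar oligomers end up close together in the ordering, which is the property a clustering step can exploit.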

  20. DNA-guided establishment of nucleosome patterns within coding regions of a eukaryotic genome.

    PubMed

    Beh, Leslie Y; Müller, Manuel M; Muir, Tom W; Kaplan, Noam; Landweber, Laura F

    2015-11-01

    A conserved hallmark of eukaryotic chromatin architecture is the distinctive array of well-positioned nucleosomes downstream from transcription start sites (TSS). Recent studies indicate that trans-acting factors establish this stereotypical array. Here, we present the first genome-wide in vitro and in vivo nucleosome maps for the ciliate Tetrahymena thermophila. In contrast with previous studies in yeast, we find that the stereotypical nucleosome array is preserved in the in vitro reconstituted map, which is governed only by the DNA sequence preferences of nucleosomes. Remarkably, this average in vitro pattern arises from the presence of subsets of nucleosomes, rather than the whole array, in individual Tetrahymena genes. Variation in GC content contributes to the positioning of these sequence-directed nucleosomes and affects codon usage and amino acid composition in genes. Given that the AT-rich Tetrahymena genome is intrinsically unfavorable for nucleosome formation, we propose that these "seed" nucleosomes--together with trans-acting factors--may facilitate the establishment of nucleosome arrays within genes in vivo, while minimizing changes to the underlying coding sequences.

  1. Large-scale motif discovery using DNA Gray code and equiprobable oligomers.

    PubMed

    Ichinose, Natsuhiro; Yada, Tetsushi; Gotoh, Osamu

    2012-01-01

    How to find motifs from genome-scale functional sequences, such as all the promoters in a genome, is a challenging problem. Word-based methods count the occurrences of oligomers to detect excessively represented ones. This approach is known to be fast and accurate compared with other methods. However, two problems have hampered the application of such methods to large-scale data. One is the computational cost necessary for clustering similar oligomers, and the other is the bias in the frequency of fixed-length oligomers, which complicates the detection of significant words. We introduce a method that uses a DNA Gray code and equiprobable oligomers, which solve the clustering problem and the oligomer bias, respectively. Our method can analyze 18 000 sequences, each ~1 kbp long, in 30 s. We also show that the accuracy of our method is superior to that of a leading method, especially for large-scale data and small fractions of motif-containing sequences. The online and stand-alone versions of the application, named Hegma, are available at our website: http://www.genome.ist.i.kyoto-u.ac.jp/~ichinose/hegma/ Contact: ichinose@i.kyoto-u.ac.jp; o.gotoh@i.kyoto-u.ac.jp

  2. DNA sequence-based "bar codes" for tracking the origins of expressed sequence tags from a maize cDNA library constructed using multiple mRNA sources.

    PubMed

    Qiu, Fang; Guo, Ling; Wen, Tsui-Jung; Liu, Feng; Ashlock, Daniel A; Schnable, Patrick S

    2003-10-01

    To enhance gene discovery, expressed sequence tag (EST) projects often make use of cDNA libraries produced using diverse mixtures of mRNAs. As such, expression data are lost because the origins of the resulting ESTs cannot be determined. Alternatively, multiple libraries can be prepared, each from a more restricted source of mRNAs. Although this approach allows the origins of ESTs to be determined, it requires the production of multiple libraries. A hybrid approach is reported here. A cDNA library was prepared using 21 different pools of maize (Zea mays) mRNAs. DNA sequence "bar codes" were added during first-strand cDNA synthesis to uniquely identify the mRNA source pool from which individual cDNAs were derived. Using a decoding algorithm that included error correction, it was possible to identify the source mRNA pool of more than 97% of the ESTs. The frequency at which a bar code is represented in an EST contig should be proportional to the abundance of the corresponding mRNA in the source pool. Consistent with this, all ESTs derived from several genes (zein and adh1) that are known to be exclusively expressed in kernels or preferentially expressed under anaerobic conditions, respectively, were exclusively tagged with bar codes associated with mRNA pools prepared from kernel and anaerobically treated seedlings, respectively. Hence, by allowing for the retention of expression data, the bar coding of cDNA libraries can enhance the value of EST projects.
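
    The decoding algorithm with error correction is not described in this abstract. One common way to do it, shown here purely as an assumed sketch, is nearest-neighbour decoding by Hamming distance: a sequenced tag is assigned to the closest known bar code if that match is unique and within a mismatch budget. The bar code sequences, pool names, and `max_mismatches` parameter are all hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def decode_barcode(tag, barcodes, max_mismatches=1):
    """Return the pool of the nearest bar code, or None if too distant or ambiguous."""
    scored = sorted(
        (hamming(tag, code), pool)
        for code, pool in barcodes.items()
        if len(code) == len(tag)
    )
    if not scored or scored[0][0] > max_mismatches:
        return None  # no bar code is close enough
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # two bar codes tie, so the tag cannot be corrected
    return scored[0][1]


# Hypothetical bar codes; each pair differs at 5 or more positions.
POOLS = {"ACGTAC": "kernel", "TGCATG": "anaerobic_seedling", "GATTCA": "root"}
```

    With bar codes at least 5 substitutions apart, any single sequencing error still leaves the tag closest to its true bar code, which is what makes the correction safe.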

  3. URF6, Last Unidentified Reading Frame of Human mtDNA, Codes for an NADH Dehydrogenase Subunit

    NASA Astrophysics Data System (ADS)

    Chomyn, Anne; Cleeter, Michael W. J.; Ragan, C. Ian; Riley, Marcia; Doolittle, Russell F.; Attardi, Giuseppe

    1986-10-01

    The polypeptide encoded in URF6, the last unassigned reading frame of human mitochondrial DNA, has been identified with antibodies to peptides predicted from the DNA sequence. Antibodies prepared against highly purified respiratory chain NADH dehydrogenase from beef heart or against the cytoplasmically synthesized 49-kilodalton iron-sulfur subunit isolated from this enzyme complex, when added to a deoxycholate or a Triton X-100 mitochondrial lysate of HeLa cells, specifically precipitated the URF6 product together with the six other URF products previously identified as subunits of NADH dehydrogenase. These results strongly point to the URF6 product as being another subunit of this enzyme complex. Thus, almost 60% of the protein coding capacity of mammalian mitochondrial DNA is utilized for the assembly of the first enzyme complex of the respiratory chain. The absence of such information in yeast mitochondrial DNA dramatizes the variability in gene content of different mitochondrial genomes.

  4. Functional expression in primate cells of cloned DNA coding for the hemagglutinin surface glycoprotein of influenza virus.

    PubMed Central

    Sveda, M M; Lai, C J

    1981-01-01

    We have used simian virus 40 (SV40) DNA as a vector for expression of functional activity of a cloned influenza viral DNA segment in primate cells. Cloned full-length DNA sequences coding for the hemagglutinin of influenza A virus (Udorn/72/[H3N2]) were inserted into the late region of a viable deletion mutant of SV40, and the hybrid DNA was propagated in the presence of an early SV40 mutant (tsA28) helper. Infection of primate cells with the hybrid virus produced a polypeptide similar in molecular size to the hemagglutinin of influenza virus, as shown by immunoprecipitation and gel electrophoresis. The polypeptide was glycosylated, as shown by incorporation of radioactive sugars. The putative hemagglutinin exhibited functional activity, as shown by agglutination of erythrocytes. In addition, an indirect immunofluorescence assay showed that the hemagglutinin polypeptide of the hybrid virus could be detected on the surface of infected cells. PMID:6272305

  5. An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

    PubMed

    Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

    2011-01-01

    cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.

  6. Classifying Facial Actions

    PubMed Central

    Donato, Gianluca; Bartlett, Marian Stewart; Hager, Joseph C.; Ekman, Paul; Sejnowski, Terrence J.

    2010-01-01

    The Facial Action Coding System (FACS) [23] is an objective method for quantifying facial movement in terms of component actions. This system is widely used in behavioral investigations of emotion, cognitive processes, and social interaction. The coding is presently performed by highly trained human experts. This paper explores and compares techniques for automatically recognizing facial actions in sequences of images. These techniques include analysis of facial motion through estimation of optical flow; holistic spatial analysis, such as principal component analysis, independent component analysis, local feature analysis, and linear discriminant analysis; and methods based on the outputs of local filters, such as Gabor wavelet representations and local principal components. Performance of these systems is compared to naive and expert human subjects. Best performances were obtained using the Gabor wavelet representation and the independent component representation, both of which achieved 96 percent accuracy for classifying 12 facial actions of the upper and lower face. The results provide converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions. PMID:21188284

  7. Natural selection on coding and noncoding DNA sequences is associated with virulence genes in a plant pathogenic fungus.

    PubMed

    Rech, Gabriel E; Sanz-Martín, José M; Anisimova, Maria; Sukno, Serenella A; Thon, Michael R

    2014-09-04

    Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an Arms Race model of evolution. The 5' untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on the findings of this work, we propose that even though adaptive substitutions on coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen.

  8. Natural Selection on Coding and Noncoding DNA Sequences Is Associated with Virulence Genes in a Plant Pathogenic Fungus

    PubMed Central

    Rech, Gabriel E.; Sanz-Martín, José M.; Anisimova, Maria; Sukno, Serenella A.; Thon, Michael R.

    2014-01-01

    Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an Arms Race model of evolution. The 5′ untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on the findings of this work, we propose that even though adaptive substitutions on coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen. PMID:25193312

  9. Signalign: An Ontology of DNA as Signal for Comparative Gene Structure Prediction Using Information-Coding-and-Processing Techniques.

    PubMed

    Yu, Ning; Guo, Xuan; Gu, Feng; Pan, Yi

    2016-03-01

    Conventional character-analysis-based techniques in genome analysis manifest three main shortcomings: inefficiency, inflexibility, and incompatibility. In our previous research, a general framework called DNA As X was proposed for character-analysis-free techniques to overcome these shortcomings, where X is an intermediate such as digit, code, signal, vector, tree, graph network, and so on. In this paper, we further implement an ontology of DNA As Signal by designing a tool named Signalign for comparative gene structure analysis, in which DNA sequences are converted into signal series, processed by a modified method of dynamic time warping, and measured by signal-to-noise ratio (SNR). The ontology of DNA As Signal integrates the principles and concepts of other disciplines, including information coding theory and signal processing, into sequence analysis and processing. Compared with conventional character-analysis-based methods, Signalign not only achieves equivalent or superior performance but also enriches the tools and the knowledge library of computational biology by extending the domain from character/string to diverse areas. The evaluation results validate the success of the character-analysis-free technique for improved performance in comparative gene structure prediction.
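
    To make the "DNA as signal" idea concrete, the sketch below maps bases to numeric levels and compares two signals with textbook dynamic time warping. It is not Signalign's modified method, which the abstract does not describe; the base-to-level mapping and function names are hypothetical choices made only for illustration.

```python
# Hypothetical numeric encoding of bases; Signalign's actual mapping is not given.
BASE_TO_LEVEL = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}


def to_signal(seq):
    """Convert a DNA string into a list of numeric signal levels."""
    return [BASE_TO_LEVEL[base] for base in seq.upper()]


def dtw_distance(x, y):
    """Classic dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j] = best cumulative cost aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # x advances, y repeats
                cost[i][j - 1],      # y advances, x repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

    Unlike a position-by-position comparison, the warping path lets one sample align with several samples of the other signal, so a repeated base ("AACGT" vs. "ACGT") costs nothing under this encoding.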

  10. The molecular cloning and characterisation of cDNA coding for the alpha subunit of the acetylcholine receptor.

    PubMed Central

    Sumikawa, K; Houghton, M; Smith, J C; Bell, L; Richards, B M; Barnard, E A

    1982-01-01

    A rare cDNA coding for most of the alpha subunit of the Torpedo nicotinic acetylcholine receptor has been cloned into bacteria. The use of a mismatched oligonucleotide primer of reverse transcriptase facilitated the design of an efficient, specific probe for recombinant bacteria. DNA sequence analysis has enabled the elucidation of a large part of the polypeptide primary sequence which is discussed in relation to its acetylcholine binding activity and the location of receptor within the plasma membrane. When used as a radioactive probe, the cloned cDNA binds specifically to a single Torpedo mRNA species of about 2350 nucleotides in length but fails to show significant cross-hybridisation with alpha subunit mRNA extracted from cat muscle. PMID:6183641

  11. Run-length encoding graphic rules, biochemically editable designs and steganographical numeric data embedment for DNA-based cryptographical coding system.

    PubMed

    Kawano, Tomonori

    2013-03-01

    There have been a wide variety of approaches for handling the pieces of DNA as the "unplugged" tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, water marks and cryptography. In the present article, novel designs of artificial genes as the media for storing the digitally compressed data for images are proposed for bio-computing purpose while natural genes principally encode for proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given "passwords" and/or secret numbers using DNA sequences. The "passwords" of interest were decomposed into single letters and translated into the font image coded on the separate DNA chains with both the coding regions in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original "passwords." The latter processes require the molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting at the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interests over the image-coding DNA are also discussed.
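
    The article's novel run-length encoding rule and its mapping onto DNA are not given in this abstract. For orientation only, the sketch below shows plain run-length encoding of one row of a black-and-white font image; the function names and the "0"/"1" pixel representation are assumptions, and no DNA mapping is attempted.

```python
from itertools import groupby


def rle_encode(pixels):
    """Collapse a string of pixel values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original pixel string."""
    return "".join(value * count for value, count in runs)


row = "0001111001"  # hypothetical image row: 0 = white, 1 = black
encoded = rle_encode(row)
```

    Long uniform runs, which are typical of letter images, shrink to a few pairs, which is why run-length schemes suit this kind of compressed storage.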

  12. Long non-coding RNAs as novel expression signatures modulate DNA damage and repair in cadmium toxicology

    PubMed Central

    Zhou, Zhiheng; Liu, Haibai; Wang, Caixia; Lu, Qian; Huang, Qinhai; Zheng, Chanjiao; Lei, Yixiong

    2015-01-01

    Increasing evidence suggests that long non-coding RNAs (lncRNAs) are involved in a variety of physiological and pathophysiological processes. Our study investigated whether lncRNAs as novel expression signatures are able to modulate DNA damage and repair in cadmium (Cd) toxicity. There were aberrant expression profiles of lncRNAs in 35th Cd-induced cells as compared to untreated 16HBE cells. siRNA-mediated knockdown of ENST00000414355 inhibited the growth of DNA-damaged cells and decreased the expression of DNA-damage-related genes (ATM, ATR and ATRIP), while increasing the expression of DNA-repair-related genes (DDB1, DDB2, OGG1, ERCC1, MSH2, RAD50, XRCC1 and BARD1). Cadmium increased ENST00000414355 expression in the lung of Cd-exposed rats in a dose-dependent manner. A significant positive correlation was observed between blood ENST00000414355 expression and urinary/blood Cd concentrations, and there were significant correlations of lncRNA-ENST00000414355 expression with the expression of target genes in the lung of Cd-exposed rats and the blood of Cd-exposed workers. These results indicate that some lncRNAs are aberrantly expressed in Cd-treated 16HBE cells. lncRNA-ENST00000414355 may serve as a signature for DNA damage and repair related to the epigenetic mechanisms underlying cadmium toxicity and become a novel biomarker of cadmium toxicity. PMID:26472689

  13. Long non-coding RNAs as novel expression signatures modulate DNA damage and repair in cadmium toxicology

    NASA Astrophysics Data System (ADS)

    Zhou, Zhiheng; Liu, Haibai; Wang, Caixia; Lu, Qian; Huang, Qinhai; Zheng, Chanjiao; Lei, Yixiong

    2015-10-01

    Increasing evidence suggests that long non-coding RNAs (lncRNAs) are involved in a variety of physiological and pathophysiological processes. Our study investigated whether lncRNAs as novel expression signatures are able to modulate DNA damage and repair in cadmium (Cd) toxicity. There were aberrant expression profiles of lncRNAs in 35th Cd-induced cells as compared to untreated 16HBE cells. siRNA-mediated knockdown of ENST00000414355 inhibited the growth of DNA-damaged cells and decreased the expression of DNA-damage-related genes (ATM, ATR and ATRIP), while increasing the expression of DNA-repair-related genes (DDB1, DDB2, OGG1, ERCC1, MSH2, RAD50, XRCC1 and BARD1). Cadmium increased ENST00000414355 expression in the lung of Cd-exposed rats in a dose-dependent manner. A significant positive correlation was observed between blood ENST00000414355 expression and urinary/blood Cd concentrations, and there were significant correlations of lncRNA-ENST00000414355 expression with the expression of target genes in the lung of Cd-exposed rats and the blood of Cd-exposed workers. These results indicate that some lncRNAs are aberrantly expressed in Cd-treated 16HBE cells. lncRNA-ENST00000414355 may serve as a signature for DNA damage and repair related to the epigenetic mechanisms underlying cadmium toxicity and become a novel biomarker of cadmium toxicity.

  14. Breaking the code of DNA binding specificity of TAL-type III effectors.

    PubMed

    Boch, Jens; Scholze, Heidi; Schornack, Sebastian; Landgraf, Angelika; Hahn, Simone; Kay, Sabine; Lahaye, Thomas; Nickstadt, Anja; Bonas, Ulla

    2009-12-11

    The pathogenicity of many bacteria depends on the injection of effector proteins via type III secretion into eukaryotic cells in order to manipulate cellular processes. TAL (transcription activator-like) effectors from plant pathogenic Xanthomonas are important virulence factors that act as transcriptional activators in the plant cell nucleus, where they directly bind to DNA via a central domain of tandem repeats. Here, we show how target DNA specificity of TAL effectors is encoded. Two hypervariable amino acid residues in each repeat recognize one base pair in the target DNA. Recognition sequences of TAL effectors were predicted and experimentally confirmed. The modular protein architecture enabled the construction of artificial effectors with new specificities. Our study describes the functionality of a distinct type of DNA binding domain and allows the design of DNA binding domains for biotechnology.

  15. Muscle coding sequences and their regulation during myogenesis: cloning of muscle actin cDNA probes.

    PubMed

    Minty, A; Caravatti, M; Robert, B; Cohen, A; Daubas, P; Weydert, A; Gros, F; Buckingham, M

    1981-01-01

    For a number of years our group has been mainly interested in the regulation of muscle gene expression during myogenesis. Using primary cultures and cell lines we have tried to find out whether the coding sequences for muscle proteins are already present in an unexpressed form or if there is a transcriptional switch at the onset of differentiation. Metabolic studies on pulse-labelled RNA, together with translation and molecular hybridization experiments have given a certain number of indications. More recently the development of genetic engineering techniques has made it possible to answer these questions directly with probes which are complementary to specific muscle coding sequences. We have identified a plasmid which contains a coding sequence for muscle actin. Other recombinant plasmids are being characterized. Such plasmids, used as probes, will permit us to study the organization and expression of the genes coding for the contractile proteins in muscle cells.

  16. Deciphering the Epigenetic Code: An Overview of DNA Methylation Analysis Methods

    PubMed Central

    Umer, Muhammad

    2013-01-01

    Significance: Methylation of cytosine in DNA is linked with gene regulation, and this has profound implications in development, normal biology, and disease conditions in many eukaryotic organisms. A wide range of methods and approaches exist for its identification, quantification, and mapping within the genome. While the earliest approaches were nonspecific and were at best useful for quantification of total methylated cytosines in the chunk of DNA, this field has seen considerable progress and development over the past decades. Recent Advances: Methods for DNA methylation analysis differ in their coverage and sensitivity, and the method of choice depends on the intended application and desired level of information. Potential results include global methyl cytosine content, degree of methylation at specific loci, or genome-wide methylation maps. Introduction of more advanced approaches to DNA methylation analysis, such as microarray platforms and massively parallel sequencing, has brought us closer to unveiling the whole methylome. Critical Issues: Sensitive quantification of DNA methylation from degraded and minute quantities of DNA and high-throughput DNA methylation mapping of single cells still remain a challenge. Future Directions: Developments in DNA sequencing technologies as well as the methods for identification and mapping of 5-hydroxymethylcytosine are expected to augment our current understanding of epigenomics. Here we present an overview of methodologies available for DNA methylation analysis with special focus on recent developments in genome-wide and high-throughput methods. While the application focus relates to cancer research, the methods are equally relevant to broader issues of epigenetics and redox science in this special forum. Antioxid. Redox Signal. 18, 1972–1986. PMID:23121567

  17. The vicilin gene family of pea (Pisum sativum L.): a complete cDNA coding sequence for preprovicilin.

    PubMed Central

    Lycett, G W; Delauney, A J; Gatehouse, J A; Gilroy, J; Croy, R R; Boulter, D

    1983-01-01

    A cDNA plasmid bank has been constructed using mRNA from developing pea seeds and three cDNAs coding for vicilin polypeptides have been selected. These cDNAs have been sequenced and between them cover the whole of the coding sequence plus part of the 5' and 3' untranslated regions. Comparison with amino acid sequence data from the protein indicates that vicilin is synthesised as preprovicilin with subsequent removal of a signal peptide and a C-terminal peptide as well as post translational endo-proteolytic cleavage. The cDNAs represent two different classes of vicilin genes whilst amino acid data show that there are at least three major classes of vicilin polypeptide. The vicilin sequences show extensive homology with conglycinin and phaseolin except in the regions of the internal proteolytic cleavages. The evolutionary significance of this relationship is discussed. PMID:6687941

  18. A framework for the DNA-protein recognition code of the probe helix in transcription factors: the chemical and stereochemical rules.

    PubMed

    Suzuki, M

    1994-04-15

    Understanding the general mechanisms of sequence specific DNA recognition by proteins is a major challenge in structural biology. The existence of a 'DNA recognition code' for proteins, by which certain amino acid residues on a protein surface confer specificity for certain DNA bases, has been the subject of much discussion. However, no simple code has yet been established. The principles of DNA recognition can be described at two levels. The 'chemical' rules describe the partnerships between amino acid side chains and DNA bases making favourable interactions in the major groove of DNA. Here I analyze the occurrence of nucleotide-amino acid contacts in previously determined crystal structures of DNA-protein complexes and find that simple rules pertain. I also describe 'stereochemical' rules for the probe helix type of DNA-binding motif found in certain transcription factors including leucine zipper and homeodomain proteins. These are a consequence of the binding geometry, and specify the amino acid and base positions used for the contacts, and the sizes of residues in the contact interface. The chemical rules can be generalized for any DNA-binding motif, while the stereochemical rules are specific to a particular DNA-binding motif. The recognition code for a particular binding motif can be described by combining the two sets of rules.

  19. Tumor regression induced by intratumoral injection of DNA coding for human interleukin 12 into melanoma metastases in gray horses.

    PubMed

    Heinzerling, L M; Feige, K; Rieder, S; Akens, M K; Dummer, R; Stranzinger, G; Moelling, K

    2001-01-01

    Preclinical studies investigating new therapeutic principles against melanoma are presently being carried out in mouse models; however, these are not optimal. Here we describe a novel animal model using gray horses. These animals spontaneously develop metastatic melanoma that resembles human disease and is thus highly relevant for preclinical studies testing new immunotherapy protocols. We found that injection of plasmid DNA coding for the human cytokine interleukin 12 into established metastases induced significant regression in all 12 treated lesions in a total of 7 horses. Complete disappearance was observed in one treated lesion, with no recurrence after 6 months. No adverse events have been observed in any of the animals during and after treatment. These results demonstrate the effectiveness and safety of interleukin 12 encoding plasmid DNA therapy against established metastatic disease in a large animal model and serve as a basis for a clinical trial.

  20. RGB colour coding of Y-shaped DNA for simultaneous tri-analyte solid phase hybridization detection.

    PubMed

    Krissanaprasit, Abhichart; Somasundrum, Mithran; Surareungchai, Werasak

    2011-01-15

    We present a new concept for tri-analyte DNA detection based on the idea of a Y-shaped capture probe which, after tri-target and fluorescently labeled reporter probe binding, becomes colour-coded to generate images in an RGB colour scheme. Hence, the RGB value of the resulting secondary pseudo-colour presented by the hybridized Y-DNA can be related to the ratio of the primary pseudo-colours present in its make-up, and thus to the ratio of the three target concentrations. As a proof of concept we detect sequences from the genes of the pathogenic bacterial strains Escherichia coli O157:H7, Vibrio cholerae and Salmonella enterica in a semi-quantitative manner across the range 20-167 nM. The assay was relatively quick, with a time from hybridization to completed data interpretation of approximately 4 h.

  1. Bio-bar-code dendrimer-like DNA as signal amplifier for cancerous cells assay using ruthenium nanoparticle-based ultrasensitive chemiluminescence detection.

    PubMed

    Bi, Sai; Hao, Shuangyuan; Li, Li; Zhang, Shusheng

    2010-09-07

    Bio-bar-code dendrimer-like DNA (bbc-DL-DNA) is employed as a label for the amplification assay of cancer cells in combination with the newly explored chemiluminescence (CL) system of luminol-H(2)O(2)-Ru(3+) and specificity of structure-switching aptamers selected by cell-based SELEX.

  2. Molecular cloning of a gene (polA) coding for an unusual DNA polymerase I from Treponema pallidum.

    PubMed

    Rodes, B; Liu, H; Johnson, S; George, R; Steiner, B

    2000-07-01

    The gene coding for the DNA polymerase I from Treponema pallidum, Nichols strain, was cloned and sequenced. Depending on which of the two alternative initiation codons was used, the protein was either 997 or 1015 amino acids long and the predicted protein had a molecular mass of either 112 or 114 kDa. Sequence comparisons with other polA genes showed that all three domains expected in the DNA polymerase I class of enzymes were present in the protein (5'-3' exonuclease, 3'-5' exonuclease and polymerase domains). Additionally, there were four unique insertions of 20-30 amino acids each, not seen in other DNA polymerase I enzymes. Two of the inserts were near the boundary of the two exonuclease domains and the other two interrupted the 3'-5' exonuclease domain which is involved in proofreading. The predicted amino-acid sequence had an exceptionally high content of cysteine (2.4% compared with <0.05% for most other sequenced DNA polymerase I enzymes). The polA gene was further cloned into pProEXHTa for expression and purification. The transformants expressed a protein of 115 kDa. Antibodies raised against synthetic peptide fragments of the putative DNA polymerase I recognised the 115-kDa band in Western blot analysis. No DNA synthesis activity could be demonstrated on a primed single-stranded template. Although significant quantities of the protein were produced in the host Escherichia coli carrying the plasmid, it was not capable of complementing a polA(-) mutant in the replication of a polA-dependent plasmid.

  3. A novel non-coding RNA lncRNA-JADE connects DNA damage signalling to histone H4 acetylation.

    PubMed

    Wan, Guohui; Hu, Xiaoxiao; Liu, Yunhua; Han, Cecil; Sood, Anil K; Calin, George A; Zhang, Xinna; Lu, Xiongbin

    2013-10-30

    A prompt and efficient DNA damage response (DDR) eliminates the detrimental effects of DNA lesions in eukaryotic cells. Basic and preclinical studies suggest that the DDR is one of the primary anti-cancer barriers during tumorigenesis. The DDR involves a complex network of processes that detect and repair DNA damage, in which long non-coding RNAs (lncRNAs), a new class of regulatory RNAs, may play an important role. In the current study, we identified a novel lncRNA, lncRNA-JADE, that is induced after DNA damage in an ataxia-telangiectasia mutated (ATM)-dependent manner. LncRNA-JADE transcriptionally activates Jade1, a key component in the HBO1 (human acetylase binding to ORC1) histone acetylation complex. Consequently, lncRNA-JADE induces histone H4 acetylation in the DDR. Markedly higher levels of lncRNA-JADE were observed in human breast tumours in comparison with normal breast tissues. Knockdown of lncRNA-JADE significantly inhibited breast tumour growth in vivo. On the basis of these results, we propose that lncRNA-JADE is a key functional link that connects the DDR to histone H4 acetylation, and that dysregulation of lncRNA-JADE may contribute to breast tumorigenesis.

  4. A novel non-coding RNA lncRNA-JADE connects DNA damage signalling to histone H4 acetylation

    PubMed Central

    Wan, Guohui; Hu, Xiaoxiao; Liu, Yunhua; Han, Cecil; Sood, Anil K; Calin, George A; Zhang, Xinna; Lu, Xiongbin

    2013-01-01

    A prompt and efficient DNA damage response (DDR) eliminates the detrimental effects of DNA lesions in eukaryotic cells. Basic and preclinical studies suggest that the DDR is one of the primary anti-cancer barriers during tumorigenesis. The DDR involves a complex network of processes that detect and repair DNA damage, in which long non-coding RNAs (lncRNAs), a new class of regulatory RNAs, may play an important role. In the current study, we identified a novel lncRNA, lncRNA-JADE, that is induced after DNA damage in an ataxia-telangiectasia mutated (ATM)-dependent manner. LncRNA-JADE transcriptionally activates Jade1, a key component in the HBO1 (human acetylase binding to ORC1) histone acetylation complex. Consequently, lncRNA-JADE induces histone H4 acetylation in the DDR. Markedly higher levels of lncRNA-JADE were observed in human breast tumours in comparison with normal breast tissues. Knockdown of lncRNA-JADE significantly inhibited breast tumour growth in vivo. On the basis of these results, we propose that lncRNA-JADE is a key functional link that connects the DDR to histone H4 acetylation, and that dysregulation of lncRNA-JADE may contribute to breast tumorigenesis. PMID:24097061

  5. Conservation of genetic information: a code for site-specific DNA recognition.

    PubMed Central

    Harris, L F; Sullivan, M R; Hickok, D F

    1993-01-01

    We present findings of genetic information conservation between the glucocorticoid response element (GRE) DNA and the cDNA encoding the glucocorticoid receptor (GR) DNA-binding domain (DBD). The regions of nucleotide sub-sequence similarity to the GRE in the GR DBD occur specifically at nucleotide sequences on the ends of exons 3, 4, and 5 at their splice junction sites. These sequences encode the DNA recognition helix on exon 3, a beta-strand on exon 4, and a putative alpha-helix on exon 5, respectively. The nucleotide sequence of exon 5 that encodes the putative alpha-helix located on the carboxyl terminus of the GR DBD shares sequence similarity with the flanking nucleotide regions of the GRE. We generated a computer model of the GR DBD using atomic coordinates derived from nuclear magnetic resonance spectroscopy to which we attached the exon 5-encoded putative alpha-helix. We docked this GR DBD structure at the 39-base-pair nucleotide sequence containing the GRE binding site and flanking nucleotides, which contained conserved genetic information. We observed that amino acids of the DNA recognition helix, the beta-strand, and the putative alpha-helix are spatially aligned with trinucleotides identical to their cognate codons within the GRE and its flanking nucleotides. PMID:8516297

  6. Characterization of EBV Promoters and Coding Regions by Sequencing PCR-Amplified DNA Fragments.

    PubMed

    Szenthe, Kalman; Bánáti, Ferenc

    2017-01-01

    DNA sequencing approaches originally developed in two directions, the chemical degradation method and the chain-termination method. The latter became more widespread and a huge amount of sequencing data including whole genome sequences accumulated, based on the use of capillary sequencer systems and the application of a modified chain-termination method which proved to be relatively easy, fast, and reliable. In addition, relatively long sequences of up to 1000 bp could be obtained with a single read with high per-base accuracy. Although the recent appearance of next-generation DNA sequencing (NGS) technologies enabled high-throughput and low-cost analysis of DNA, the modified chain-termination methods are still often applied in research. In the following, we shall present the application of capillary sequencing for the sequence characterization of viral genomes, for both partial and whole genome sequencing, and demonstrate it on the BARF1 promoter of Epstein-Barr virus (EBV).

  7. DNA sequencing and bar-coding using solid-state nanopores.

    PubMed

    Atas, Evrim; Singer, Alon; Meller, Amit

    2012-12-01

    Nanopores have emerged as a prominent single-molecule analytic tool with particular promise for genomic applications. In this review, we discuss two potential applications of the nanopore sensors: First, we present a nanopore-based single-molecule DNA sequencing method that utilizes optical detection for massively parallel throughput. Second, we describe a method by which nanopores can be used as single-molecule genotyping tools. For DNA sequencing, the distinction among the four types of DNA nucleobases is achieved by employing a biochemical procedure for DNA expansion. In this approach, each nucleobase in each DNA strand is converted into one of four predefined unique 16-mers in a process that preserves the nucleobase sequence. The resulting converted strands are then hybridized to a library of four molecular beacons, each carrying a unique fluorophore tag, that are perfect complements to the 16-mers used for conversion. Solid-state nanopores are then used to sequentially remove these beacons, one after the other, leading to a series of photon bursts in four colors that can be optically detected. Single-molecule genotyping is achieved by tagging the DNA fragments with γ-modified synthetic peptide nucleic acid probes coupled to an electronic characterization of the complexes using solid-state nanopores. This method can be used to identify and differentiate genes with a high level of sequence similarity at the single-molecule level, but different pathology or response to treatment. We will illustrate this method by differentiating the pol gene for two highly similar human immunodeficiency virus subtypes, paving the way for a novel diagnostics platform for viral classification.
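
    The expansion-and-readout idea in this abstract (each nucleobase becomes one of four fixed 16-mers, and each 16-mer is reported by one beacon colour) can be sketched as a simple lookup. This is only an illustration: the 16-mer sequences, the colour names, and the function names below are hypothetical placeholders, since the abstract does not list the actual sequences or fluorophores.

```python
# Toy model of "DNA expansion": every base is replaced by a fixed 16-mer,
# and each 16-mer is read out as one colour. All sequences and colours here
# are hypothetical; the abstract does not specify them.
EXPANSION_TABLE = {
    "A": "AAAACCCCGGGGTTTT",
    "C": "ACACACACGTGTGTGT",
    "G": "AGAGAGAGCTCTCTCT",
    "T": "ATATATATCGCGCGCG",
}
BEACON_COLOR = {"A": "red", "C": "green", "G": "blue", "T": "yellow"}
UNIT = 16  # length of each expansion unit


def expand(seq):
    """Convert a DNA sequence into its expanded form, preserving base order."""
    return "".join(EXPANSION_TABLE[base] for base in seq.upper())


def read_colors(expanded):
    """Walk the expanded strand one 16-mer at a time and report the colour
    of the beacon that would be stripped off at each step."""
    inverse = {unit: base for base, unit in EXPANSION_TABLE.items()}
    colors = []
    for i in range(0, len(expanded), UNIT):
        base = inverse[expanded[i:i + UNIT]]
        colors.append(BEACON_COLOR[base])
    return colors


def decode(colors):
    """Recover the original base sequence from the ordered colour series."""
    color_to_base = {color: base for base, color in BEACON_COLOR.items()}
    return "".join(color_to_base[c] for c in colors)
```

    For example, decode(read_colors(expand("GATTACA"))) recovers the input, which is the property the abstract describes as preserving the nucleobase sequence.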

  8. Coding of DNA samples and data in the pharmaceutical industry: current practices and future directions--perspective of the I-PWG.

    PubMed

    Franc, M A; Cohen, N; Warner, A W; Shaw, P M; Groenen, P; Snapir, A

    2011-04-01

    DNA samples collected in clinical trials and stored for future research are valuable to pharmaceutical drug development. Given the perceived higher risk associated with genetic research, industry has implemented complex coding methods for DNA. Following years of experience with these methods and with addressing questions from institutional review boards (IRBs), ethics committees (ECs) and health authorities, the industry has started reexamining the extent of the added value offered by these methods. With the goal of harmonization, the Industry Pharmacogenomics Working Group (I-PWG) conducted a survey to gain an understanding of company practices for DNA coding and to solicit opinions on their effectiveness at protecting privacy. The results of the survey and the limitations of the coding methods are described. The I-PWG recommends dialogue with key stakeholders regarding coding practices such that equal standards are applied to DNA and non-DNA samples. The I-PWG believes that industry standards for privacy protection should provide adequate safeguards for DNA and non-DNA samples/data and suggests a need for more universal standards for samples stored for future research.

  9. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA

    PubMed Central

    Wright, Imogen A.; Travers, Simon A.

    2014-01-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
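
    The distinction drawn above between codon-sized indels and frameshift mutations comes down to whether a gap length is a multiple of three. The sketch below applies that rule to an already-gapped pairwise alignment. It is not RAMICS's profile hidden Markov model; the function name and the "-" gap convention are assumptions made for illustration.

```python
from itertools import groupby


def classify_gaps(aligned_ref, aligned_read):
    """Return (kind, length, label) for every gap run in a pairwise alignment.

    Both strings must have equal length and use '-' for gaps (an assumed
    convention). A run whose length is a multiple of 3 keeps the reading
    frame ("codon-sized"); any other length shifts it ("frameshift").
    """
    if len(aligned_ref) != len(aligned_read):
        raise ValueError("aligned sequences must have the same length")

    # Label each alignment column.
    labels = []
    for r, q in zip(aligned_ref, aligned_read):
        if r == "-" and q != "-":
            labels.append("insertion")   # extra bases in the read
        elif q == "-" and r != "-":
            labels.append("deletion")    # bases missing from the read
        else:
            labels.append("match")

    # Collapse consecutive identical labels into runs and classify the gaps.
    events = []
    for label, run in groupby(labels):
        length = len(list(run))
        if label != "match":
            frame = "codon-sized" if length % 3 == 0 else "frameshift"
            events.append((label, length, frame))
    return events
```

    A three-base deletion such as classify_gaps("ATGAAACCC", "ATG---CCC") is reported as codon-sized, while a single inserted base is reported as a frameshift.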

  10. The non-coding B2 RNA binds to the DNA cleft and active site region of RNA polymerase II

    PubMed Central

    Ponicsan, Steven L.; Houel, Stephane; Old, William M.; Ahn, Natalie G.; Goodrich, James A.; Kugel, Jennifer F.

    2013-01-01

    The B2 family of short interspersed elements is transcribed into non-coding RNA by RNA polymerase III. The ~180 nt B2 RNA has been shown to potently repress mRNA transcription by binding tightly to RNA polymerase II (Pol II) and assembling with it into complexes on promoter DNA, where it keeps the polymerase from properly engaging the promoter DNA. Mammalian Pol II is a ~500 kD complex that contains 12 different protein subunits, providing many possible surfaces for interaction with B2 RNA. We found that the carboxy-terminal domain of the largest Pol II subunit was not required for B2 RNA to bind Pol II and repress transcription in vitro. To identify the surface on Pol II to which the minimal functional region of B2 RNA binds, we coupled multi-step affinity purification, reversible formaldehyde crosslinking, peptide sequencing by mass spectrometry, and analysis of peptide enrichment. The Pol II peptides most highly recovered after crosslinking to B2 RNA mapped to the DNA binding cleft and active site region of Pol II. These studies determine the location of a defined nucleic acid binding site on a large, native, multi-subunit complex and provide insight into the mechanism of transcriptional repression by B2 RNA. PMID:23416138

  11. Cloning and sequence analysis of a cDNA clone coding for the mouse GM2 activator protein.

    PubMed Central

    Bellachioma, G; Stirling, J L; Orlacchio, A; Beccari, T

    1993-01-01

    A cDNA (1.1 kb) containing the complete coding sequence for the mouse GM2 activator protein was isolated from a mouse macrophage library using a cDNA for the human protein as a probe. There was a single ATG located 12 bp from the 5' end of the cDNA clone followed by an open reading frame of 579 bp. Northern blot analysis of mouse macrophage RNA showed that there was a single band with a mobility corresponding to a size of 2.3 kb. We deduce from this that the mouse mRNA, in common with the mRNA for the human GM2 activator protein, has a long 3' untranslated sequence of approx. 1.7 kb. Alignment of the mouse and human deduced amino acid sequences showed 68% identity overall and 75% identity for the sequence on the C-terminal side of the first 31 residues, which in the human GM2 activator protein contains the signal peptide. Hydropathicity plots showed great similarity between the mouse and human sequences even in regions of low sequence similarity. There is a single N-glycosylation site in the mouse GM2 activator protein sequence (Asn151-Phe-Thr) which differs in its location from the single site reported in the human GM2 activator protein sequence (Asn63-Val-Thr). PMID:7689829

  12. An Abundant Class of Non-coding DNA Can Prevent Stochastic Gene Silencing in the C. elegans Germline.

    PubMed

    Frøkjær-Jensen, Christian; Jain, Nimit; Hansen, Loren; Davis, M Wayne; Li, Yongbin; Zhao, Di; Rebora, Karine; Millet, Jonathan R M; Liu, Xiao; Kim, Stuart K; Dupuy, Denis; Jorgensen, Erik M; Fire, Andrew Z

    2016-07-14

    Cells benefit from silencing foreign genetic elements but must simultaneously avoid inactivating endogenous genes. Although chromatin modifications and RNAs contribute to maintenance of silenced states, the establishment of silenced regions will inevitably reflect underlying DNA sequence and/or structure. Here, we demonstrate that a pervasive non-coding DNA feature in Caenorhabditis elegans, characterized by 10-base pair periodic An/Tn-clusters (PATCs), can license transgenes for germline expression within repressive chromatin domains. Transgenes containing natural or synthetic PATCs are resistant to position effect variegation and stochastic silencing in the germline. Among endogenous genes, intron length and PATC-character undergo dramatic changes as orthologs move from active to repressive chromatin over evolutionary time, indicating a dynamic character to the An/Tn periodicity. We propose that PATCs form the basis of a cellular immune system, identifying certain endogenous genes in heterochromatic contexts as privileged while foreign DNA can be suppressed with no requirement for a cellular memory of prior exposure.

  13. Population dynamics coded in DNA: genetic traces of the expansion of modern humans

    NASA Astrophysics Data System (ADS)

    Kimmel, Marek

    1999-12-01

    It has been proposed that modern humans evolved from a small ancestral population, which appeared several hundred thousand years ago in Africa. Descendants of the founder group migrated to Europe and then to Asia, not mixing with the pre-existing local populations but replacing them. Two demographic elements are present in this “out of Africa” hypothesis: numerical growth of the modern humans and their migration into Eurasia. Did these processes leave an imprint in our DNA? To address this question, we use the classical Fisher-Wright-Moran model of population genetics, assuming variable population size and two models of mutation: the infinite-sites model and the stepwise-mutation model. We use the coalescence theory, which amounts to tracing the common ancestors of contemporary genes. We obtain mathematical formulae expressing the distribution of alleles given the time changes of population size. In the framework of the infinite-sites model, simulations indicate that the pattern of past population size change leaves its signature on the pattern of DNA polymorphism. Application of the theory to the published mitochondrial DNA sequences indicates that the current mitochondrial DNA sequence variation is not inconsistent with the logistic growth of the modern human population. In the framework of the stepwise-mutation model, we demonstrate that population bottleneck followed by growth in size causes an imbalance between allele-size variance and heterozygosity. We analyze a set of data on tetranucleotide repeats which reveals the existence of this imbalance. The pattern of imbalance is consistent with the bottleneck being most ancient in Africans, most recent in Asians and intermediate in Europeans. These findings are consistent with the “out of Africa” hypothesis, although by no means do they constitute its proof.

  14. Bar-coded, multiplexed sequencing of targeted DNA regions using the Illumina Genome Analyzer.

    PubMed

    Szelinger, Szabolcs; Kurdoglu, Ahmet; Craig, David W

    2011-01-01

    To date, genome-wide association (GWA) studies, in which thousands of markers throughout the genome are simultaneously genotyped, have identified hundreds of loci underlying disease susceptibility. These regions typically span 5-100 kb, and resequencing efforts to identify potential functional variants within these loci represent the next logical step in the genetic characterization pipeline. Next-generation DNA sequencing technologies are, in principle, well-suited for this task, yet despite the massive sequencing capability afforded by these platforms, the present-day reality is that it remains difficult, time-consuming, and expensive to resequence large numbers of samples across moderately sized genomic regions. To address this obstacle, we developed a generalized framework for multiplexed resequencing of targeted regions of the human genome on the Illumina Genome Analyzer using degenerate, indexed DNA sequence barcodes ligated to fragmented DNA prior to sequencing. Using this method, the DNA of multiple individuals can be simultaneously sequenced at several regions. We find that achieving adequate coverage is one of the most important factors in the design of an experiment, but other key considerations include whether the objective is to discover genetic variants for genotyping later by a separate method, to genotype all identified variants by sequencing, or to exhaustively identify all common and rare variants in the region. Given the massive bandwidth of next-generation sequencing technologies and their low inherent throughput in terms of sequencing arrays per week, multiplexed sequencing using the barcoding approach offers a clear mechanism for focusing bandwidth to a smaller region across many more individuals or samples.
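
    Once barcoded libraries from several individuals are pooled and sequenced together, the reads have to be sorted back to their samples. A minimal sketch of that demultiplexing step follows. It assumes, purely for illustration, that the barcode sits at the start of each read and that one mismatch is tolerated; the function names, the sample names, and the "unassigned" bin are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Assign each read to the sample whose barcode best matches its prefix.

    reads    -- iterable of read sequences (barcode assumed at the 5' end)
    barcodes -- dict mapping sample name -> barcode sequence
    Reads with no match, or with a tie between samples, go to "unassigned".
    Assigned reads are stored with the barcode trimmed off.
    """
    bins = defaultdict(list)
    for read in reads:
        hits = []
        for sample, barcode in barcodes.items():
            prefix = read[:len(barcode)]
            if len(prefix) == len(barcode):
                distance = hamming(prefix, barcode)
                if distance <= max_mismatches:
                    hits.append((distance, sample))
        hits.sort()
        unique_best = hits and (len(hits) == 1 or hits[0][0] < hits[1][0])
        if unique_best:
            _, sample = hits[0]
            bins[sample].append(read[len(barcodes[sample]):])
        else:
            bins["unassigned"].append(read)
    return dict(bins)
```

    In practice the tolerated mismatch count should stay below half the minimum pairwise distance between barcodes, otherwise a read with sequencing errors can be assigned to the wrong sample.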

  15. Fine-tuning the ubiquitin code at DNA double-strand breaks: deubiquitinating enzymes at work

    PubMed Central

    Citterio, Elisabetta

    2015-01-01

    Ubiquitination is a reversible protein modification broadly implicated in cellular functions. Signaling processes mediated by ubiquitin (ub) are crucial for the cellular response to DNA double-strand breaks (DSBs), one of the most dangerous types of DNA lesions. In particular, the DSB response critically relies on active ubiquitination by the RNF8 and RNF168 ub ligases at the chromatin, which is essential for proper DSB signaling and repair. How this pathway is fine-tuned and what the functional consequences of its deregulation are for genome integrity and tissue homeostasis are the subject of intense investigation. One important regulatory mechanism is the reversal of substrate ubiquitination through the activity of specific deubiquitinating enzymes (DUBs), as supported by the implication of a growing number of DUBs in DNA damage response processes. Here, we discuss the current knowledge of how ub-mediated signaling at DSBs is controlled by DUBs, with a main focus on DUBs targeting histone H2A and on their recent implication in stem cell biology and cancer. PMID:26442100

  16. Cloning and characterization of a cDNA coding for mouse placental alkaline phosphatase

    SciTech Connect

    Terao, M.; Mintz, B.

    1987-10-01

    Mouse alkaline phosphatase was partially purified from placenta. Data obtained by immunoblotting analysis suggested that the primary structure of this enzyme has a much greater homology to that of human and bovine liver ALPs than to the human placental isozyme. Therefore, a full-length cDNA encoding human liver-type ALP was used as a probe to isolate the mouse placental ALP cDNA. The cloned mouse cDNA is 2459 base pairs long and is composed of an open reading frame encoding a 524-amino acid polypeptide that contains a putative signal peptide of 17 amino acids. Homology at the amino acid level of the mouse placental ALP is 90% to the human liver isozyme but only 55% to the human placental counterpart. RNA blot hybridization results indicate that the mouse placental ALP is encoded by a gene identical to the gene expressed in mouse liver, kidney, and teratocarcinoma stem cells. This gene is therefore evolutionarily highly conserved in mouse and human.

  17. African swine fever virus ORF P1192R codes for a functional type II DNA topoisomerase.

    PubMed

    Coelho, João; Martins, Carlos; Ferreira, Fernando; Leitão, Alexandre

    2015-01-01

    Topoisomerases modulate the topological state of DNA during processes, such as replication and transcription, that cause overwinding and/or underwinding of the DNA. African swine fever virus (ASFV) is a nucleo-cytoplasmic double-stranded DNA virus shown to contain an ORF (P1192R) with homology to type II topoisomerases. Here we observed that pP1192R is highly conserved among ASFV isolates but dissimilar from other viral, prokaryotic or eukaryotic type II topoisomerases. In both ASFV/Ba71V-infected Vero cells and ASFV/L60-infected pig macrophages we detected pP1192R at intermediate and late phases of infection, cytoplasmically localized and accumulating in the viral factories. Finally, we used a Saccharomyces cerevisiae temperature-sensitive strain in order to demonstrate, through complementation and in vitro decatenation assays, the functionality of P1192R, which we further confirmed by mutating its predicted catalytic residue. Overall, this work strengthens the idea that P1192R constitutes a target for studying, and possibly controlling, ASFV transcription and replication.

  18. HGSA DNA day essay contest winner 60 years on: still coding for cutting-edge science.

    PubMed

    Yates, Patrick

    2013-08-01

    MESSAGE FROM THE EDUCATION COMMITTEE: In 2013, the Education Committee of the Human Genetics Society of Australasia (HGSA) established the DNA Day Essay Contest in Australia and New Zealand. The contest was first established by the American Society of Human Genetics in 2005 and the HGSA DNA Day Essay Contest is adapted from this contest via a collaborative partnership. The aim of the contest is to engage high school students with important concepts in genetics through literature research and reflection. As 2013 marks the 60th anniversary of the discovery of the double helix of DNA by James Watson and Francis Crick and the 10th anniversary of the first sequencing of the human genome, the essay topic was to choose either of these breakthroughs and explain its broader impact on biotechnology, human health and disease, or our understanding of basic genetics, such as genetic variation or gene expression. The contest attracted 87 entrants in 2013, with the winning essay authored by Patrick Yates, a Year 12 student from Melbourne High School. Further details about the contest including the names and schools of the other finalists can be found at http://www.hgsa-essay.net.au/. The Education Committee would like to thank all the 2013 applicants and encourage students to enter in 2014.

  19. Improving Installation Level Classified Information Protection Programs.

    DTIC Science & Technology

    1987-04-01

    ITEM 11: CLASSIFIED INFORMATION PROTECTION PROGRAMS. ABSTRACT: Recent DoD...USAF installation level classified information protection programs. II. BACKGROUND. Recent unauthorized disclosures of classified information to the

  20. Run-length encoding graphic rules, biochemically editable designs and steganographical numeric data embedment for DNA-based cryptographical coding system

    PubMed Central

    Kawano, Tomonori

    2013-01-01

    There have been a wide variety of approaches for handling pieces of DNA as “unplugged” tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, watermarks and cryptography. In the present article, novel designs of artificial genes as the media for storing digitally compressed image data are proposed for bio-computing purposes, whereas natural genes principally encode proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given “passwords” and/or secret numbers using DNA sequences. The “passwords” of interest were decomposed into single letters and translated into font images coded on separate DNA chains. Each chain has coding regions, in which the images are encoded based on the novel run-length encoding rule, and non-coding regions, designed for the biochemical editing and remodeling processes that reveal the hidden orientation of the letters composing the original “passwords.” The latter processes require molecular biological tools for digestion and ligation of the fragmented DNA molecules, targeting the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interest over the image-coding DNA are also discussed. PMID:23750303
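
    Run-length encoding of a bitmap row, followed by a mapping of the runs onto nucleotides, can be sketched as below. The run-length step is the generic technique; the DNA mapping (one base for the pixel value, two base-4 digits for the run length) is a hypothetical scheme invented for this illustration and is not the encoding rule proposed in the article.

```python
DIGITS = "ACGT"                    # base-4 digits 0..3 (hypothetical mapping)
PIXEL_BASE = {"0": "A", "1": "T"}  # pixel value -> marker base (hypothetical)
BASE_PIXEL = {"A": "0", "T": "1"}


def run_length_encode(bits):
    """Collapse a string of '0'/'1' pixels into (pixel, run_length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(b, n) for b, n in runs]


def run_length_decode(runs):
    """Expand (pixel, run_length) pairs back into the pixel string."""
    return "".join(b * n for b, n in runs)


def runs_to_dna(runs):
    """Write each run as 3 bases: pixel marker + (length - 1) in base 4."""
    out = []
    for pixel, n in runs:
        if not 1 <= n <= 16:
            raise ValueError("this toy scheme only handles runs of 1..16")
        hi, lo = divmod(n - 1, 4)
        out.append(PIXEL_BASE[pixel] + DIGITS[hi] + DIGITS[lo])
    return "".join(out)


def dna_to_runs(dna):
    """Invert runs_to_dna by reading the strand three bases at a time."""
    runs = []
    for i in range(0, len(dna), 3):
        marker, hi, lo = dna[i:i + 3]
        n = DIGITS.index(hi) * 4 + DIGITS.index(lo) + 1
        runs.append((BASE_PIXEL[marker], n))
    return runs
```

    Under this toy scheme the ten-pixel row "0001111000" becomes three runs and nine bases; a real design would also have to handle longer runs and avoid problematic sequence motifs.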

  1. iRSpot-GAEnsC: identifing recombination spots via ensemble classifier and extending the concept of Chou's PseAAC to formulate DNA samples.

    PubMed

    Kabir, Muhammad; Hayat, Maqsood

    2016-02-01

    Meiotic recombination is vital for maintaining the sequence diversity in the human genome. Meiosis and recombination are considered the essential phases of cell division. In meiosis, the genome is divided into equal parts for sexual reproduction whereas in recombination, the diverse genomes are combined to form new combinations of genetic variations. The recombination process does not occur randomly across the genomes; it targets specific areas called recombination "hotspots" and "coldspots". Owing to huge exploration of polygenetic sequences in data banks, it is impossible to recognize the sequences through conventional methods. Looking at the significance of recombination spots, it is indispensable to develop an accurate, fast, robust, and high-throughput automated computational model. In this model, the numerical descriptors are extracted using two sequence representation schemes, namely dinucleotide composition and trinucleotide composition. The performances of seven classification algorithms were investigated. Finally, the predicted outcomes of individual classifiers are fused to form ensemble classification, which is formed through majority voting and genetic algorithm (GA). The performance of the GA-based ensemble model is quite promising compared to individual classifiers and the majority voting-based ensemble model. iRSpot-GAEnsC has achieved 84.46% accuracy. The empirical results revealed that the performance of iRSpot-GAEnsC is not only higher than the examined algorithms but also better than existing methods in the literature developed so far. It is anticipated that the proposed model might be helpful for the research community, academia and for drug discovery.
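
    The two feature schemes named above (dinucleotide and trinucleotide composition) and the majority-voting fusion can be sketched in a few lines. This is a generic illustration under stated assumptions: the function names are hypothetical, k-mers containing characters other than A, C, G, T are simply ignored, and the genetic-algorithm weighting used by iRSpot-GAEnsC is not reproduced here.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k):
    """Return the normalized frequency of every possible k-mer over ACGT.

    k=2 gives the 16-dimensional dinucleotide composition and k=3 the
    64-dimensional trinucleotide composition. Overlapping windows are used.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # ignores k-mers with N etc.
    return [counts[m] / total if total else 0.0 for m in kmers]


def majority_vote(predictions):
    """Fuse the labels predicted by individual classifiers by simple majority.

    With an odd number of classifiers and two classes there is never a tie.
    """
    return Counter(predictions).most_common(1)[0][0]
```

    A feature vector for one sample could then be built as kmer_composition(seq, 2) + kmer_composition(seq, 3), giving 80 descriptors, and passed to each classifier before the votes are fused.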

  2. Flanking sequence specificity determines coding microsatellite heteroduplex and mutation rates with defective DNA mismatch repair (MMR).

    PubMed

    Chung, H; Lopez, C G; Young, D J; Lai, J F; Holmstrom, J; Ream-Robinson, D; Cabrera, B L; Carethers, J M

    2010-04-15

    The activin type II receptor (ACVR2) contains two identical microsatellites in exons 3 and 10, but only the exon 10 microsatellite is frameshifted in mismatch repair (MMR)-defective colonic tumors. The reason for this selectivity is not known. We hypothesized that ACVR2 frameshifts were influenced by DNA sequences surrounding the microsatellite. We constructed plasmids in which exons 3 or 10 of ACVR2 were cloned +1 bp out of frame of enhanced green fluorescent protein (EGFP), allowing -1 bp frameshift to express EGFP. Plasmids were stably transfected into MMR-deficient cells, and subsequent non-fluorescent cells were sorted, cultured and harvested for mutation analysis. We swapped DNA sequences flanking the exon 3 and 10 microsatellites to test our hypothesis. Native ACVR2 exon 3 and 10 microsatellites underwent heteroduplex formation (A(7)/T(8)) in hMLH1(-/-) cells, but only exon 10 microsatellites fully mutated (A(7)/T(7)) in both hMLH1(-/-) and hMSH6(-/-) backgrounds, showing selectivity for exon 10 frameshifts and inability of exon 3 heteroduplexes to fully mutate. Substituting nucleotides flanking the exon 3 microsatellite for nucleotides flanking the exon 10 microsatellite significantly reduced heteroduplex and full mutation in hMLH1(-/-) cells. When the exon 3 microsatellite was flanked by nucleotides normally surrounding the exon 10 microsatellite, fully mutant exon 3 frameshifts appeared. Mutation selectivity for ACVR2 lies partly with flanking nucleotides surrounding each microsatellite.

  3. Isolation and identification of a cDNA clone coding for an HLA-DR transplantation antigen alpha-chain.

    PubMed

    Gustafsson, K; Bill, P; Larhammar, D; Wiman, K; Claesson, L; Schenning, L; Servenius, B; Sundelin, J; Rask, L; Peterson, P A

    1982-10-01

    Membrane-bound mRNA was isolated from Raji cells and enriched for message coding for the HLA-DR transplantation antigen alpha-chain by sucrose gradient centrifugation. Double-stranded cDNA was constructed from this mRNA fraction, ligated to plasmid pBR322, and cloned into Escherichia coli. By hybrid selection, a plasmid, pDR-alpha-1, able to hybridize with mRNA coding for the HLA-DR alpha-chain was identified. From the nucleotide sequence of one end of the insert an amino acid sequence was predicted which is identical to part of the amino-terminal sequence of an HLA-DR alpha-chain preparation isolated from Raji cells. This clearly shows that pDR-alpha-1 carries almost the complete message for an HLA-DR alpha-chain. From the nucleotide sequence of this plasmid it will be possible to predict the primary structure of an HLA-DR alpha-chain.

  4. A positive detecting code and its decoding algorithm for DNA library screening.

    PubMed

    Uehara, Hiroaki; Jimbo, Masakazu

    2009-01-01

    The study of gene functions requires high-quality DNA libraries. However, a large number of tests and screenings are necessary for compiling such libraries. We describe an algorithm for extracting as much information as possible from pooling experiments for library screening. Collections of clones are called pools, and a pooling experiment is a group test for detecting all positive clones. The probability of positiveness for each clone is estimated according to the outcomes of the pooling experiments. Clones with high chance of positiveness are subjected to confirmatory testing. In this paper, we introduce a new positive clone detecting algorithm, called the Bayesian network pool result decoder (BNPD). The performance of BNPD is compared, by simulation, with that of the Markov chain pool result decoder (MCPD) proposed by Knill et al. in 1996. Moreover, the combinatorial properties of pooling designs suitable for the proposed algorithm are discussed in conjunction with combinatorial designs and d-disjunct matrices. We also show the advantage of utilizing packing designs or BIB designs for the BNPD algorithm.
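
    To make the idea of decoding pooling experiments concrete, here is a minimal sketch of the simplest possible decoder: any clone that appears in a negative pool is ruled out. This is not BNPD or MCPD; it assumes error-free pool outcomes, and the design matrix and function name are hypothetical.

```python
def decode_pools(design, outcomes):
    """design[i][j] is 1 when clone j is in pool i; outcomes[i] is True for a positive pool.

    Returns the clones that appear in no negative pool (candidate positives).
    """
    n_clones = len(design[0])
    candidates = set(range(n_clones))
    for row, positive in zip(design, outcomes):
        if not positive:
            # Every member of a negative pool must itself be negative.
            candidates -= {j for j, member in enumerate(row) if member}
    return sorted(candidates)


# Hypothetical design: 4 pools over 6 clones, each clone in a distinct pair of pools.
design = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
positives = {3}
outcomes = [any(row[j] for j in positives) for row in design]
print(decode_pools(design, outcomes))
```

    Probabilistic decoders such as those discussed in the abstract replace this hard elimination with an estimated probability of positiveness per clone, which matters once pool outcomes can be wrong.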

  5. Variable continental distribution of polymorphisms in the coding regions of DNA-repair genes.

    PubMed

    Mathonnet, Géraldine; Labuda, Damian; Meloche, Caroline; Wambach, Tina; Krajinovic, Maja; Sinnett, Daniel

    2003-01-01

    DNA-repair pathways are critical for maintaining the integrity of the genetic material by protecting against mutations due to exposure-induced damage or replication errors. Polymorphisms in the corresponding genes may be relevant in genetic epidemiology by modifying individual cancer susceptibility or therapeutic response. We report data on the population distribution of potentially functional variants in XRCC1, APEX1, ERCC2, ERCC4, hMLH1, and hMSH3 genes among groups representing individuals of European, Middle Eastern, African, Southeast Asian and North American descent. The data indicate little interpopulation differentiation in some of these polymorphisms and typical FST values ranging from 10 to 17% at others. Low FST was observed in APEX1 and hMSH3 exon 23 in spite of their relatively high minor allele frequencies, which could suggest the effect of balancing selection. In XRCC1, hMSH3 exon 21 and hMLH1, Africa clusters either with the Middle East and Europe or with Southeast Asia, which could be related to the demographic history of human populations, whereby human migrations and genetic drift rather than selection would account for the observed differences.
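
    For readers unfamiliar with FST, the following sketch computes Wright's fixation index for one biallelic locus as (HT - HS) / HT. The abstract does not state which estimator the authors used, so this textbook form, the equal weighting of populations, and the allele frequencies are assumptions for illustration only.

```python
def fst_biallelic(freqs):
    """Wright's F_ST from allele frequencies, one per equally weighted population."""
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity if all populations were pooled.
    h_t = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t if h_t else 0.0


# Hypothetical frequencies of the same allele in two populations.
print(round(fst_biallelic([0.1, 0.5]), 3))
```

    Identical frequencies across populations give a value of zero, while increasingly divergent frequencies push the value toward one.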

  6. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria

    DOE PAGES

    Bolduc, Benjamin; Jang, Ho Bin; Doulcier, Guilhem; ...

    2017-05-03

    Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. Recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means for developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytic was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome ‘sequence space’ that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure.

  7. East Asian mtDNA haplogroup determination in Koreans: haplogroup-level coding region SNP analysis and subhaplogroup-level control region sequence analysis.

    PubMed

    Lee, Hwan Young; Yoo, Ji-Eun; Park, Myung Jin; Chung, Ukhee; Kim, Chong-Youl; Shin, Kyoung-Jin

    2006-11-01

    The present study analyzed 21 coding region SNP markers and one deletion motif for the determination of East Asian mitochondrial DNA (mtDNA) haplogroups by designing three multiplex systems which apply single base extension methods. Using two multiplex systems, all 593 Korean mtDNAs were allocated into 15 haplogroups: M, D, D4, D5, G, M7, M8, M9, M10, M11, R, R9, B, A, and N9. As the D4 haplotypes occurred most frequently in Koreans, the third multiplex system was used to further define D4 subhaplogroups: D4a, D4b, D4e, D4g, D4h, and D4j. This method allowed the complementation of coding region information with control region mutation motifs and the resultant findings also suggest reliable control region mutation motifs for the assignment of East Asian mtDNA haplogroups. These three multiplex systems produce good results in degraded samples as they contain small PCR products (101-154 bp) for single base extension reactions. SNP scoring was performed in 101 old skeletal remains using these three systems to prove their utility in degraded samples. The sequence analysis of mtDNA control region with high incidence of haplogroup-specific mutations and the selective scoring of highly informative coding region SNPs using the three multiplex systems are useful tools for most applications involving East Asian mtDNA haplogroup determination and haplogroup-directed stringent quality control.

  8. Genotyping human ancient mtDNA control and coding region polymorphisms with a multiplexed Single-Base-Extension assay: the singular maternal history of the Tyrolean Iceman.

    PubMed

    Endicott, Phillip; Sanchez, Juan J; Pichler, Irene; Brotherton, Paul; Brooks, Jerome; Egarter-Vigl, Eduard; Cooper, Alan; Pramstaller, Peter

    2009-06-19

    Progress in the field of human ancient DNA studies has been severely restricted due to the myriad sources of potential contamination, and because of the pronounced difficulty in identifying authentic results. Improving the robustness of human aDNA results is a necessary pre-requisite to vigorously testing hypotheses about human evolution in Europe, including possible admixture with Neanderthals. This study approaches the problem of distinguishing between authentic and contaminating sequences from common European mtDNA haplogroups by applying a multiplexed Single-Base-Extension assay, containing both control and coding region sites, to DNA extracted from the Tyrolean Iceman. The multiplex assay developed for this study was able to confirm that the Iceman's mtDNA belongs to a new European mtDNA clade with a very limited distribution amongst modern data sets. Controlled contamination experiments show that the correct results are returned by the multiplex assay even in the presence of substantial amounts of exogenous DNA. The overall level of discrimination achieved by targeting both control and coding region polymorphisms in a single reaction provides a methodology capable of dealing with most cases of homoplasy prevalent in European haplogroups. The new genotyping results for the Iceman confirm the extreme fallibility of human aDNA studies in general, even when authenticated by independent replication. The sensitivity and accuracy of the multiplex Single-Base-Extension methodology forms part of an emerging suite of alternative techniques for the accurate retrieval of ancient DNA sequences from both anatomically modern humans and Neanderthals. The contamination of laboratories remains a pressing concern in aDNA studies, both in the pre and post-PCR environments, and the adoption of a forensic style assessment of a priori risks would significantly improve the credibility of results.

  9. Role of conserved non-coding DNA elements in the Foxp3 gene in regulatory T-cell fate.

    PubMed

    Zheng, Ye; Josefowicz, Steven; Chaudhry, Ashutosh; Peng, Xiao P; Forbush, Katherine; Rudensky, Alexander Y

    2010-02-11

    Immune homeostasis is dependent on tight control over the size of a population of regulatory T (T(reg)) cells capable of suppressing over-exuberant immune responses. The T(reg) cell subset is comprised of cells that commit to the T(reg) lineage by upregulating the transcription factor Foxp3 either in the thymus (tT(reg)) or in the periphery (iT(reg)). Considering a central role for Foxp3 in T(reg) cell differentiation and function, we proposed that conserved non-coding DNA sequence (CNS) elements at the Foxp3 locus encode information defining the size, composition and stability of the T(reg) cell population. Here we describe the function of three Foxp3 CNS elements (CNS1-3) in T(reg) cell fate determination in mice. The pioneer element CNS3, which acts to potently increase the frequency of T(reg) cells generated in the thymus and the periphery, binds c-Rel in in vitro assays. In contrast, CNS1, which contains a TGF-beta-NFAT response element, is superfluous for tT(reg) cell differentiation, but has a prominent role in iT(reg) cell generation in gut-associated lymphoid tissues. CNS2, although dispensable for Foxp3 induction, is required for Foxp3 expression in the progeny of dividing T(reg) cells. Foxp3 binds to CNS2 in a Cbf-beta-Runx1 and CpG DNA demethylation-dependent manner, suggesting that Foxp3 recruitment to this 'cellular memory module' facilitates the heritable maintenance of the active state of the Foxp3 locus and, therefore, T(reg) lineage stability. Together, our studies demonstrate that the composition, size and maintenance of the T(reg) cell population are controlled by Foxp3 CNS elements engaged in response to distinct cell-extrinsic or -intrinsic cues.

  10. DNA-LCEB: a high-capacity and mutation-resistant DNA data-hiding approach by employing encryption, error correcting codes, and hybrid twofold and fourfold codon-based strategy for synonymous substitution in amino acids.

    PubMed

    Hafeez, Ibbad; Khan, Asifullah; Qadir, Abdul

    2014-11-01

    Data-hiding in deoxyribonucleic acid (DNA) sequences can be used to develop an organic memory and to track parent genes in an offspring as well as in genetically modified organisms. However, the main concerns regarding data-hiding in DNA sequences are the survival of the organism and successful extraction of the watermark from DNA. This implies that the organism should live and reproduce without any functional disorder even in the presence of the embedded data. Consequently, performing synonymous substitution in amino acids for watermarking becomes a primary option. In this regard, a hybrid watermark embedding strategy that employs synonymous substitution in both twofold and fourfold codons of amino acids is proposed. This work thus presents a high-capacity and mutation-resistant watermarking technique, DNA-LCEB, for hiding secret information in the DNA of living organisms. By employing the different types of synonymous codons of amino acids, the data storage capacity has been significantly increased. It is further observed that the proposed DNA-LCEB, employing a combination of synonymous substitution, lossless compression, encryption, and Bose-Chaudhuri-Hocquenghem coding, is secure and performs better in terms of both capacity and robustness compared to existing DNA data-hiding schemes. The proposed DNA-LCEB is tested against different mutations, including silent, missense, and nonsense mutations, and provides substantial improvement in terms of mutation detection/correction rate and bits per nucleotide. A web application for DNA-LCEB is available at http://111.68.99.218/DNA-LCEB.
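
    The core idea of hiding data through synonymous substitution can be sketched as follows. This is only an illustration of the fourfold-codon part: it stores two bits in the third base of codons whose first two bases fix the amino acid in the standard genetic code. The bit-to-base mapping and function names are assumptions, and the compression, encryption, error-correcting code and twofold-codon handling of DNA-LCEB are not reproduced.

```python
# Codon prefixes whose third base is free in the standard genetic code
# (Ala, Gly, Pro, Thr, Val, and the fourfold families of Leu, Ser, Arg).
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CT", "TC", "CG"}
BASES = "ACGT"  # illustrative mapping: index of the third base = 2-bit value


def embed(cds, bits):
    """Write bits, two per fourfold codon, into the third codon position of an in-frame cds."""
    if len(bits) % 2:
        raise ValueError("payload must have an even number of bits")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    pos = 0
    for n, codon in enumerate(codons):
        if codon[:2] in FOURFOLD_PREFIXES and pos < len(bits):
            codons[n] = codon[:2] + BASES[int(bits[pos:pos + 2], 2)]
            pos += 2
    if pos < len(bits):
        raise ValueError("not enough fourfold codons for the payload")
    return "".join(codons)


def extract(cds, n_bits):
    """Read n_bits back from the third position of the fourfold codons."""
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and 2 * len(out) < n_bits:
            out.append(format(BASES.index(codon[2]), "02b"))
    return "".join(out)


# Hypothetical coding sequence: Met-Ala-Gly-Lys-Pro-stop.
marked = embed("ATGGCTGGAAAACCGTAA", "011011")
print(marked, extract(marked, 6))
```

    Because only the wobble base of fourfold codons changes, the encoded protein is unchanged, which is the property the abstract relies on for organism survival.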

  11. DNA vaccine coding for the rhesus prostate specific antigen delivered by intradermal electroporation in patients with relapsed prostate cancer.

    PubMed

    Eriksson, Fredrik; Tötterman, Thomas; Maltais, Anna-Karin; Pisa, Pavel; Yachnin, Jeffrey

    2013-08-20

    We tested safety, clinical efficacy and immunogenicity of a DNA vaccine coding for rhesus prostate specific antigen (PSA) delivered by intradermal injection and skin electroporation. Fifteen patients with biochemical relapse of prostate cancer without macroscopic disease participated in this phase I study. Patients were started on a 1 month course of androgen deprivation therapy (ADT) prior to treatment. Vaccine doses ranged from 50 to 1,600 μg. Study subjects received five vaccinations at four week intervals. All patients have had at least one year of follow-up. No systemic toxicity was observed. Discomfort from electroporation did not require analgesia or topical anesthetic. No clinically significant changes in PSA kinetics were observed as all patients required antiandrogen therapy shortly after completion of the 5 months of vaccination due to rising PSA. Immunogenicity, as measured by T-cell reactivity to the modified PSA peptide and to a mix of overlapping PSA peptides representing the full length protein, was observed in some patients. All but one patient had pre-study PSA specific T-cell reactivity. ADT alone resulted in increases in T-cell reactivity in most patients. Intradermal vaccination with skin electroporation is easily performed with only minor discomfort for the patient. Patients with biochemical relapse of prostate cancer are a good model for testing immune therapies.

  12. Evolutionary Conservation of a Coding Function for D4Z4, the Tandem DNA Repeat Mutated in Facioscapulohumeral Muscular Dystrophy

    PubMed Central

    Clapp, Jannine; Mitchell, Laura M.; Bolland, Daniel J.; Fantes, Judy; Corcoran, Anne E.; Scotting, Paul J.; Armour, John A. L.; Hewitt, Jane E.

    2007-01-01

    Facioscapulohumeral muscular dystrophy (FSHD) is caused by deletions within the polymorphic DNA tandem array D4Z4. Each D4Z4 repeat unit has an open reading frame (ORF), termed “DUX4,” containing two homeobox sequences. Because there has been no evidence of a transcript from the array, these deletions are thought to cause FSHD by a position effect on other genes. Here, we identify D4Z4 homologues in the genomes of rodents, Afrotheria (superorder of elephants and related species), and other species and show that the DUX4 ORF is conserved. Phylogenetic analysis suggests that primate and Afrotherian D4Z4 arrays are orthologous and originated from a retrotransposed copy of an intron-containing DUX gene, DUXC. Reverse-transcriptase polymerase chain reaction and RNA fluorescence and tissue in situ hybridization data indicate transcription of the mouse array. Together with the conservation of the DUX4 ORF for >100 million years, this strongly supports a coding function for D4Z4 and necessitates re-examination of current models of the FSHD disease mechanism. PMID:17668377

  13. A non-coding plastid DNA phylogeny of Asian Begonia (Begoniaceae): evidence for morphological homoplasy and sectional polyphyly.

    PubMed

    Thomas, D C; Hughes, M; Phutthai, T; Rajbhandary, S; Rubite, R; Ardi, W H; Richardson, J E

    2011-09-01

    Maximum likelihood and Bayesian analyses of non-coding plastid DNA sequence data based on a broad sampling of all major Asian Begonia sections (ndhA intron, ndhF-rpl32 spacer, rpl32-trnL spacer, 3977 aligned characters, 84 species) were used to reconstruct the phylogeny of Asian Begonia and to test the monophyly of major Asian Begonia sections. Ovary and fruit characters which are crucial in current sectional circumscriptions were mapped on the phylogeny to assess their utility in infrageneric classifications. The results indicate that the strong systematic emphasis placed on single, homoplasious characters such as undivided placenta lamellae (section Reichenheimia) and fleshy pericarps (section Sphenanthera), and the recognition of sections primarily based on a suite of plesiomorphic characters including three-locular ovaries with axillary, bilamellate placentae and dry, dehiscent pericarps (section Diploclinium), has resulted in the circumscription of several polyphyletic sections. Moreover, sections Platycentrum and Petermannia were recovered as paraphyletic. Because of the homoplasy of systematically important characters, current classifications have a certain diagnostic, but only poor predictive value. The presented phylogeny provides for the first time a reasonably resolved and supported phylogenetic framework for Asian Begonia which has the power to inform future taxonomic, biogeographic and evolutionary studies.

  14. Brut: Automatic bubble classifier

    NASA Astrophysics Data System (ADS)

    Beaumont, Christopher; Goodman, Alyssa; Williams, Jonathan; Kendrew, Sarah; Simpson, Robert

    2014-07-01

    Brut, written in Python, identifies bubbles in infrared images of the Galactic midplane; it uses a database of known bubbles from the Milky Way Project and Spitzer images to build an automatic bubble classifier. The classifier is based on the Random Forest algorithm, and uses the WiseRF implementation of this algorithm.

  15. Genome defense against exogenous nucleic acids in eukaryotes by non-coding DNA occurs through CRISPR-like mechanisms in the cytosol and the bodyguard protection in the nucleus.

    PubMed

    Qiu, Guo-Hua

    2016-01-01

    In this review, the protective function of the abundant non-coding DNA in the eukaryotic genome is discussed from the perspective of genome defense against exogenous nucleic acids. Peripheral non-coding DNA has been proposed to act as a bodyguard that protects the genome and the central protein-coding sequences from ionizing radiation-induced DNA damage. In the proposed mechanism of protection, the radicals generated by water radiolysis in the cytosol and IR energy are absorbed, blocked and/or reduced by peripheral heterochromatin; then, the DNA damage sites in the heterochromatin are removed and expelled from the nucleus to the cytoplasm through nuclear pore complexes, most likely through the formation of extrachromosomal circular DNA. To strengthen this hypothesis, this review summarizes the experimental evidence supporting the protective function of non-coding DNA against exogenous nucleic acids. Based on these data, I hypothesize herein about the presence of an additional line of defense formed by small RNAs in the cytosol in addition to their bodyguard protection mechanism in the nucleus. Therefore, exogenous nucleic acids may be initially inactivated in the cytosol by small RNAs generated from non-coding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. Exogenous nucleic acids may enter the nucleus, where some are absorbed and/or blocked by heterochromatin and others integrate into chromosomes. The integrated fragments and the sites of DNA damage are removed by repetitive non-coding DNA elements in the heterochromatin and excluded from the nucleus. Therefore, the normal eukaryotic genome and the central protein-coding sequences are triply protected by non-coding DNA against invasion by exogenous nucleic acids. This review provides evidence supporting the protective role of non-coding DNA in genome defense.

  16. New Insights into the Lake Chad Basin Population Structure Revealed by High-Throughput Genotyping of Mitochondrial DNA Coding SNPs

    PubMed Central

    Černý, Viktor; Carracedo, Ángel

    2011-01-01

    Background: Located in the Sudan belt, the Chad Basin forms a remarkable ecosystem, where several unique agricultural and pastoral techniques have been developed. Both from an archaeological and a genetic point of view, this region has been interpreted to be the center of a bidirectional corridor connecting West and East Africa, as well as a meeting point for populations coming from North Africa through the Saharan desert. Methodology/Principal Findings: Samples from twelve ethnic groups from the Chad Basin (n = 542) have been high-throughput genotyped for 230 coding region mitochondrial DNA (mtDNA) Single Nucleotide Polymorphisms (mtSNPs) using Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight (MALDI-TOF) mass spectrometry. This set of mtSNPs allowed for much better phylogenetic resolution than previous studies of this geographic region, enabling new insights into its population history. Notable haplogroup (hg) heterogeneity has been observed in the Chad Basin mirroring the different demographic histories of these ethnic groups. As estimated using a Bayesian framework, nomadic populations showed negative growth which was not always correlated to their estimated effective population sizes. Nomads also showed lower diversity values than sedentary groups. Conclusions/Significance: Compared to sedentary populations, nomads showed signals of stronger genetic drift occurring in their ancestral populations. These populations, however, retained more haplotype diversity in their hypervariable segments I (HVS-I), but not their mtSNPs, suggesting a more ancestral ethnogenesis. Whereas the nomadic population showed a higher Mediterranean influence signaled mainly by sub-lineages of M1, R0, U6, and U5, the other populations showed a more consistent sub-Saharan pattern. Although lifestyle may have an influence on diversity patterns and hg composition, analysis of molecular variance has not identified these differences. The present study indicates that analysis of mt

  17. Isolation and characterization of a cDNA clone for the complete protein coding region of the delta subunit of the mouse acetylcholine receptor.

    PubMed Central

    LaPolla, R J; Mayne, K M; Davidson, N

    1984-01-01

    A mouse cDNA clone has been isolated that contains the complete coding region of a protein highly homologous to the delta subunit of the Torpedo acetylcholine receptor (AcChoR). The cDNA library was constructed in the vector lambda 10 from membrane-associated poly(A)+ RNA from BC3H-1 mouse cells. Surprisingly, the delta clone was selected by hybridization with cDNA encoding the gamma subunit of the Torpedo AcChoR. The nucleotide sequence of the mouse cDNA clone contains an open reading frame of 520 amino acids. This amino acid sequence exhibits 59% and 50% sequence homology to the Torpedo AcChoR delta and gamma subunits, respectively. However, the mouse nucleotide sequence has several stretches of high homology with the Torpedo gamma subunit cDNA, but not with delta. The mouse protein has the same general structural features as do the Torpedo subunits. It is encoded by a 3.3-kilobase mRNA. There is probably only one, but at most two, chromosomal genes coding for this or closely related sequences. PMID:6096870

  18. The Arabidopsis HOMOLOGY-DEPENDENT GENE SILENCING1 Gene Codes for an S-Adenosyl-l-Homocysteine Hydrolase Required for DNA Methylation-Dependent Gene Silencing

    PubMed Central

    Rocha, Pedro S.C.F.; Sheikh, Mazhar; Melchiorre, Rosalba; Fagard, Mathilde; Boutet, Stéphanie; Loach, Rebecca; Moffatt, Barbara; Wagner, Conrad; Vaucheret, Hervé; Furner, Ian

    2005-01-01

    Genes introduced into higher plant genomes can become silent (gene silencing) and/or cause silencing of homologous genes at unlinked sites (homology-dependent gene silencing or HDG silencing). Mutations of the HOMOLOGY-DEPENDENT GENE SILENCING1 (HOG1) locus relieve transcriptional gene silencing and methylation-dependent HDG silencing and result in genome-wide demethylation. The hog1 mutant plants also grow slowly and have low fertility and reduced seed germination. Three independent mutants of HOG1 were each found to have point mutations at the 3′ end of a gene coding for S-adenosyl-l-homocysteine (SAH) hydrolase, and hog1-1 plants show reduced SAH hydrolase activity. A transposon (hog1-4) and a T-DNA tag (hog1-5) in the HOG1 gene each behaved as zygotic embryo lethal mutants and could not be made homozygous. The results suggest that the homozygous hog1 point mutants are leaky and result in genome demethylation and poor growth and that homozygous insertion mutations result in zygotic lethality. Complementation of the hog1-1 point mutation with a T-DNA containing the gene coding for SAH hydrolase restored gene silencing, HDG silencing, DNA methylation, fast growth, and normal seed viability. The same T-DNA also complemented the zygotic embryo lethal phenotype of the hog1-4 tagged mutant. A model relating the HOG1 gene, DNA methylation, and methylation-dependent HDG silencing is presented. PMID:15659630

  19. Dynamic system classifier

    NASA Astrophysics Data System (ADS)

    Pumpe, Daniel; Greiner, Maksim; Müller, Ewald; Enßlin, Torsten A.

    2016-07-01

    Stochastic differential equations describe well many physical, biological, and sociological systems, despite the simplification often made in their derivation. Here the usage of simple stochastic differential equations to characterize and classify complex dynamical systems is proposed within a Bayesian framework. To this end, we develop a dynamic system classifier (DSC). The DSC first abstracts training data of a system in terms of time-dependent coefficients of the descriptive stochastic differential equation. Thereby the DSC identifies unique correlation structures within the training data. For definiteness we restrict the presentation of the DSC to oscillation processes with a time-dependent frequency ω(t) and damping factor γ(t). Although real systems might be more complex, this simple oscillator captures many characteristic features. The ω and γ time lines represent the abstract system characterization and permit the construction of efficient signal classifiers. Numerical experiments show that such classifiers perform well even in the low signal-to-noise regime.

  20. Dynamic system classifier.

    PubMed

    Pumpe, Daniel; Greiner, Maksim; Müller, Ewald; Enßlin, Torsten A

    2016-07-01

    Stochastic differential equations describe well many physical, biological, and sociological systems, despite the simplification often made in their derivation. Here the usage of simple stochastic differential equations to characterize and classify complex dynamical systems is proposed within a Bayesian framework. To this end, we develop a dynamic system classifier (DSC). The DSC first abstracts training data of a system in terms of time-dependent coefficients of the descriptive stochastic differential equation. Thereby the DSC identifies unique correlation structures within the training data. For definiteness we restrict the presentation of the DSC to oscillation processes with a time-dependent frequency ω(t) and damping factor γ(t). Although real systems might be more complex, this simple oscillator captures many characteristic features. The ω and γ time lines represent the abstract system characterization and permit the construction of efficient signal classifiers. Numerical experiments show that such classifiers perform well even in the low signal-to-noise regime.
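
    As an illustration of the kind of process this abstract restricts itself to, the sketch below generates a noisy oscillation with a time-dependent frequency ω(t) and damping factor γ(t). The specific equation x'' + γ(t) x' + ω(t)² x = noise, the integration scheme, and all parameter values are assumptions made here; the Bayesian inference performed by the DSC is not shown.

```python
import math
import random


def simulate(omega, gamma, sigma=0.1, dt=0.001, steps=5000, seed=0):
    """Integrate a stochastically driven damped oscillator with a semi-implicit Euler scheme.

    omega and gamma are functions of time t; sigma scales the white-noise forcing.
    """
    rng = random.Random(seed)
    x, v = 1.0, 0.0  # assumed initial displacement and velocity
    xs = []
    for n in range(steps):
        t = n * dt
        kick = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        v += (-gamma(t) * v - omega(t) ** 2 * x) * dt + kick
        x += v * dt  # uses the updated velocity, which keeps the scheme stable
        xs.append(x)
    return xs


# Hypothetical time lines: a slowly rising frequency and a constant damping factor.
signal = simulate(lambda t: 2 * math.pi * (1 + 0.1 * t), lambda t: 1.0)
print(len(signal))
```

    Synthetic signals of this kind are what a classifier built on the ω and γ time lines would be trained and tested on.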

  1. Hierarchical Pattern Classifier

    NASA Technical Reports Server (NTRS)

    Yates, Gigi L.; Eberlein, Susan J.

    1992-01-01

    Hierarchical pattern classifier reduces number of comparisons between input and memory vectors without reducing detail of final classification by dividing classification process into coarse-to-fine hierarchy that comprises first "grouping" step and second classification step. Three-layer neural network reduces computation further by reducing number of vector dimensions in processing. Concept applicable to pattern-classification problems with need to reduce amount of computation necessary to classify, identify, or match patterns to desired degree of resolution.
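
    The coarse-to-fine idea can be sketched in a few lines: the input is first matched against one prototype per group, and only the memory vectors of the winning group are then compared in detail. The data layout, distance measure, and names below are assumptions for illustration; the three-layer neural network used for dimension reduction is not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(x, groups):
    """groups maps a group prototype (tuple) to a list of (label, memory_vector)."""
    # Step 1 (coarse "grouping"): pick the group whose prototype is nearest to x.
    proto = min(groups, key=lambda g: sq_dist(x, g))
    # Step 2 (fine classification): compare x only with that group's memory vectors.
    label, _ = min(groups[proto], key=lambda item: sq_dist(x, item[1]))
    return label


# Hypothetical memory: two groups with two stored patterns each.
groups = {
    (0.0, 0.0): [("a", (0.1, 0.0)), ("b", (-0.2, 0.3))],
    (5.0, 5.0): [("c", (4.8, 5.1)), ("d", (5.5, 4.9))],
}
print(classify((5.4, 5.0), groups))
```

    With G groups of M patterns each, this performs about G + M comparisons instead of G × M, which is the saving the abstract describes.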

  2. Detecting selection in the blue crab, Callinectes sapidus, using DNA sequence data from multiple nuclear protein-coding genes.

    PubMed

    Yednock, Bree K; Neigel, Joseph E

    2014-01-01

    The identification of genes involved in the adaptive evolution of non-model organisms with uncharacterized genomes constitutes a major challenge. This study employed a rigorous and targeted candidate gene approach to test for positive selection on protein-coding genes of the blue crab, Callinectes sapidus. Four genes with putative roles in physiological adaptation to environmental stress were chosen as candidates. A fifth gene not expected to play a role in environmental adaptation was used as a control. Large samples (n>800) of DNA sequences from C. sapidus were used in tests of selective neutrality based on sequence polymorphisms. In combination with these, sequences from the congener C. similis were used in neutrality tests based on interspecific divergence. In multiple tests, significant departures from neutral expectations, indicative of positive selection, were found for the candidate gene trehalose 6-phosphate synthase (tps). These departures could not be explained by any of the historical population expansion or bottleneck scenarios that were evaluated in coalescent simulations. Evidence was also found for balancing selection at ATP-synthase subunit 9 (atps) using a maximum likelihood version of the Hudson, Kreitman, and Aguadé test, and positive selection favoring amino acid replacements within ATP/ADP translocase (ant) was detected using the McDonald-Kreitman test. In contrast, test statistics for the control gene, ribosomal protein L12 (rpl), which presumably has experienced the same demographic effects as the candidate loci, were not significantly different from neutral expectations and could readily be explained by demographic effects. Together, these findings demonstrate the utility of the candidate gene approach for investigating adaptation at the molecular level in a marine invertebrate for which extensive genomic resources are not available.

  3. Detecting Selection in the Blue Crab, Callinectes sapidus, Using DNA Sequence Data from Multiple Nuclear Protein-Coding Genes

    PubMed Central

    Yednock, Bree K.; Neigel, Joseph E.

    2014-01-01

    The identification of genes involved in the adaptive evolution of non-model organisms with uncharacterized genomes constitutes a major challenge. This study employed a rigorous and targeted candidate gene approach to test for positive selection on protein-coding genes of the blue crab, Callinectes sapidus. Four genes with putative roles in physiological adaptation to environmental stress were chosen as candidates. A fifth gene not expected to play a role in environmental adaptation was used as a control. Large samples (n>800) of DNA sequences from C. sapidus were used in tests of selective neutrality based on sequence polymorphisms. In combination with these, sequences from the congener C. similis were used in neutrality tests based on interspecific divergence. In multiple tests, significant departures from neutral expectations, indicative of positive selection, were found for the candidate gene trehalose 6-phosphate synthase (tps). These departures could not be explained by any of the historical population expansion or bottleneck scenarios that were evaluated in coalescent simulations. Evidence was also found for balancing selection at ATP-synthase subunit 9 (atps) using a maximum likelihood version of the Hudson, Kreitman, and Aguadé test, and positive selection favoring amino acid replacements within ATP/ADP translocase (ant) was detected using the McDonald-Kreitman test. In contrast, test statistics for the control gene, ribosomal protein L12 (rpl), which presumably has experienced the same demographic effects as the candidate loci, were not significantly different from neutral expectations and could readily be explained by demographic effects. Together, these findings demonstrate the utility of the candidate gene approach for investigating adaptation at the molecular level in a marine invertebrate for which extensive genomic resources are not available. PMID:24896825
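
    The McDonald-Kreitman test mentioned in this abstract compares a 2x2 table of nonsynonymous and synonymous changes that are fixed between species or polymorphic within a species. The sketch below computes the neutrality index, the derived proportion alpha, and a two-sided Fisher exact p-value using only the standard library. The counts are illustrative and are not taken from the blue crab study; the function names are assumptions.

```python
from math import comb


def mk_summary(pn, ps, dn, ds):
    """pn, ps: nonsynonymous/synonymous polymorphisms; dn, ds: fixed differences."""
    ni = (pn / ps) / (dn / ds)  # neutrality index; below 1 suggests excess replacements fixed
    return ni, 1 - ni           # alpha = 1 - NI


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Illustrative counts: fixed (dn=7, ds=17), polymorphic (pn=2, ps=42).
ni, alpha = mk_summary(pn=2, ps=42, dn=7, ds=17)
p_value = fisher_exact_two_sided(7, 17, 2, 42)
print(round(ni, 3), round(alpha, 3), p_value < 0.05)
```

    An excess of fixed nonsynonymous differences relative to polymorphism, as in these illustrative counts, is the pattern interpreted as positive selection favoring amino acid replacements.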

  4. Cellulases and coding sequences

    SciTech Connect

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-01-01

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  5. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-02-20

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  6. C.U.R.R.F. (Codon Usage regarding Restriction Finder): a free Java(®)-based tool to detect potential restriction sites in both coding and non-coding DNA sequences.

    PubMed

    Gatter, Michael; Gatter, Thomas; Matthäus, Falk

    2012-10-01

    The synthesis of complete genes is becoming an increasingly popular approach in heterologous gene expression. Reasons for this are the decreasing prices and the numerous advantages in comparison to classic molecular cloning methods. Two of these advantages are the possibility to adapt the codon usage to the host organism and the option to introduce restriction enzyme target sites of choice. C.U.R.R.F. (Codon Usage regarding Restriction Finder) is a free Java(®)-based software program which is able to detect possible restriction sites in both coding and non-coding DNA sequences by introducing multiple silent or non-silent mutations, respectively. The deviation of an alternative sequence containing a desired restriction motif from the sequence with the optimal codon usage is considered during the search of potential restriction sites in coding DNA and mRNA sequences as well as protein sequences. C.U.R.R.F. is available at http://www.zvm.tu-dresden.de/die_tu_dresden/fakultaeten/fakultaet_mathematik_und_naturwissenschaften/fachrichtung_biologie/mikrobiologie/allgemeine_mikrobiologie/currf.
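
    The idea of finding restriction sites that can be introduced by silent mutations can be sketched as follows. This is not the C.U.R.R.F. implementation: it assumes the standard genetic code, reading frame 0, and a single recognition sequence, and it ignores the codon-usage scoring that the abstract describes. The function names and the example sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in the conventional T, C, A, G ordering.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq):
    """Translate a DNA sequence in frame 0, ignoring a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

def silent_site_candidates(cds, site):
    """List (position, number_of_base_changes) where `site` can be written
    into `cds` without changing the encoded protein."""
    original = translate(cds)
    hits = []
    for pos in range(len(cds) - len(site) + 1):
        window = cds[pos:pos + len(site)]
        if window == site:
            continue  # the site is already present here
        mutated = cds[:pos] + site + cds[pos + len(site):]
        if translate(mutated) == original:
            changes = sum(a != b for a, b in zip(window, site))
            hits.append((pos, changes))
    return hits

# Hypothetical coding sequence (Met-Glu-Phe-stop) and the EcoRI site GAATTC.
print(silent_site_candidates("ATGGAGTTCTAA", "GAATTC"))
```

    In this toy input, changing GAG to GAA keeps the glutamate codon synonymous while creating the recognition sequence at position 3.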

  7. Study characterizes long non-coding RNA’s response to DNA damage in colon cancer cells | Center for Cancer Research

    Cancer.gov

    Researchers led by Ashish Lal, Ph.D., Investigator in the Genetics Branch, have shown that when the DNA in human colon cancer cells is damaged, a long non-coding RNA (lncRNA) regulates the expression of genes that halt growth, which allows the cells to repair the damage and promote survival. Their findings suggest an important pro-survival function of a lncRNA in cancer cells.

  8. Homological stabilizer codes

    SciTech Connect

    Anderson, Jonas T.

    2013-03-15

    In this paper we define homological stabilizer codes on qubits which encompass codes such as Kitaev's toric code and the topological color codes. These codes are defined solely by the graphs they reside on. This feature allows us to use properties of topological graph theory to determine the graphs which are suitable as homological stabilizer codes. We then show that all toric codes are equivalent to homological stabilizer codes on 4-valent graphs. We show that the topological color codes and toric codes correspond to two distinct classes of graphs. We define the notion of label set equivalencies and show that under a small set of constraints the only homological stabilizer codes without local logical operators are equivalent to Kitaev's toric code or to the topological color codes. Highlights: We show that Kitaev's toric codes are equivalent to homological stabilizer codes on 4-valent graphs. We show that toric codes and color codes correspond to homological stabilizer codes on distinct graphs. We find and classify all 2D homological stabilizer codes. We find optimal codes among the homological stabilizer codes.

  9. A homologue of the nuclear coded 49 kd subunit of bovine mitochondrial NADH-ubiquinone reductase is coded in chloroplast DNA.

    PubMed Central

    Fearnley, I M; Runswick, M J; Walker, J E

    1989-01-01

    The mitochondrial NADH-ubiquinone reductase (complex I) is an assembly of approximately 26 different polypeptides. In vertebrates and invertebrates, seven of its subunits are the products of genes in the mitochondrial DNA, and homologues of these genes have been found previously in the chloroplast genomes of Marchantia polymorpha and Nicotiana tabacum, although their function in the chloroplast is unknown. The remainder of the subunits of the mitochondrial complex are nuclear gene products that are imported into the organelle, amongst them the 49 kd subunit, a component of the iron-sulphur subcomplex of the enzyme. In the present work, the N-terminal sequence of this protein has been determined, and this has been used to design two mixtures of synthetic oligonucleotides, each containing 32 different sequences 17 bases long. These mixtures have been used as hybridization probes to isolate cDNA clones from a bovine library. The DNA sequences of these clones have been determined and they encode the mature 49 kd protein, with the exception of amino acids 1 and 2. The protein sequence of 430 amino acids is closely related to those of proteins that are encoded in open reading frames (ORFs) present in the chloroplast genomes of M. polymorpha and N. tabacum. Only one cysteine is conserved and the sequences provide no indication that the 49 kd protein contains iron-sulphur centres. These ORFs are found in the single copy regions of chloroplast DNA in close proximity to four of the homologues of the mammalian mitochondrial genes that encode subunits of complex I. (ABSTRACT TRUNCATED AT 250 WORDS) PMID:2498081

  10. Embedded feature ranking for ensemble MLP classifiers.

    PubMed

    Windeatt, Terry; Duangsoithong, Rakkrit; Smith, Raymond

    2011-06-01

    A feature ranking scheme for multilayer perceptron (MLP) ensembles is proposed, along with a stopping criterion based upon the out-of-bootstrap estimate. To solve multi-class problems feature ranking is combined with modified error-correcting output coding. Experimental results on benchmark data demonstrate the versatility of the MLP base classifier in removing irrelevant features.

  11. Characterization of a cDNA clone coding for a mouse 85 kDa heat shock protein from a 3-methylcholanthrene-induced tumor

    SciTech Connect

    Moore, S.K.; Robinson, E.A.; Ullrich, S.J.; Appella, E.

    1986-05-01

    Heat shock proteins (hsp) of approx. 85 kDa have been found associated with steroid hormone receptors and with the src oncogene product. Recently, the authors have shown that a mouse tumor-associated transplantation antigen from a 3-methylcholanthrene-induced tumor shares amino acid sequence homology with the 85 kDa hsp. Amino acid sequences of peptides from this antigen were used to synthesize oligonucleotide probes to screen a mouse cDNA library. A cDNA clone coding for an 85 kDa mouse hsp has been isolated and its sequence determined. Predicted amino acid sequences from this cDNA clone share significant homology with the published 90 kDa hsp of Saccharomyces cerevisiae and with the 83 kDa hsp of Drosophila melanogaster. In addition, the predicted amino acid sequence at the carboxyl terminus shares identity with that of the 70 kDa hsp from various species as well as that of the Escherichia coli dnaK gene product. Northern blot analysis indicates that the mouse 85 kDa hsp is coded for by a kb mRNA. The size of the mRNA is indistinguishable between normal and malignant cells.

  12. Recognition Using Hybrid Classifiers.

    PubMed

    Osadchy, Margarita; Keren, Daniel; Raviv, Dolev

    2016-04-01

    A canonical problem in computer vision is category recognition (e.g., find all instances of human faces, cars etc., in an image). Typically, the input for training a binary classifier is a relatively small sample of positive examples, and a huge sample of negative examples, which can be very diverse, consisting of images from a large number of categories. The difficulty of the problem sharply increases with the dimension and size of the negative example set. We propose to alleviate this problem by applying a "hybrid" classifier, which replaces the negative samples by a prior, and then finds a hyperplane which separates the positive samples from this prior. The method is extended to kernel space and to an ensemble-based approach. The resulting binary classifiers achieve an identical or better classification rate than SVM, while requiring far smaller memory and lower computational complexity to train and apply.

  13. HEXIM1 and NEAT1 Long Non-coding RNA Form a Multi-subunit Complex that Regulates DNA-Mediated Innate Immune Response.

    PubMed

    Morchikh, Mehdi; Cribier, Alexandra; Raffel, Raoul; Amraoui, Sonia; Cau, Julien; Severac, Dany; Dubois, Emeric; Schwartz, Olivier; Bennasser, Yamina; Benkirane, Monsef

    2017-08-03

    The DNA-mediated innate immune response underpins anti-microbial defenses and certain autoimmune diseases. Here we used immunoprecipitation, mass spectrometry, and RNA sequencing to identify a ribonuclear complex built around HEXIM1 and the long non-coding RNA NEAT1 that we dubbed the HEXIM1-DNA-PK-paraspeckle components-ribonucleoprotein complex (HDP-RNP). The HDP-RNP contains DNA-PK subunits (DNAPKc, Ku70, and Ku80) and paraspeckle proteins (SFPQ, NONO, PSPC1, RBM14, and MATRIN3). We show that binding of HEXIM1 to NEAT1 is required for its assembly. We further demonstrate that the HDP-RNP is required for the innate immune response to foreign DNA, through the cGAS-STING-IRF3 pathway. The HDP-RNP interacts with cGAS and its partner PQBP1, and their interaction is remodeled by foreign DNA. Remodeling leads to the release of paraspeckle proteins, recruitment of STING, and activation of DNAPKc and IRF3. Our study establishes the HDP-RNP as a key nuclear regulator of DNA-mediated activation of innate immune response through the cGAS-STING pathway. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Massively parallel sequencing of the entire control region and targeted coding region SNPs of degraded mtDNA using a simplified library preparation method.

    PubMed

    Lee, Eun Young; Lee, Hwan Young; Oh, Se Yoon; Jung, Sang-Eun; Yang, In Seok; Lee, Yang-Han; Yang, Woo Ick; Shin, Kyoung-Jin

    2016-05-01

    The application of next-generation sequencing (NGS) to forensic genetics is being explored by an increasing number of laboratories because of the potential of high-throughput sequencing for recovering genetic information from multiple markers and multiple individuals in a single run. A cumbersome and technically challenging library construction process is required for NGS. In this study, we propose a simplified library preparation method for mitochondrial DNA (mtDNA) analysis that involves two rounds of PCR amplification. In the first-round of multiplex PCR, six fragments covering the entire mtDNA control region and 22 fragments covering interspersed single nucleotide polymorphisms (SNPs) in the coding region that can be used to determine global haplogroups and East Asian haplogroups were amplified using template-specific primers with read sequences. In the following step, indices and platform-specific sequences for the MiSeq(®) system (Illumina) were added by PCR. The barcoded library produced using this simplified workflow was successfully sequenced on the MiSeq system using the MiSeq Reagent Nano Kit v2. A total of 0.4 GB of sequences, 80.6% with base quality of >Q30, were obtained from 12 degraded DNA samples and mapped to the revised Cambridge Reference Sequence (rCRS). A relatively even read count was obtained for all amplicons, with an average coverage of 5200 × and a less than three-fold read count difference between amplicons per sample. Control region sequences were successfully determined, and all samples were assigned to the relevant haplogroups. In addition, enhanced discrimination was observed by adding coding region SNPs to the control region in in silico analysis. Because the developed multiplex PCR system amplifies small-sized amplicons (<250 bp), NGS analysis using the library preparation method described here allows mtDNA analysis using highly degraded DNA samples. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  15. Classifying Cereal Data

    Cancer.gov

    The DSQ includes questions about cereal intake and allows respondents up to two responses on which cereals they consume. We classified each cereal reported first by hot or cold, and then along four dimensions: density of added sugars, whole grains, fiber, and calcium.

  16. Classifying Adolescent Perfectionists

    ERIC Educational Resources Information Center

    Rice, Kenneth G.; Ashby, Jeffrey S.; Gilman, Rich

    2011-01-01

    A large school-based sample of 9th-grade adolescents (N = 875) completed the Almost Perfect Scale-Revised (APS-R; Slaney, Mobley, Trippi, Ashby, & Johnson, 1996). Decision rules and cut-scores were developed and replicated that classify adolescents as one of two kinds of perfectionists (adaptive or maladaptive) or as nonperfectionists. A…

  17. Number in Classifier Languages

    ERIC Educational Resources Information Center

    Nomoto, Hiroki

    2013-01-01

    Classifier languages are often described as lacking genuine number morphology and treating all common nouns, including those conceptually count, as an unindividuated mass. This study argues that neither of these popular assumptions is true, and presents new generalizations and analyses gained by abandoning them. I claim that no difference exists…

  18. A sandwich-hybridization assay for simultaneous determination of HIV and tuberculosis DNA targets based on signal amplification by quantum dots-PowerVision™ polymer coding nanotracers.

    PubMed

    Yan, Zhongdan; Gan, Ning; Zhang, Huairong; Wang, De; Qiao, Li; Cao, Yuting; Li, Tianhua; Hu, Futao

    2015-09-15

    A novel sandwich-hybridization assay for simultaneous electrochemical detection of multiple DNA targets related to human immunodeficiency virus (HIV) and tuberculosis (TB) was developed based on different quantum dots-PowerVision(TM) polymer nanotracers. The polymer nanotracers were fabricated by immobilizing SH-labeled oligonucleotides (s-HIV or s-TB), which can partially hybridize with the target DNA (HIV or TB), on gold nanoparticles (Au NPs), which were then modified with PowerVision(TM) (PV) polymer-encapsulated quantum dots (CdS or PbS) as signal tags. PV is a dendrimer enzyme-linked polymer, which can immobilize abundant QDs to amplify the stripping voltammetry signals from the metal ions (Pb or Cd). The capture probes were prepared by immobilizing SH-labeled oligonucleotides, which are complementary to HIV and TB DNA, on magnetic Fe3O4@Au (GMPs) beads. After sandwich hybridization, the polymer nanotracers together with the HIV and TB DNA targets were simultaneously introduced onto the surface of the GMPs. Then the two encoding metal ions (Cd(2+) and Pb(2+)) were used to differentiate the two target DNAs through their distinct anodic stripping voltammetric peaks at -0.84 V (Cd) and -0.61 V (Pb). Because of the excellent signal amplification of the polymer nanotracers and the great specificity of the DNA targets, this assay could detect target DNA as low as 0.2 femtomolar and exhibited excellent selectivity with a dynamic range from 0.5 fM to 500 pM. These results demonstrate that this electrochemical coding assay has great potential for screening additional target DNAs by changing the probes.

  19. LCC: Light Curves Classifier

    NASA Astrophysics Data System (ADS)

    Vo, Martin

    2017-08-01

    Light Curves Classifier uses data mining and machine learning to obtain and classify desired objects. This task can be accomplished by attributes of light curves or any time series, including shapes, histograms, or variograms, or by other available information about the inspected objects, such as color indices, temperatures, and abundances. After specifying features which describe the objects to be searched, the software trains on a given training sample, and can then be used for unsupervised clustering for visualizing the natural separation of the sample. The package can also be used for automatic tuning of the parameters of the methods used (for example, number of hidden neurons or binning ratio). Trained classifiers can be used for filtering outputs from astronomical databases or data stored locally. The Light Curves Classifier can also be used for simple downloading of light curves and all available information on queried stars. It can natively connect to OgleII, OgleIII, ASAS, CoRoT, Kepler, Catalina and MACHO, and new connectors or descriptors can be implemented. In addition to direct usage of the package and command line UI, the program can be used through a web interface. Users can create jobs for "training" methods on given objects, querying databases and filtering outputs by trained filters. Preimplemented descriptors, classifiers, and connectors can be picked by simple clicks and their parameters can be tuned by giving ranges of these values. All combinations are then calculated and the best one is used for creating the filter. Natural separation of the data can be visualized by unsupervised clustering.
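
    The tuning strategy described here, in which every combination of the given parameter ranges is evaluated and the best one is kept, amounts to an exhaustive grid search. The sketch below illustrates that idea only. It does not use the LCC package or its API, and the scoring function, parameter names, and ranges are hypothetical stand-ins.

```python
from itertools import product

def grid_search(param_ranges, score):
    """Evaluate every combination of parameter values and keep the best.

    param_ranges: dict mapping a parameter name to a list of candidate values.
    score: callable taking a dict of parameters and returning a number
           (higher is better).
    """
    names = list(param_ranges)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_ranges[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score

def toy_score(params):
    # Placeholder for "train a classifier and measure its quality";
    # it simply peaks at 8 hidden neurons and a binning ratio of 0.5.
    return -((params["hidden_neurons"] - 8) ** 2) - 10 * (params["binning_ratio"] - 0.5) ** 2

ranges = {"hidden_neurons": [4, 8, 16], "binning_ratio": [0.25, 0.5, 1.0]}
best, value = grid_search(ranges, toy_score)
print(best, value)
```

    The cost grows with the product of the range sizes, so in practice the ranges are kept short or coarse.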

  20. Identification of a cDNA clone that contains the complete coding sequence for a 140-kD rat NCAM polypeptide

    PubMed Central

    1987-01-01

    Neural cell adhesion molecules (NCAMs) are cell surface glycoproteins that appear to mediate cell-cell adhesion. In vertebrates NCAMs exist in at least three different polypeptide forms of apparent molecular masses 180, 140, and 120 kD. The 180- and 140-kD forms span the plasma membrane whereas the 120-kD form lacks a transmembrane region. In this study, we report the isolation of NCAM clones from an adult rat brain cDNA library. Sequence analysis indicated that the longest isolate, pR18, contains a 2,574 nucleotide open reading frame flanked by 208 bases of 5' and 409 bases of 3' untranslated sequence. The predicted polypeptide encoded by clone pR18 contains a single membrane-spanning region and a small cytoplasmic domain (120 amino acids), suggesting that it codes for a full-length 140-kD NCAM form. In Northern analysis, probes derived from 5' sequences of pR18, which presumably code for extracellular portions of the molecule, hybridized to five discrete mRNA size classes (7.4, 6.7, 5.2, 4.3, and 2.9 kb) in adult rat brain but not to liver or muscle RNA. However, the 5.2- and 2.9-kb mRNA size classes did not hybridize to either a large restriction fragment or three oligonucleotides derived from the putative transmembrane coding region and regions that lie 3' to it. The 3' probes did hybridize to the 7.4-, 6.7-, and 4.3-kb message size classes. These combined results indicate that clone pR18 is derived from either the 7.4-, 6.7-, or 4.3-kb adult rat brain RNA size class. Comparison with chicken and mouse NCAM cDNA sequences suggests that pR18 represents the amino acid coding region of the 6.7- or 4.3-kb mRNA. The isolation of pR18, the first cDNA that contains the complete coding sequence of an NCAM polypeptide, unambiguously demonstrates the predicted linear amino acid sequence of this probable rat 140-kD polypeptide. This cDNA also contains a 30-base pair segment not found in NCAM cDNAs isolated from other species. The significance of this segment and other

  1. The Use and Effectiveness of Triple Multiplex System for Coding Region Single Nucleotide Polymorphism in Mitochondrial DNA Typing of Archaeologically Obtained Human Skeletons from Premodern Joseon Tombs of Korea

    PubMed Central

    Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon

    2015-01-01

    A previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the coupled use of analyses of coding region SNP markers and control region mutation motifs. In this study, we tested whether the same triple multiplex analysis for coding region SNPs is also applicable to ancient samples from East Asia as a complement to sequence analysis of the mtDNA control region. In the Joseon skeleton samples, the mtDNA haplogroup determined by coding region SNP markers fell within the same haplogroup that sequence analysis of the control region assigned. Considering that ancient samples in previous studies showed a considerable number of errors in control region mtDNA sequencing, coding region SNP analysis can be used as a good complement to conventional haplogroup determination, especially for archaeological human bone samples buried underground over long periods. PMID:26345190

  2. The Use and Effectiveness of Triple Multiplex System for Coding Region Single Nucleotide Polymorphism in Mitochondrial DNA Typing of Archaeologically Obtained Human Skeletons from Premodern Joseon Tombs of Korea.

    PubMed

    Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon

    2015-01-01

    A previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the coupled use of analyses of coding region SNP markers and control region mutation motifs. In this study, we tested whether the same triple multiplex analysis for coding region SNPs is also applicable to ancient samples from East Asia as a complement to sequence analysis of the mtDNA control region. In the Joseon skeleton samples, the mtDNA haplogroup determined by coding region SNP markers fell within the same haplogroup that sequence analysis of the control region assigned. Considering that ancient samples in previous studies showed a considerable number of errors in control region mtDNA sequencing, coding region SNP analysis can be used as a good complement to conventional haplogroup determination, especially for archaeological human bone samples buried underground over long periods.

  3. Mitochondrial DNA of Clathrina clathrus (Calcarea, Calcinea): six linear chromosomes, fragmented rRNAs, tRNA editing, and a novel genetic code.

    PubMed

    Lavrov, Dennis V; Pett, Walker; Voigt, Oliver; Wörheide, Gert; Forget, Lise; Lang, B Franz; Kayal, Ehsan

    2013-04-01

    Sponges (phylum Porifera) are a large and ancient group of morphologically simple but ecologically important aquatic animals. Although their body plan and lifestyle are relatively uniform, sponges show extensive molecular and genetic diversity. In particular, mitochondrial genomes from three of the four previously studied classes of Porifera (Demospongiae, Hexactinellida, and Homoscleromorpha) have distinct gene contents, genome organizations, and evolutionary rates. Here, we report the mitochondrial genome of Clathrina clathrus (Calcinea, Clathrinidae), a representative of the fourth poriferan class, the Calcarea, which proves to be the most unusual. Clathrina clathrus mitochondrial DNA (mtDNA) consists of six linear chromosomes 7.6-9.4 kb in size and encodes at least 37 genes: 13 protein-coding genes, 2 ribosomal RNAs (rRNAs), and 24 transfer RNAs (tRNAs). Protein genes include atp9, which has now been found in all major sponge lineages, but no atp8. Our analyses further reveal the presence of a novel genetic code that involves unique reassignments of the UAG codons from termination to tyrosine and of the CGN codons from arginine to glycine. Clathrina clathrus mitochondrial rRNAs are encoded in three (srRNA) and ≥6 (lrRNA) fragments distributed out of order and on several chromosomes. The encoded tRNAs contain multiple mismatches in the aminoacyl acceptor stems that are repaired posttranscriptionally by 3'-end RNA editing. Although our analysis does not resolve the phylogenetic position of calcareous sponges, likely due to their high rates of mitochondrial sequence evolution, it confirms mtDNA as a promising marker for population studies in this group. The combination of unusual mitochondrial features in C. clathrus redefines the extremes of mtDNA evolution in animals and further argues against the idea of a "typical animal mtDNA."
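
    The two codon reassignments reported in this abstract (UAG from termination to tyrosine, CGN from arginine to glycine) can be illustrated with a small translation sketch. As a simplifying assumption, the sketch starts from the standard genetic code and applies only those two changes, so it does not claim to reproduce the complete C. clathrus mitochondrial code. Sequences are written in the DNA alphabet (T for U), and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code in the conventional T, C, A, G ordering.
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def with_reported_reassignments(base_code):
    """Copy a codon table and apply the two reassignments from the abstract."""
    code = dict(base_code)
    code["TAG"] = "Y"            # UAG: termination -> tyrosine
    for third in BASES:
        code["CG" + third] = "G"  # CGN: arginine -> glycine
    return code

def translate(seq, code):
    """Translate a DNA sequence in frame 0, ignoring a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(code[seq[i:i + 3]] for i in range(0, usable, 3))

variant_code = with_reported_reassignments(STANDARD_CODE)
example = "TAGCGACGT"  # hypothetical sequence: TAG, CGA, CGT
print(translate(example, STANDARD_CODE))  # standard reading
print(translate(example, variant_code))   # reading with the reassignments
```

    Under the standard table the example reads as a stop followed by two arginines, whereas with the reassignments it reads as tyrosine followed by two glycines.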

  4. Generalized classifier neural network.

    PubMed

    Ozyildirim, Buse Melis; Avci, Mutlu

    2013-03-01

    In this work, a new radial basis function based classification neural network, named the generalized classifier neural network, is proposed. The proposed generalized classifier neural network has five layers, unlike other radial basis function based neural networks such as the generalized regression neural network and the probabilistic neural network. They are the input, pattern, summation, normalization and output layers. In addition to the topological difference, the proposed neural network offers gradient descent based optimization of the smoothing parameter and calculation improvements through an added diverge effect term. The diverge effect term is an improvement to the summation layer calculation that supplies additional separation ability and flexibility. The performance of the generalized classifier neural network is compared with that of the probabilistic neural network, multilayer perceptron algorithm and radial basis function neural network on 9 different data sets, and with that of the generalized regression neural network on 3 different data sets that include only two classes, in the MATLAB environment. Better classification performance of up to 89% is observed. The improved classification performance demonstrates the effectiveness of the proposed neural network.

  5. Structural and functional analysis of four non-coding Y RNAs from Chinese hamster cells: identification, molecular dynamics simulations and DNA replication initiation assays.

    PubMed

    de Lima Neto, Quirino Alves; Duarte Junior, Francisco Ferreira; Bueno, Paulo Sérgio Alves; Seixas, Flavio Augusto Vicente; Kowalski, Madzia Pauline; Kheir, Eyemen; Krude, Torsten; Fernandez, Maria Aparecida

    2016-01-05

    The genes coding for Y RNAs are evolutionarily conserved in vertebrates. These non-coding RNAs are essential for the initiation of chromosomal DNA replication in vertebrate cells. However thus far, no information is available about Y RNAs in Chinese hamster cells, which have already been used to detect replication origins and alternative DNA structures around these sites. Here, we report the gene sequences and predicted structural characteristics of the Chinese hamster Y RNAs, and analyze their ability to support the initiation of chromosomal DNA replication in vitro. We identified DNA sequences in the Chinese hamster genome of four Y RNAs (chY1, chY3, chY4 and chY5) with upstream promoter sequences, which are homologous to the four main types of vertebrate Y RNAs. The chY1, chY3 and chY5 genes were highly conserved with their vertebrate counterparts, whilst the chY4 gene showed a relatively high degree of diversification from the other vertebrate Y4 genes. Molecular dynamics simulations suggest that chY4 RNA is structurally stable despite its evolutionarily divergent predicted stem structure. Of the four Y RNA genes present in the hamster genome, we found that only the chY1 and chY3 RNA were strongly expressed in the Chinese hamster GMA32 cell line, while expression of the chY4 and chY5 RNA genes was five orders of magnitude lower, suggesting that they may in fact not be expressed. We synthesized all four chY RNAs and showed that any of these four could support the initiation of DNA replication in an established human cell-free system. These data therefore establish that non-coding chY RNAs are stable structures and can substitute for human Y RNAs in a reconstituted cell-free DNA replication initiation system. The pattern of Y RNA expression and functionality is consistent with Y RNAs of other rodents, including mouse and rat.

  6. TTS Mapping: integrative WEB tool for analysis of triplex formation target DNA Sequences, G-quadruplets and non-protein coding regulatory DNA elements in the human genome

    PubMed Central

    2009-01-01

    Background: DNA triplexes can naturally occur, co-localize and interact with many other regulatory DNA elements (e.g. G-quadruplex (G4) DNA motifs), specific DNA-binding proteins (e.g. transcription factors (TFs)), and micro-RNA (miRNA) precursors. Specific genome localizations of triplex target DNA sites (TTSs) may cause abnormalities in a double-helix DNA structure and can be directly involved in some human diseases. However, genome localization of specific TTSs, their interconnection with regulatory DNA elements and physiological roles in a cell are poorly defined. Therefore, it is important to identify a comprehensive and reliable catalogue of specific potential TTSs (pTTSs) and their co-localization patterns with other regulatory DNA elements in the human genome. Results: The "TTS mapping" database is a web-based search engine developed here, which is aimed to find and annotate pTTSs within a region of interest of the human genome. The engine provides descriptive statistics of pTTSs in a given region and its sequence context. Different annotation tracks of TTS-overlapping gene region(s), G4 motifs, CpG Island, miRNA precursors, miRNA targets, transcription factor binding sites (TFBSs), Single Nucleotide Polymorphisms (SNPs), small nucleolar RNAs (snoRNA), and repeat elements are also mapped onto a sequence location provided by UCSC genome browser, G4 database http://www.quadruplex.org and several other datasets. The results pages provide links to UCSC genome browser annotation tracks and related databases. The BLASTN program was included to check the uniqueness of a given pTTS in the human genome. Recombination- and mutation-prone genes (e.g. EVI-1, MYC) were found to be significantly enriched by TTSs and multiple co-occurring with our regulatory DNA elements. TTS mapping reveals that a high-complementary and evolutionarily conserved polypurine and polypyrimidine DNA sequence pair linked by a non-conserved short DNA sequence can form miR-483 transcribed from intron 2 of

  7. TTS mapping: integrative WEB tool for analysis of triplex formation target DNA sequences, G-quadruplets and non-protein coding regulatory DNA elements in the human genome.

    PubMed

    Jenjaroenpun, Piroon; Kuznetsov, Vladimir A

    2009-12-03

    DNA triplexes can naturally occur, co-localize and interact with many other regulatory DNA elements (e.g. G-quadruplex (G4) DNA motifs), specific DNA-binding proteins (e.g. transcription factors (TFs)), and micro-RNA (miRNA) precursors. Specific genome localizations of triplex target DNA sites (TTSs) may cause abnormalities in a double-helix DNA structure and can be directly involved in some human diseases. However, the genome localization of specific TTSs, their interconnection with regulatory DNA elements and their physiological roles in a cell are poorly defined. Therefore, it is important to identify a comprehensive and reliable catalogue of specific potential TTSs (pTTSs) and their co-localization patterns with other regulatory DNA elements in the human genome. The "TTS mapping" database is a web-based search engine, developed here, that aims to find and annotate pTTSs within a region of interest of the human genome. The engine provides descriptive statistics of pTTSs in a given region and its sequence context. Different annotation tracks of TTS-overlapping gene region(s), G4 motifs, CpG islands, miRNA precursors, miRNA targets, transcription factor binding sites (TFBSs), single nucleotide polymorphisms (SNPs), small nucleolar RNAs (snoRNAs), and repeat elements are also mapped onto a sequence location provided by the UCSC genome browser, the G4 database http://www.quadruplex.org and several other datasets. The results pages provide links to UCSC genome browser annotation tracks and related databases. The BLASTN program was included to check the uniqueness of a given pTTS in the human genome. Recombination- and mutation-prone genes (e.g. EVI-1, MYC) were found to be significantly enriched in TTSs and in their co-occurrence with other regulatory DNA elements. TTS mapping reveals that a highly complementary and evolutionarily conserved polypurine and polypyrimidine DNA sequence pair linked by a non-conserved short DNA sequence can form miR-483 transcribed from intron 2 of IGF2 gene and bound
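
    As a rough illustration of the kind of motif scan such a search engine performs, the sketch below looks for polypurine and polypyrimidine runs (candidate triplex target sites) and for G-quadruplex-like motifs in a DNA string. The minimum tract length of 15, the G4 pattern (four runs of three or more G separated by loops of 1 to 7 bases), and the function name find_motifs are assumptions chosen for illustration; the abstract does not state the criteria used by TTS mapping.

```python
import re

# Hypothetical thresholds and patterns; not taken from the TTS mapping paper.
MIN_TRACT = 15
PATTERNS = (
    ("polypurine", re.compile(r"[AG]{%d,}" % MIN_TRACT)),
    ("polypyrimidine", re.compile(r"[CT]{%d,}" % MIN_TRACT)),
    # Commonly used G-quadruplex consensus: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+
    ("G4", re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")),
)


def find_motifs(seq):
    """Return (label, start, end) tuples for every motif hit, sorted by start.

    Coordinates are 0-based and end-exclusive, like Python slices.
    """
    seq = seq.upper()
    hits = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(seq):
            hits.append((label, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[1])
```

    A call such as find_motifs(region) returns slice-style coordinates, so seq[start:end] recovers each hit; overlap with other annotation tracks would then be a matter of interval comparison.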

  8. Molecular cloning and expression in photosynthetic bacteria of a soybean cDNA coding for phytoene desaturase, an enzyme of the carotenoid biosynthesis pathway.

    PubMed Central

    Bartley, G E; Viitanen, P V; Pecker, I; Chamovitz, D; Hirschberg, J; Scolnik, P A

    1991-01-01

    Carotenoids are orange, yellow, or red photo-protective pigments present in all plastids. The first carotenoid of the pathway is phytoene, a colorless compound that is converted into colored carotenoids through a series of desaturation reactions. Genes coding for carotenoid desaturases have been cloned from microbes but not from plants. We report the cloning of a cDNA for pds1, a soybean (Glycine max) gene that, based on a complementation assay using the photosynthetic bacterium Rhodobacter capsulatus, codes for an enzyme that catalyzes the two desaturation reactions that convert phytoene into zeta-carotene, a yellow carotenoid. The 2281-base-pair cDNA clone analyzed contains an open reading frame with the capacity to code for a 572-residue protein of predicted Mr 63,851. Alignment of the deduced Pds1 peptide sequence with the sequences of fungal and bacterial carotenoid desaturases revealed conservation of several amino acid residues, including a dinucleotide-binding motif that could mediate binding to FAD. The Pds1 protein is synthesized in vitro as a precursor that, upon import into isolated chloroplasts, is processed to a smaller mature form. Hybridization of the pds1 cDNA to genomic blots indicated that this gene is a member of a low-copy-number gene family. One of these loci was genetically mapped using restriction fragment length polymorphisms between Glycine max and Glycine soja. We conclude that pds1 is a nuclear gene encoding a phytoene desaturase enzyme that, as its microbial counterparts, contains sequence motifs characteristic of flavoproteins. PMID:1862081

  9. Identification of an androgen-repressed mRNA in rat ventral prostate as coding for sulphated glycoprotein 2 by cDNA cloning and sequence analysis.

    PubMed Central

    Bettuzzi, S; Hiipakka, R A; Gilna, P; Liao, S T

    1989-01-01

    The concentrations of a small number of mRNAs in the rat ventral prostate increase after castration and then decrease upon androgen treatment. Since the repression of specific gene expression may be important in the regulation of organ growth, we have cloned a cDNA for an androgen-repressed mRNA, the concentration of which increased 17-fold 4 days after castration, and this increase was reversed rapidly by androgen treatment. By sequence analysis the androgen-repressed mRNA was identified as that coding for sulphated glycoprotein 2. PMID:2920020

  10. Cloning and characterization of a cDNA coding 3-hydroxy-3-methylglutary CoA reductase involved in glycyrrhizic acid biosynthesis in Glycyrrhiza uralensis.

    PubMed

    Liu, Ying; Xu, Qiao-Xian; Xi, Pei-Yu; Chen, Hong-Hao; Liu, Chun-Sheng

    2013-05-01

    The roots of Glycyrrhiza uralensis are widely used in Chinese medicine for clearing heat, detoxifying, relieving cough, dispelling sputum and tonifying the spleen and stomach. Glycyrrhiza uralensis owes these potent actions to its various active secondary metabolites, especially glycyrrhizic acid. In the present study, we cloned the cDNA coding for 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR), which is involved in glycyrrhizic acid biosynthesis in Glycyrrhiza uralensis. The corresponding cDNA was expressed in Escherichia coli as a fusion protein. Recombinant HMGR exhibited catalytic activity in the reduction of HMG-CoA to mevalonic acid (MVA), just like HMGR isolated from other species. Because the HMGR gene is very important in the biosynthesis of glycyrrhizic acid in Glycyrrhiza uralensis, this work is significant for further studies aimed at strengthening the efficacy of Glycyrrhiza uralensis by increasing its glycyrrhizic acid content and at exploring the biosynthesis of glycyrrhizic acid in vitro.

  11. Quantum decision tree classifier

    NASA Astrophysics Data System (ADS)

    Lu, Songfeng; Braunstein, Samuel L.

    2013-11-01

    We study the quantum version of a decision tree classifier to fill the gap between quantum computation and machine learning. The quantum entropy impurity criterion which is used to determine which node should be split is presented in the paper. By using the quantum fidelity measure between two quantum states, we cluster the training data into subclasses so that the quantum decision tree can manipulate quantum states. We also propose algorithms constructing the quantum decision tree and searching for a target class over the tree for a new quantum object.
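
    The two ingredients named in this abstract, a fidelity measure for grouping states and an entropy criterion for choosing splits, can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: states are pure and given as lists of amplitudes, the entropy is computed from the eigenvalues of a density matrix, and the helper nearest_prototype is hypothetical. The paper's own definitions and algorithms are not given in the abstract.

```python
import math


def fidelity(psi, phi):
    """Fidelity |<psi|phi>|^2 between two normalized pure states."""
    if len(psi) != len(phi):
        raise ValueError("states must have the same dimension")
    overlap = sum(complex(a).conjugate() * complex(b) for a, b in zip(psi, phi))
    return abs(overlap) ** 2


def von_neumann_entropy(eigenvalues):
    """Entropy -sum(p * log2(p)) of a density matrix, given its eigenvalues."""
    return -sum(p * math.log2(p) for p in eigenvalues if p > 0)


def nearest_prototype(state, prototypes):
    """Hypothetical helper: name of the prototype state with highest fidelity."""
    return max(prototypes, key=lambda name: fidelity(state, prototypes[name]))
```

    With prototypes {"zero": [1, 0], "one": [0, 1]}, the state [0.6, 0.8] has fidelity 0.36 with "zero" and 0.64 with "one", so it is grouped with "one"; a maximally mixed qubit (eigenvalues 0.5 and 0.5) has entropy of one bit.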

  12. DNMT3B interacts with constitutive centromere protein CENP-C to modulate DNA methylation and the histone code at centromeric regions.

    PubMed

    Gopalakrishnan, Suhasni; Sullivan, Beth A; Trazzi, Stefania; Della Valle, Giuliano; Robertson, Keith D

    2009-09-01

    DNA methylation is an epigenetically imposed mark of transcriptional repression that is essential for maintenance of chromatin structure and genomic stability. Genome-wide methylation patterns are mediated by the combined action of three DNA methyltransferases: DNMT1, DNMT3A and DNMT3B. Compelling links exist between DNMT3B and chromosome stability as emphasized by the mitotic defects that are a hallmark of ICF syndrome, a disease arising from germline mutations in DNMT3B. Centromeric and pericentromeric regions are essential for chromosome condensation and the fidelity of segregation. Centromere regions contain distinct epigenetic marks, including dense DNA hypermethylation, yet the mechanisms by which DNA methylation is targeted to these regions remain largely unknown. In the present study, we used a yeast two-hybrid screen and identified a novel interaction between DNMT3B and constitutive centromere protein CENP-C. CENP-C is itself essential for mitosis. We confirm this interaction in mammalian cells and map the domains responsible. Using siRNA knock downs, bisulfite genomic sequencing and ChIP, we demonstrate for the first time that CENP-C recruits DNA methylation and DNMT3B to both centromeric and pericentromeric satellite repeats and that CENP-C and DNMT3B regulate the histone code in these regions, including marks characteristic of centromeric chromatin. Finally, we demonstrate that loss of CENP-C or DNMT3B leads to elevated chromosome misalignment and segregation defects during mitosis and increased transcription of centromeric repeats. Taken together, our data reveal a novel mechanism by which DNA methylation is targeted to discrete regions of the genome and contributes to chromosomal stability.

  13. Cloning and Stable Expression of cDNA Coding For Platelet Endothelial Cell Adhesion Molecule -1 (PECAM-1, CD31) in NIH-3T3 Cell Line.

    PubMed

    Salehi-Lalemarzi, Hamed; Shanehbandi, Dariush; Shafaghat, Farzaneh; Abbasi-Kenarsari, Hajar; Baradaran, Behzad; Movassaghpour, Ali Akbar; Kazemi, Tohid

    2015-06-01

    PECAM-1 (CD31) is a glycoprotein expressed on endothelial and bone marrow precursor cells. It plays important roles in angiogenesis, maintenance and integration of the cytoskeleton and direction of leukocytes to the site of inflammation. We aimed to clone the cDNA coding for human CD31 from KG1a for further subcloning and expression in the NIH-3T3 mouse cell line. CD31 cDNA was cloned from the KG1a cell line after total RNA extraction and cDNA synthesis. The specific band amplified with Pfu DNA polymerase was ligated into the pGEMT-easy vector and sub-cloned into the pCMV6-Neo expression vector. After transfection of NIH-3T3 cells using 3 μg of recombinant construct and 6 μl of JetPEI transfection reagent, stable expression was obtained by selecting cells with the antibiotic G418 and confirmed by surface flow cytometry. The 2235-bp specific band aligned completely with the human CD31 reference sequence in the NCBI database. Transient and stable expression of human CD31 on transfected NIH-3T3 mouse fibroblast cells was achieved (23% and 96%, respectively) as shown by flow cytometry. Because the NIH-3T3 cell line is of murine origin, CD31-expressing NIH-3T3 cells could be useful as an immunogen in the production of diagnostic monoclonal antibodies against human CD31, with no need for purification of recombinant proteins.

  14. Cloning and expression of a cDNA coding for the human platelet-derived growth factor receptor: Evidence for more than one receptor class

    SciTech Connect

    Gronwald, R.G.K.; Grant, F.J.; Haldeman, B.A.; Hart, C.E.; O'Hara, P.J.; Hagen, F.S.; Ross, R.; Bowen-Pope, D.F.; Murray, M.J.

    1988-05-01

    The complete nucleotide sequence of a cDNA encoding the human platelet-derived growth factor (PDGF) receptor is presented. The cDNA contains an open reading frame that codes for a protein of 1106 amino acids. Comparison to the mouse PDGF receptor reveals an overall amino acid sequence identity of 86%. This sequence identity rises to 98% in the cytoplasmic split tyrosine kinase domain. RNA blot hybridization analysis of poly(A)+ RNA from human dermal fibroblasts detects a major and a minor transcript using the cDNA as a probe. Baby hamster kidney cells, transfected with an expression vector containing the receptor cDNA, express an ~190-kDa cell surface protein that is recognized by an anti-human PDGF receptor antibody. The recombinant PDGF receptor is functional in the transfected baby hamster kidney cells as demonstrated by ligand-induced phosphorylation of the receptor. Binding properties of the recombinant PDGF receptor were also assessed with pure preparations of BB and AB isoforms of PDGF. Unlike human dermal fibroblasts, which bind both isoforms with high affinity, the transfected baby hamster kidney cells bind only the BB isoform of PDGF with high affinity. This observation is consistent with the existence of more than one PDGF receptor class.

  15. Progressive multifocal leukoencephalopathy. Diagnosis by in situ hybridization with a biotinylated JC virus DNA probe using an automated Histomatic Code-On slide stainer.

    PubMed

    Hulette, C M; Downey, B T; Burger, P C

    1991-08-01

    The accurate surgical pathological diagnosis of progressive multifocal leukoencephalopathy (PML) depends on the demonstration of pathognomonic histological features in cerebral biopsy tissue. The diagnosis may be difficult, however, if only small tissue fragments are submitted from the center of a demyelinating lesion. Previous studies by other authors have established that in situ hybridization with a biotinylated JC virus DNA probe can be a valuable diagnostic adjunct because it identifies the virally infected cells with great specificity and does not depend on the larger specimen, which may be necessary for a firm histological diagnosis. To confirm and extend these findings, we have used a commercially available biotinylated JC virus DNA probe to demonstrate the presence of viral DNA in formalin-fixed, paraffin-embedded tissues from four open biopsies, four needle biopsies, and two autopsies of patients with PML. With the goal of making this procedure applicable to the general surgical pathology laboratory, this method was adapted to the Histomatic Code-On slide stainer. The Histomatic is a programmable, robotic instrument with walk-away capability for hybridization histochemistry. Operation of this instrument requires the same expertise as execution of immunocytochemistry. With the advent of commercially available JC virus DNA probes and an automated system for hybridization histochemistry, this technology for diagnosis of PML may enter the routine diagnostic surgical pathology laboratory.

  16. High Performance Medical Classifiers

    NASA Astrophysics Data System (ADS)

    Fountoukis, S. G.; Bekakos, M. P.

    2009-08-01

    In this paper, parallelism methodologies for mapping rules derived by machine learning algorithms onto both software and hardware are investigated. When these algorithms are fed patient disease data, they output medical diagnostic decision trees and their corresponding rules. These rules can be mapped onto multithreaded object-oriented programs and hardware chips. The programs can simulate the working of the chips and can exhibit the inherent parallelism of the chip design. The circuit of a chip can consist of many blocks, which operate concurrently for various parts of the whole circuit. Threads and inter-thread communication can be used to simulate the blocks of the chips and the combination of block output signals. The chips and the corresponding parallel programs constitute medical classifiers, which can classify new patient instances. Measurements taken from patients can be fed into both the chips and the parallel programs and can be recognized according to the classification rules incorporated in the design of the chips and the programs. The chips and the programs constitute medical decision support systems and can be incorporated into portable micro devices, assisting physicians in their everyday diagnostic practice.

  17. DNA.

    ERIC Educational Resources Information Center

    Felsenfeld, Gary

    1985-01-01

    Structural form, bonding scheme, and chromatin structure of and gene-modification experiments with deoxyribonucleic acid (DNA) are described. Indicates that DNA's double helix is variable and also flexible as it interacts with regulatory and other molecules to transfer hereditary messages. (DH)

  19. Characterization of Non-coding DNA Satellites Associated with Sweepoviruses (Genus Begomovirus, Geminiviridae) - Definition of a Distinct Class of Begomovirus-Associated Satellites.

    PubMed

    Lozano, Gloria; Trenado, Helena P; Fiallo-Olivé, Elvira; Chirinos, Dorys; Geraud-Pouey, Francis; Briddon, Rob W; Navas-Castillo, Jesús

    2016-01-01

    Begomoviruses (family Geminiviridae) are whitefly-transmitted, plant-infecting single-stranded DNA viruses that cause crop losses throughout the warmer parts of the World. Sweepoviruses are a phylogenetically distinct group of begomoviruses that infect plants of the family Convolvulaceae, including sweet potato (Ipomoea batatas). Two classes of subviral molecules are often associated with begomoviruses, particularly in the Old World; the betasatellites and the alphasatellites. An analysis of sweet potato and Ipomoea indica samples from Spain and Merremia dissecta samples from Venezuela identified small non-coding subviral molecules in association with several distinct sweepoviruses. The sequences of 18 clones were obtained and found to be structurally similar to tomato leaf curl virus-satellite (ToLCV-sat, the first DNA satellite identified in association with a begomovirus), with a region with significant sequence identity to the conserved region of betasatellites, an A-rich sequence, a predicted stem-loop structure containing the nonanucleotide TAATATTAC, and a second predicted stem-loop. These sweepovirus-associated satellites join an increasing number of ToLCV-sat-like non-coding satellites identified recently. Although sharing some features with betasatellites, evidence is provided to suggest that the ToLCV-sat-like satellites are distinct from betasatellites and should be considered a separate class of satellites, for which the collective name deltasatellites is proposed.

  20. Characterization of Non-coding DNA Satellites Associated with Sweepoviruses (Genus Begomovirus, Geminiviridae) – Definition of a Distinct Class of Begomovirus-Associated Satellites

    PubMed Central

    Lozano, Gloria; Trenado, Helena P.; Fiallo-Olivé, Elvira; Chirinos, Dorys; Geraud-Pouey, Francis; Briddon, Rob W.; Navas-Castillo, Jesús

    2016-01-01

    Begomoviruses (family Geminiviridae) are whitefly-transmitted, plant-infecting single-stranded DNA viruses that cause crop losses throughout the warmer parts of the World. Sweepoviruses are a phylogenetically distinct group of begomoviruses that infect plants of the family Convolvulaceae, including sweet potato (Ipomoea batatas). Two classes of subviral molecules are often associated with begomoviruses, particularly in the Old World; the betasatellites and the alphasatellites. An analysis of sweet potato and Ipomoea indica samples from Spain and Merremia dissecta samples from Venezuela identified small non-coding subviral molecules in association with several distinct sweepoviruses. The sequences of 18 clones were obtained and found to be structurally similar to tomato leaf curl virus-satellite (ToLCV-sat, the first DNA satellite identified in association with a begomovirus), with a region with significant sequence identity to the conserved region of betasatellites, an A-rich sequence, a predicted stem–loop structure containing the nonanucleotide TAATATTAC, and a second predicted stem–loop. These sweepovirus-associated satellites join an increasing number of ToLCV-sat-like non-coding satellites identified recently. Although sharing some features with betasatellites, evidence is provided to suggest that the ToLCV-sat-like satellites are distinct from betasatellites and should be considered a separate class of satellites, for which the collective name deltasatellites is proposed. PMID:26925037

  1. Isolation of cDNA clones coding for the alpha and beta chains of human propionyl-CoA carboxylase: chromosomal assignments and DNA polymorphisms associated with PCCA and PCCB genes.

    PubMed Central

    Lamhonwah, A M; Barankiewicz, T J; Willard, H F; Mahuran, D J; Quan, F; Gravel, R A

    1986-01-01

    Propionyl-CoA carboxylase [PCC, propanoyl-CoA:carbon-dioxide ligase (ADP-forming), EC 6.4.1.3] is a biotin-dependent enzyme involved in the degradation of branched-chain amino acids, fatty acids with odd-numbered chain lengths, and other metabolites. Inherited deficiency of the enzyme results in propionic acidemia, an autosomal recessive disorder showing considerable clinical heterogeneity. To facilitate investigations of enzyme structure and the nature of mutation in propionic acidemia, we have isolated cDNA clones coding for the alpha and beta polypeptides of human PCC. Sequences of two peptides derived from human liver PCC were used to specify oligonucleotide probes that were then used to screen a human fibroblast cDNA library. Two classes of cDNA clones were thus identified. One class contained the anticipated Ala-Met-Lys-Met sequence, corresponding to the biotin binding site found in several biotin-dependent carboxylases, thus confirming the alpha-chain assignment of these clones. In addition, they contained the deduced amino acid sequence of two of the sequenced peptides, including that of one of the oligonucleotide probes. The second class, coding for the beta polypeptide, contained the sequences of four peptides, including the sequence corresponding to the other oligonucleotide probe. Blot hybridization of RNA from normal human fibroblasts revealed a single mRNA species of 2.9 kilobases coding for the alpha polypeptide and two species of 4.5 and 2.0 kilobases detected for the beta polypeptide. By use of a panel of somatic mouse-human hybrids, the human gene encoding the alpha polypeptide (PCCA) was localized to chromosome 13, while the gene encoding the beta polypeptide (PCCB) was assigned to chromosome 3. Restriction fragment length polymorphisms were identified, at both PCCA and PCCB, that should prove useful to individual families at risk for propionic acidemia. PMID:3460076

  2. Stack filter classifiers

    SciTech Connect

    Porter, Reid B; Hush, Don

    2009-01-01

    Just as linear models generalize the sample mean and weighted average, weighted order statistic models generalize the sample median and weighted median. This analogy can be continued informally to generalized additive models in the case of the mean, and Stack Filters in the case of the median. Both of these model classes have been extensively studied for signal and image processing, but it is surprising to find that for pattern classification their treatment has been significantly one-sided. Generalized additive models are now a major tool in pattern classification, and many different learning algorithms have been developed to fit model parameters to finite data. However, Stack Filters remain largely confined to signal and image processing, and learning algorithms for classification are yet to be seen. This paper is a step towards Stack Filter Classifiers, and it shows that the approach is interesting from both a theoretical and a practical perspective.
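
    The analogy in the first sentence can be made concrete with a small sketch. Assuming non-negative integer weights, a weighted order statistic is the rank-th smallest element of the multiset in which each value is repeated as many times as its weight; unit weights with the middle rank give the ordinary sample median. The function names below are illustrative, and the sketch does not cover Stack Filters themselves or any learning algorithm from the paper.

```python
def weighted_order_statistic(values, weights, rank):
    """Rank-th smallest (1-based) element of the weight-replicated multiset.

    Assumes non-negative integer weights; values[i] counts weights[i] times.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if not 1 <= rank <= total:
        raise ValueError("rank must lie between 1 and the total weight")
    cumulative = 0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= rank:
            return value


def weighted_median(values, weights):
    """Weighted median; for an even total weight this returns the lower median."""
    total = sum(weights)
    return weighted_order_statistic(values, weights, (total + 1) // 2)
```

    For values 1, 5, 9 with unit weights the result is the plain median 5, while raising the weight of the first value to 3 pulls the weighted median down to 1.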

  3. Transionospheric chirp event classifier

    SciTech Connect

    Argo, P.E.; Fitzgerald, T.J.; Freeman, M.J.

    1995-09-01

    In this paper we will discuss a project designed to provide computer recognition of the transionospheric chirps/pulses measured by the Blackbeard (BB) satellite, and expected to be measured by the upcoming FORTE satellite. The Blackbeard data has been perused by human means; this has been satisfactory for the relatively small amount of data taken by Blackbeard. But with the advent of the FORTE system, which by some accounts might 'see' thousands of events per day, it is important to provide a software/hardware method of accurately analyzing the data. In fact, we are providing an onboard DSP system for FORTE, which will test the usefulness of our Event Classifier techniques in situ. At present we are constrained to work with data from the Blackbeard satellite, and will discuss the progress made to date.

  4. Transionospheric chirp event classifier

    NASA Astrophysics Data System (ADS)

    Argo, P. E.; Fitzgerald, T. J.; Freeman, M. J.

    In this paper we will discuss a project designed to provide computer recognition of the transionospheric chirps/pulses measured by the Blackbeard (BB) satellite, and expected to be measured by the upcoming FORTE satellite. The Blackbeard data has been perused by human means - this has been satisfactory for the relatively small amount of data taken by Blackbeard. But with the advent of the FORTE system, which by some accounts might 'see' thousands of events per day, it is important to provide a software/hardware method of accurately analyzing the data. In fact, we are providing an onboard DSP system for FORTE, which will test the usefulness of our Event Classifier techniques in situ. At present we are constrained to work with data from the Blackbeard satellite, and will discuss the progress made to date.

  5. Classifying TDSS Stellar Variables

    NASA Astrophysics Data System (ADS)

    Amaro, Rachael Christina; Green, Paul J.; TDSS Collaboration

    2017-01-01

    The Time Domain Spectroscopic Survey (TDSS), a subprogram of SDSS-IV eBOSS, obtains classification/discovery spectra of point-source photometric variables selected from PanSTARRS and SDSS multi-color light curves regardless of object color or light curve shape. Tens of thousands of TDSS spectra are already available and have been spectroscopically classified both via pipeline and by visual inspection. About half of these spectra are quasars and half are stars. Our goal is to classify the stars with their correct variability types. We do this by acquiring public multi-epoch light curves for brighter stars (r < 19.5 mag) from the Catalina Sky Survey (CSS). We then run a number of light curve analyses from VARTOOLS, a program for analyzing astronomical time-series data, to constrain variable type both for broad statistics relevant to future surveys like the Transiting Exoplanet Survey Satellite (TESS) and the Large Synoptic Survey Telescope (LSST), and to find the inevitable exotic oddballs that warrant further follow-up. Specifically, the Lomb-Scargle Periodogram and the Box-Least Squares Method are being implemented and tested against their known variable classifications and parameters in the Catalina Surveys Periodic Variable Star Catalog. Variable star classifications include RR Lyr, close eclipsing binaries, CVs, pulsating white dwarfs, and other exotic systems. The key difference between our catalog and others is that along with the light curves, we will be using TDSS spectra to help in the classification of variable type, as spectra are rich with information allowing estimation of physical parameters like temperature, metallicity, gravity, etc. This work was supported by the SDSS Research Experience for Undergraduates program, which is funded by a grant from the Sloan Foundation to the Astrophysical Research Consortium.
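
    For readers unfamiliar with the Lomb-Scargle periodogram mentioned above, the sketch below shows the classical textbook form for unevenly sampled data. It is a plain-Python illustration, not the VARTOOLS implementation: the function names, the unnormalized power, and the user-supplied frequency grid are all assumptions made for the example.

```python
import math


def lomb_scargle(times, values, frequencies):
    """Classical, unnormalized Lomb-Scargle power at each trial frequency.

    Frequencies are in cycles per unit time and must be positive. Degenerate
    sampling (all points at the same phase) would make a denominator zero.
    """
    mean = sum(values) / len(values)
    y = [v - mean for v in values]
    powers = []
    for f in frequencies:
        w = 2.0 * math.pi * f
        # The offset tau makes the sine and cosine terms orthogonal
        # on the actual (uneven) sampling times.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        cos_terms = [math.cos(w * (t - tau)) for t in times]
        sin_terms = [math.sin(w * (t - tau)) for t in times]
        yc = sum(a * b for a, b in zip(y, cos_terms))
        ys = sum(a * b for a, b in zip(y, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        powers.append(0.5 * (yc * yc / cc + ys * ys / ss))
    return powers


def best_period(times, values, frequencies):
    """Period (1/frequency) at which the periodogram power is largest."""
    powers = lomb_scargle(times, values, frequencies)
    best = max(range(len(frequencies)), key=powers.__getitem__)
    return 1.0 / frequencies[best]
```

    In practice the recovered period would be compared with catalog values, and the frequency grid should be fine enough to resolve peaks of width roughly one over the time baseline.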

  6. Molecular cloning and expression in Escherichia coli of the cDNA coding for rat lipocortin I (calpactin II).

    PubMed

    Shimizu, Y; Takabayashi, E; Yano, S Y; Shimizu, N; Yamada, K; Gushima, H

    1988-05-15

    Lipocortins (LC) are a family of proteins that were initially described to be induced by glucocorticosteroids and to inhibit phospholipase A2 (PLA2). Using oligodeoxynucleotide probes corresponding to partial amino acid (aa) sequences of rat lipocortin I (LCI), we have isolated a cDNA clone for rat LCI from a cDNA library prepared from poly(A)+RNA of peritoneal cells of dexamethasone-treated rat. The cDNA insert (1355 bp) had an open reading frame of 1038 bp that encoded a 346-aa polypeptide (Mr 38,784). The nucleotide sequence and the amino acid sequence deduced from it showed high homology with the reported sequences of human LCI. A plasmid containing the trc promoter and cDNA sequence for 346 aa residues of the rat LCI was constructed and expressed in Escherichia coli. Antibody to human LCI crossreacted with the recombinant rat LCI, and the recombinant protein had characteristics of natural rat LCI including PLA2 inhibitory activity in vitro.

  7. Comparative analyses of coding and noncoding DNA regions indicate that Acropora (Anthozoa: Scleractina) possesses a similar evolutionary tempo of nuclear vs. mitochondrial genomes as in plants.

    PubMed

    Chen, I-Ping; Tang, Chung-Yu; Chiou, Chih-Yung; Hsu, Jia-Ho; Wei, Nuwei Vivian; Wallace, Carden C; Muir, Paul; Wu, Henry; Chen, Chaolun Allen

    2009-01-01

    Evidence suggests that the mitochondrial (mt)DNA of anthozoans is evolving at a slower tempo than their nuclear DNA; however, parallel surveys of nuclear and mitochondrial variations and calibrated rates of both synonymous and nonsynonymous substitutions across taxa are needed in order to support this scenario. We examined species of the scleractinian coral genus Acropora, including previously unstudied species, for molecular variations in protein-coding genes and noncoding regions of both nuclear and mt genomes. DNA sequences of a calmodulin (CaM)-encoding gene region containing three exons, two introns and a 411-bp mt intergenic spacer (IGS) spanning the cytochrome b (cytb) and NADH 2 genes, were obtained from 49 Acropora species. The molecular evolutionary rates of coding and noncoding regions in nuclear and mt genomes were compared in conjunction with published data, including mt cytochrome b, the control region, and nuclear Pax-C introns. Direct sequencing of the mtIGS revealed an average interspecific variation comparable to that seen in published data for mt cytb. The average interspecific variation of the nuclear genome was two to five times greater than that of the mt genome. Based on calibration to the closure of the Panama Isthmus (3.0 mya) and the closure of the Tethys Seaway (12 mya), synonymous substitution rates ranged from 0.367% to 1.467% Ma(-1) for nuclear CaM, which is about 4.8 times faster than those of mt cytb (0.076-0.303% Ma(-1)). This is similar to the findings in plant genomes that the nuclear genome is evolving at least five times faster than its mitochondrial counterpart.

  8. The non-coding B2 RNA binds to the DNA cleft and active-site region of RNA polymerase II.

    PubMed

    Ponicsan, Steven L; Houel, Stephane; Old, William M; Ahn, Natalie G; Goodrich, James A; Kugel, Jennifer F

    2013-10-09

    The B2 family of short interspersed elements is transcribed into non-coding RNA by RNA polymerase III. The ~180-nt B2 RNA has been shown to potently repress mRNA transcription by binding tightly to RNA polymerase II (Pol II) and assembling with it into complexes on promoter DNA, where it keeps the polymerase from properly engaging the promoter DNA. Mammalian Pol II is an ~500-kDa complex that contains 12 different protein subunits, providing many possible surfaces for interaction with B2 RNA. We found that the carboxy-terminal domain of the largest Pol II subunit was not required for B2 RNA to bind Pol II and repress transcription in vitro. To identify the surface on Pol II to which the minimal functional region of B2 RNA binds, we coupled multi-step affinity purification, reversible formaldehyde cross-linking, peptide sequencing by mass spectrometry, and analysis of peptide enrichment. The Pol II peptides most highly recovered after cross-linking to B2 RNA mapped to the DNA binding cleft and active-site region of Pol II. These studies determine the location of a defined nucleic acid binding site on a large, native, multi-subunit complex and provide insight into the mechanism of transcriptional repression by B2 RNA. Copyright © 2013 Elsevier Ltd. All rights reserved.

  9. Replication of a pathogenic non-coding RNA increases DNA methylation in plants associated with a bromodomain-containing viroid-binding protein

    PubMed Central

    Lv, Dian-Qiu; Liu, Shang-Wu; Zhao, Jian-Hua; Zhou, Bang-Jun; Wang, Shao-Peng; Guo, Hui-Shan; Fang, Yuan-Yuan

    2016-01-01

    Viroids are plant-pathogenic molecules made up of single-stranded circular non-coding RNAs. How replicating viroids interfere with host silencing remains largely unknown. In this study, we investigated the effects of a nuclear-replicating Potato spindle tuber viroid (PSTVd) on interference with plant RNA silencing. Using transient induction of silencing in GFP transgenic Nicotiana benthamiana plants (line 16c), we found that PSTVd replication accelerated GFP silencing and increased Virp1 mRNA, which encodes bromodomain-containing viroid-binding protein 1 and is required for PSTVd replication. DNA methylation was increased in the GFP transgene promoter of PSTVd-replicating plants, indicating involvement of transcriptional gene silencing. Consistently, accelerated GFP silencing and increased DNA methylation in the GFP transgene promoter were detected in plants transiently expressing Virp1. Virp1 mRNA was also increased upon PSTVd infection in natural host potato plants. Reduced transcript levels of certain endogenous genes were also consistent with increases in DNA methylation in related gene promoters in PSTVd-infected potato plants. Together, our data demonstrate that PSTVd replication interferes with the nuclear silencing pathway in the host plant, and this is at least partially attributable to Virp1. This study provides new insights into the plant-viroid interaction on viroid pathogenicity by subverting the plant cell silencing machinery. PMID:27767195

  10. Molecular cloning and expression of the cDNA coding for a new member of the S100 protein family from porcine cardiac muscle.

    PubMed

    Ohta, H; Sasaki, T; Naka, M; Hiraoka, O; Miyamoto, C; Furuichi, Y; Tanaka, T

    1991-12-16

    We isolated a new calcium-binding protein from porcine cardiac muscle by calcium-dependent hydrophobic and dye-affinity chromatography. It showed an apparent molecular weight of 11,000 on SDS-PAGE. Amino acid sequence determination revealed that the protein contained two calcium-binding domains of the EF-hand motif. The cDNA coding for this protein was cloned from a porcine lung cDNA library. Sequence analysis of the cloned cDNA showed that the protein was composed of 99 amino acid residues and its molecular weight was estimated to be 11,179. Immunological and functional characterization showed that the recombinant S100C protein expressed in Escherichia coli was identical to the natural protein. Homologies to calpactin light chain, S100 alpha and beta protein were 41.1%, 40.9% and 37.5%, respectively. The protein was expressed at high levels in lung and kidney, and at low levels in liver and brain. The tissue distribution was apparently different from that of the other members of the S100 protein family. These results indicate that this protein represents a new member of the S100 protein family, and thus we refer to it as S100C protein.

  11. Phylogeny of genetic codes and punctuation codes within genetic codes.

    PubMed

    Seligmann, Hervé

    2015-03-01

    Punctuation codons (starts, stops) delimit genes, reflect translation apparatus properties. Most codon reassignments involve punctuation. Here two complementary approaches classify natural genetic codes: (A) properties of amino acids assigned to codons (classical phylogeny), coding stops as X (A1, antitermination/suppressor tRNAs insert unknown residues), or as gaps (A2, no translation, classical stop); and (B) considering only punctuation status (start, stop and other codons coded as -1, 0 and 1 (B1); 0, -1 and 1 (B2, reflects ribosomal translational dynamics); and 1, -1, and 0 (B3, starts/stops as opposites)). All methods separate most mitochondrial codes from most nuclear codes; Gracilibacteria consistently cluster with metazoan mitochondria; mitochondria co-hosted with chloroplasts cluster with nuclear codes. Method A1 clusters the euplotid nuclear code with metazoan mitochondria; A2 separates euplotids from mitochondria. Firmicute bacteria Mycoplasma/Spiroplasma and Protozoan (and lower metazoan) mitochondria share codon-amino acid assignments. A1 clusters them with mitochondria, they cluster with the standard genetic code under A2: constraints on amino acid ambiguity versus punctuation-signaling produced the mitochondrial versus bacterial versions of this genetic code. Punctuation analysis B2 converges best with classical phylogenetic analyses, stressing the need for a unified theory of genetic code punctuation accounting for ribosomal constraints.
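    The three punctuation encodings named in the abstract (B1, B2, B3) can be illustrated with a short sketch. The (start, stop, other) values come from the abstract; everything else is an assumption made for illustration: the function names, the simplified start/stop codon sets, and the mismatch count used to compare two codes. This is not the paper's actual clustering procedure.

```python
from itertools import product

# (start, stop, other) values for each scheme, as listed in the abstract.
SCHEMES = {
    "B1": (-1, 0, 1),
    "B2": (0, -1, 1),
    "B3": (1, -1, 0),
}

# All 64 codons in a fixed order so that vectors are comparable.
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]


def encode(starts, stops, scheme):
    """Return a 64-element punctuation vector for one genetic code."""
    start_v, stop_v, other_v = SCHEMES[scheme]
    vec = []
    for codon in CODONS:
        if codon in starts:
            vec.append(start_v)
        elif codon in stops:
            vec.append(stop_v)
        else:
            vec.append(other_v)
    return vec


def mismatches(a, b):
    """Count positions at which two punctuation vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Simplified, illustrative punctuation sets (assumption, not from the paper).
STANDARD = ({"ATG"}, {"TAA", "TAG", "TGA"})
VERT_MITO = ({"ATG", "ATA", "ATT", "ATC", "GTG"}, {"TAA", "TAG", "AGA", "AGG"})

if __name__ == "__main__":
    for name in SCHEMES:
        d = mismatches(encode(*STANDARD, name), encode(*VERT_MITO, name))
        print(name, d)
```

    A plain mismatch count is the same under all three schemes, because each scheme only relabels the same three categories. The schemes would differ only under a distance that uses the numeric values, such as a Euclidean distance.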

  12. Classifying partner femicide.

    PubMed

    Dixon, Louise; Hamilton-Giachritsis, Catherine; Browne, Kevin

    2008-01-01

    The heterogeneity of domestic violent men has long been established. However, research has failed to examine this phenomenon among men committing the most severe form of domestic violence. This study aims to use a multidimensional approach to empirically construct a classification system of men who are incarcerated for the murder of their female partner based on the Holtzworth-Munroe and Stuart (1994) typology. Ninety men who had been convicted and imprisoned for the murder of their female partner or spouse in England were identified from two prison samples. A content dictionary defining offense and offender characteristics associated with two dimensions of psychopathology and criminality was developed. These variables were extracted from institutional records via content analysis and analyzed for thematic structure using multidimensional scaling procedures. The resultant framework classified 80% (n = 72) of the sample into three subgroups of men characterized by (a) low criminality/low psychopathology (15%), (b) moderate-high criminality/high psychopathology (36%), and (c) high criminality/low-moderate psychopathology (49%). The latter two groups are akin to Holtzworth-Munroe and Stuart's (1994) generally violent/antisocial and dysphoric/borderline offender, respectively. The implications for intervention, developing consensus in research methodology across the field, and examining typologies of domestic violent men prospectively are discussed.

  13. A Framework for Identifying and Classifying Undergraduate Student Proof Errors

    ERIC Educational Resources Information Center

    Strickland, S.; Rand, B.

    2016-01-01

    This paper describes a framework for identifying, classifying, and coding student proofs, modified from existing proof-grading rubrics. The framework includes 20 common errors, as well as categories for interpreting the severity of the error. The coding scheme is intended for use in a classroom context, for providing effective student feedback. In…

  15. The Stat3/GR interaction code: predictive value of direct/indirect DNA recruitment for transcription outcome.

    PubMed

    Langlais, David; Couture, Catherine; Balsalobre, Aurélio; Drouin, Jacques

    2012-07-13

    Transcription factor recruitment to genomic sites of action is primarily due to direct protein:DNA interactions. The subsequent recruitment of coregulatory complexes leads to either transcriptional activation or repression. In contrast to this canonical scheme, some transcription factors, such as the glucocorticoid receptor (GR), behave as transcriptional repressors when recruited to target genes through protein tethering. We have investigated the genome-wide prevalence of tethering between GR and Stat3 and found nonreciprocal interactions, namely that GR tethering to DNA-bound Stat3 results in transcriptional repression, whereas Stat3 tethering to GR results in synergism. Further, other schemes of GR and Stat3 corecruitment to regulatory modules result in transcriptional synergism, including neighboring and composite binding sites. The results indicate extensive transcriptional interactions between Stat3 and GR; further, they provide a genome-wide assessment of transcriptional regulation by tethering and a molecular basis for integration of signals mediated by GR and Stats in health and disease.

  16. Cloning and expression of a cDNA coding for the anticoagulant hirudin from the bloodsucking leech, Hirudo medicinalis.

    PubMed Central

    Harvey, R P; Degryse, E; Stefani, L; Schamber, F; Cazenave, J P; Courtney, M; Tolstoshev, P; Lecocq, J P

    1986-01-01

    Cloned cDNAs have been isolated that encode a variant of hirudin, a potent thrombin inhibitor that is secreted by the salivary glands of the medicinal leech, Hirudo medicinalis. This variant probably corresponds to a form that has been purified from leech heads but differs in amino acid sequence from the hirudin purified from whole leeches. There are at least three hirudin transcripts detectable in leech RNAs that are different in size, site of synthesis, inducibility by starvation, and relationship to hirudin activity. The new hirudin variant predicted by the cDNA and the heterodisperse transcription products suggest a hirudin protein family. The hirudin cDNA was expressed in Escherichia coli under the control of the bacteriophage lambda PL promoter. The recombinant product is biologically active, inhibiting the cleavage by thrombin of fibrinogen and a synthetic tripeptide substrate. PMID:3513162

  17. Effective Protective Immunity to Yersinia pestis Infection Conferred by DNA Vaccine Coding for Derivatives of the F1 Capsular Antigen

    PubMed Central

    Grosfeld, Haim; Cohen, Sara; Bino, Tamar; Flashner, Yehuda; Ber, Raphael; Mamroud, Emanuelle; Kronman, Chanoch; Shafferman, Avigdor; Velan, Baruch

    2003-01-01

    Three plasmids expressing derivatives of the Yersinia pestis capsular F1 antigen were evaluated for their potential as DNA vaccines. These included plasmids expressing the full-length F1, F1 devoid of its putative signal peptide (deF1), and F1 fused to the signal-bearing E3 polypeptide of Semliki Forest virus (E3/F1). Expression of these derivatives in transfected HEK293 cells revealed that deF1 is expressed in the cytosol, E3/F1 is targeted to the secretory cisternae, and the nonmodified F1 is rapidly eliminated from the cell. Intramuscular vaccination of mice with these plasmids revealed that the vector expressing deF1 was the most effective in eliciting anti-F1 antibodies. This response was not limited to specific mouse strains or to the mode of DNA administration, though gene gun-mediated vaccination was by far more effective than intramuscular needle injection. Vaccination of mice with deF1 DNA conferred protection against subcutaneous infection with the virulent Y. pestis Kimberley53 strain, even at challenge amounts as high as 4,000 50% lethal doses. Antibodies appear to play a major role in mediating this protection, as demonstrated by passive transfer of anti-deF1 DNA antiserum. Taken together, these observations indicate that a tailored genetic vaccine based on a bacterial protein can be used to confer protection against plague in mice without resorting to regimens involving the use of purified proteins. PMID:12496187

  18. Structural basis for the dual coding potential of 8-oxoguanosine by a high-fidelity DNA polymerase

    PubMed Central

    Brieba, Luis G; Eichman, Brandt F; Kokoska, Robert J; Doublié, Sylvie; Kunkel, Tom A; Ellenberger, Tom

    2004-01-01

    Accurate DNA replication involves polymerases with high nucleotide selectivity and proofreading activity. We show here why both fidelity mechanisms fail when normally accurate T7 DNA polymerase bypasses the common oxidative lesion 8-oxo-7,8-dihydro-2′-deoxyguanosine (8oG). The crystal structure of the polymerase with 8oG templating dC insertion shows that the O8 oxygen is tolerated by strong kinking of the DNA template. A model of a corresponding structure with dATP predicts steric and electrostatic clashes that would reduce but not eliminate insertion of dA. The structure of a postinsertional complex shows 8oG(syn)·dA(anti) in a Hoogsteen-like base pair at the 3′ terminus, and polymerase interactions with the minor groove surface of the mismatch that mimic those with undamaged, matched base pairs. This explains why translesion synthesis is permitted without proofreading of an 8oG·dA mismatch, thus providing insight into the high mutagenic potential of 8oG. PMID:15297882

  19. Interspecific comparison of the period gene of Drosophila reveals large blocks of non-conserved coding DNA.

    PubMed Central

    Colot, H V; Hall, J C; Rosbash, M

    1988-01-01

    We have cloned and sequenced the coding region of the period (per) gene from Drosophila pseudoobscura and D. virilis. A comparison with that of D. melanogaster reveals that the conceptual translation products consist of interspersed blocks of conserved and non-conserved amino acid sequence. The non-conserved portion, comprising approximately 33% of the protein sequence, includes the perfect Thr-Gly repeat of D. melanogaster, which is absent from the D. pseudoobscura and D. virilis proteins. Based on these observations and cross-species transformation experiments, we suggest that the interspecific variability in the per primary amino acid sequence contributes to the control of species-specific behaviors. PMID:3208754

  20. Sequence of a novel cytochrome CYP2B cDNA coding for a protein which is expressed in a sebaceous gland, but not in the liver.

    PubMed Central

    Friedberg, T; Grassow, M A; Bartlomowicz-Oesch, B; Siegert, P; Arand, M; Adesnik, M; Oesch, F

    1992-01-01

    The major phenobarbital-inducible rat hepatic cytochromes P-450, CYP2B1 and CYP2B2, are the paradigmatic members of a cytochrome P-450 gene subfamily that contains at least seven additional members. Specific oligonucleotide probes for these genomic members of the CYP2B subfamily were used to assess their tissue-specific expression. In Northern-blot analysis a probe specific to gene 4 (which is designated now as CYP2B12) hybridized to a single mRNA present in the preputial gland, an organ which is used as a model for sebaceous glands, but did not hybridize to mRNA isolated from the liver or from five other tissues of untreated or Aroclor 1254-treated rats. The cDNA sequence for the CYP2B12 RNA was determined from overlapping cDNA clones and contained a long open reading frame of 1476 bp. The nucleotide sequence of the CYP2B12 cDNA was 85% similar to the sequence of the CYP2B1 cDNA in its coding region and was different from any CYP2B cDNA characterized until now. The cDNA-derived primary structure of the CYP2B12 protein contains a signal sequence for its insertion into the endoplasmic reticulum and the putative haem-binding site characteristic of cytochromes P-450. A part of the potential haem pocket of CYP2B12 was identical with a similar structure in a bacterial protocatechuate dioxygenase. In immunoblot analysis of preputial-gland microsomes, antibodies against CYP2B1 recognized a single abundant protein with a lower apparent molecular mass than that of CYP2B1. Our results demonstrate that the CYP2B12 protein has the potential to be enzymically active and are the first demonstration that a member of the CYP2B subfamily is expressed exclusively and at high levels in an extrahepatic organ. PMID:1445240

  1. Rare Failures of DNA Bar Codes to Separate Morphologically Distinct Species in a Biodiversity Survey of Iberian Leaf Beetles

    PubMed Central

    Baselga, Andrés; Gómez-Rodríguez, Carola; Novoa, Francisco; Vogler, Alfried P.

    2013-01-01

    During a survey of genetic and species diversity patterns of leaf beetle (Coleoptera: Chrysomelidae) assemblages across the Iberian Peninsula we found a broad congruence between morphologically delimited species and variation in the cytochrome oxidase (cox1) gene. However, one species pair each in the genera Longitarsus Berthold and Pachybrachis Chevrolat was inseparable using molecular methods, whereas diagnostic morphological characters (including male or female genitalia) unequivocally separated the named species. Parsimony haplotype networks and maximum likelihood trees built from cox1 showed high genetic structure within each species pair, but no correlation with the morphological types or with geographic distributions. This contrasted with all analysed congeneric species, which were recovered as monophyletic. A limited number of specimens were sequenced for the nuclear 18S rRNA gene, which showed no or very limited variation within the species pair and no separation of morphological types. These results suggest that processes of lineage sorting for either group are lagging behind the clear morphological and presumably reproductive separation. In the Iberian chrysomelids, incongruence between DNA-based and morphological delimitations is a rare exception, but the discovery of these species pairs may be useful as an evolutionary model for studying the process of speciation in this ecological and geographical setting. In addition, the study of biodiversity patterns based on DNA requires an evolutionary understanding of these incongruences and their potential causes. PMID:24040352

  2. Application of DNA Bar Codes for Screening of Industrially Important Fungi: the Haplotype of Trichoderma harzianum Sensu Stricto Indicates Superior Chitinase Formation

    PubMed Central

    Nagy, Viviana; Seidl, Verena; Szakacs, George; Komoń-Zelazowska, Monika; Kubicek, Christian P.; Druzhinina, Irina S.

    2007-01-01

    Selection of suitable strains for biotechnological purposes is frequently a random process supported by high-throughput methods. Using chitinase production by Hypocrea lixii/Trichoderma harzianum as a model, we tested whether fungal strains with superior enzyme formation may be diagnosed by DNA bar codes. We analyzed sequences of two phylogenetic marker loci, internal transcribed spacer 1 (ITS1) and ITS2 of the rRNA-encoding gene cluster and the large intron of the elongation factor 1-alpha gene, tef1, from 50 isolates of H. lixii/T. harzianum, which were also tested to determine their ability to produce chitinases in solid-state fermentation (SSF). Statistically supported superior chitinase production was obtained for strains carrying one of the observed ITS1 and ITS2 and tef1 alleles corresponding to an allele of T. harzianum type strain CBS 226.95. A tef1-based DNA bar code tool, TrichoCHIT, for rapid identification of these strains was developed. The geographic origin of the strains was irrelevant for chitinase production. The improved chitinase production by strains containing this haplotype was not due to better growth on N-acetyl-β-d-glucosamine or glucosamine. Isoenzyme electrophoresis showed that neither the isoenzyme profile of N-acetyl-β-glucosaminidases or the endochitinases nor the intensity of staining of individual chitinase bands correlated with total chitinase in the culture filtrate. The superior chitinase producers did not exhibit similarly increased cellulase formation. Biolog Phenotype MicroArray analysis identified lack of N-acetyl-β-d-mannosamine utilization as a specific trait of strains with the chitinase-overproducing haplotype. This observation was used to develop a plate screening assay for rapid microbiological identification of the strains. The data illustrate that desired industrial properties may be an attribute of certain populations within a species, and screening procedures should thus include a balanced mixture of all

  3. Application of DNA bar codes for screening of industrially important fungi: the haplotype of Trichoderma harzianum sensu stricto indicates superior chitinase formation.

    PubMed

    Nagy, Viviana; Seidl, Verena; Szakacs, George; Komoń-Zelazowska, Monika; Kubicek, Christian P; Druzhinina, Irina S

    2007-11-01

    Selection of suitable strains for biotechnological purposes is frequently a random process supported by high-throughput methods. Using chitinase production by Hypocrea lixii/Trichoderma harzianum as a model, we tested whether fungal strains with superior enzyme formation may be diagnosed by DNA bar codes. We analyzed sequences of two phylogenetic marker loci, internal transcribed spacer 1 (ITS1) and ITS2 of the rRNA-encoding gene cluster and the large intron of the elongation factor 1-alpha gene, tef1, from 50 isolates of H. lixii/T. harzianum, which were also tested to determine their ability to produce chitinases in solid-state fermentation (SSF). Statistically supported superior chitinase production was obtained for strains carrying one of the observed ITS1 and ITS2 and tef1 alleles corresponding to an allele of T. harzianum type strain CBS 226.95. A tef1-based DNA bar code tool, TrichoCHIT, for rapid identification of these strains was developed. The geographic origin of the strains was irrelevant for chitinase production. The improved chitinase production by strains containing this haplotype was not due to better growth on N-acetyl-beta-D-glucosamine or glucosamine. Isoenzyme electrophoresis showed that neither the isoenzyme profile of N-acetyl-beta-glucosaminidases or the endochitinases nor the intensity of staining of individual chitinase bands correlated with total chitinase in the culture filtrate. The superior chitinase producers did not exhibit similarly increased cellulase formation. Biolog Phenotype MicroArray analysis identified lack of N-acetyl-beta-D-mannosamine utilization as a specific trait of strains with the chitinase-overproducing haplotype. This observation was used to develop a plate screening assay for rapid microbiological identification of the strains. The data illustrate that desired industrial properties may be an attribute of certain populations within a species, and screening procedures should thus include a balanced mixture of all

  4. Fast turnover of genome transcription across evolutionary time exposes entire non-coding DNA to de novo gene emergence.

    PubMed

    Neme, Rafik; Tautz, Diethard

    2016-02-02

    Deep sequencing analyses have shown that a large fraction of genomes is transcribed, but the significance of this transcription is much debated. Here, we characterize the phylogenetic turnover of poly-adenylated transcripts in a comprehensive sampling of taxa of the mouse (genus Mus), spanning a phylogenetic distance of 10 Myr. Using deep RNA sequencing we find that at a given sequencing depth transcriptome coverage becomes saturated within a taxon, but keeps extending when compared between taxa, even at this very shallow phylogenetic level. Our data show a high turnover of transcriptional states between taxa and that no major transcript-free islands exist across evolutionary time. This suggests that the entire genome can be transcribed into poly-adenylated RNA when viewed at an evolutionary time scale. We conclude that any part of the non-coding genome can potentially become subject to evolutionary functionalization via de novo gene evolution within relatively short evolutionary time spans.

  5. DNA

    ERIC Educational Resources Information Center

    Stent, Gunther S.

    1970-01-01

    This history for molecular genetics and its explanation of DNA begins with an analysis of the Golden Jubilee essay papers, 1955. The paper ends stating that the higher nervous system is the one major frontier of biological inquiry which still offers some romance of research. (Author/VW)

  7. Isolation and functional characterization of a cDNA coding a hydroxycinnamoyltransferase involved in phenylpropanoid biosynthesis in Cynara cardunculus L

    PubMed Central

    Comino, Cinzia; Lanteri, Sergio; Portis, Ezio; Acquadro, Alberto; Romani, Annalisa; Hehn, Alain; Larbat, Romain; Bourgaud, Frédéric

    2007-01-01

    Background: Cynara cardunculus L. is an edible plant of pharmaceutical interest, in particular with respect to the polyphenolic content of its leaves. It includes three taxa: globe artichoke, cultivated cardoon, and wild cardoon. The dominating phenolics are the di-caffeoylquinic acids (such as cynarin), which are largely restricted to Cynara species, along with their precursor, chlorogenic acid (CGA). The aim of this study is to better understand CGA synthesis in this plant. Results: A gene sequence encoding a hydroxycinnamoyltransferase (HCT) involved in the synthesis of CGA was identified. Isolation of the gene sequence was achieved by using a PCR strategy with degenerate primers targeted to conserved regions of available orthologous HCT sequences. We have isolated a 717 bp cDNA which shares 84% amino acid identity and 92% similarity with a tobacco gene responsible for the biosynthesis of CGA from p-coumaroyl-CoA and quinic acid. In silico studies revealed the globe artichoke HCT sequence clustering with one of the main acyltransferase groups (i.e. anthranilate N-hydroxycinnamoyl/benzoyltransferase). Heterologous expression of the full length HCT (GenBank accession DQ104740) cDNA in E. coli demonstrated that the recombinant enzyme efficiently synthesizes both chlorogenic acid and p-coumaroyl quinate from quinic acid and caffeoyl-CoA or p-coumaroyl-CoA, respectively, confirming its identity as a hydroxycinnamoyl-CoA: quinate HCT. Variable levels of HCT expression were shown among wild and cultivated forms of C. cardunculus subspecies. The level of expression was correlated with CGA content. Conclusion: The data support the predicted involvement of the Cynara cardunculus HCT in the biosynthesis of CGA before and/or after the hydroxylation step of hydroxycinnamoyl esters. PMID:17374149

  8. Structure and expression of the gene coding for the alpha-subunit of DNA-dependent RNA polymerase from the chloroplast genome of Zea mays.

    PubMed Central

    Ruf, M; Kössel, H

    1988-01-01

    The rpoA gene coding for the alpha-subunit of DNA-dependent RNA polymerase located on the DNA of Zea mays chloroplasts has been characterized with respect to its position on the chloroplast genome and its nucleotide sequence. The amino acid sequence derived for a 39 Kd polypeptide shows strong homology with sequences derived from the rpoA genes of other chloroplast species and with the amino acid sequence of the alpha-subunit from E. coli RNA polymerase. Transcripts of the rpoA gene were identified by Northern hybridization and characterized by S1 mapping using total RNA isolated from maize chloroplasts. Antibodies raised against a synthetic C-terminal heptapeptide show cross reactivity with a 39 Kd polypeptide contained in the stroma fraction of maize chloroplasts. It is concluded that the rpoA gene is a functional gene and that, therefore, at least the alpha-subunit of plastidic RNA polymerase is expressed in chloroplasts. PMID:3399379

  9. Analysis of Argonaute 4-Associated Long Non-Coding RNA in Arabidopsis thaliana Sheds Novel Insights into Gene Regulation through RNA-Directed DNA Methylation.

    PubMed

    Au, Phil Chi Khang; Dennis, Elizabeth S; Wang, Ming-Bo

    2017-08-07

    RNA-directed DNA methylation (RdDM) is a plant-specific de novo DNA methylation mechanism that requires long noncoding RNA (lncRNA) as scaffold to define target genomic loci. While the role of RdDM in maintaining genome stability is well established, how it regulates protein-coding genes remains poorly understood and few RdDM target genes have been identified. In this study, we obtained sequences of RdDM-associated lncRNAs using nuclear RNA immunoprecipitation against ARGONAUTE 4 (AGO4), a key component of RdDM that binds specifically with the lncRNA. Comparison of these lncRNAs with gene expression data of RdDM mutants identified novel RdDM target genes. Surprisingly, a large proportion of these target genes were repressed in RdDM mutants suggesting that they are normally activated by RdDM. These RdDM-activated genes are more enriched for gene body lncRNA than the RdDM-repressed genes. Histone modification and RNA analyses of several RdDM-activated stress response genes detected increased levels of active histone mark and short RNA transcript in the lncRNA-overlapping gene body regions in the ago4 mutant despite the repressed expression of these genes. These results suggest that RdDM, or AGO4, may play a role in maintaining or activating stress response gene expression by directing gene body chromatin modification preventing cryptic transcription.

  10. Analysis of Argonaute 4-Associated Long Non-Coding RNA in Arabidopsis thaliana Sheds Novel Insights into Gene Regulation through RNA-Directed DNA Methylation

    PubMed Central

    Au, Phil Chi Khang; Dennis, Elizabeth S.; Wang, Ming-Bo

    2017-01-01

    RNA-directed DNA methylation (RdDM) is a plant-specific de novo DNA methylation mechanism that requires long noncoding RNA (lncRNA) as scaffold to define target genomic loci. While the role of RdDM in maintaining genome stability is well established, how it regulates protein-coding genes remains poorly understood and few RdDM target genes have been identified. In this study, we obtained sequences of RdDM-associated lncRNAs using nuclear RNA immunoprecipitation against ARGONAUTE 4 (AGO4), a key component of RdDM that binds specifically with the lncRNA. Comparison of these lncRNAs with gene expression data of RdDM mutants identified novel RdDM target genes. Surprisingly, a large proportion of these target genes were repressed in RdDM mutants suggesting that they are normally activated by RdDM. These RdDM-activated genes are more enriched for gene body lncRNA than the RdDM-repressed genes. Histone modification and RNA analyses of several RdDM-activated stress response genes detected increased levels of active histone mark and short RNA transcript in the lncRNA-overlapping gene body regions in the ago4 mutant despite the repressed expression of these genes. These results suggest that RdDM, or AGO4, may play a role in maintaining or activating stress response gene expression by directing gene body chromatin modification preventing cryptic transcription. PMID:28783101

  11. Characterization of DNA polymerase β splicing variants in gastric cancer: the most frequent exon 2-deleted isoform is a non coding RNA

    PubMed Central

    Simonelli, Valeria; D’Errico, Mariarosaria; Palli, Domenico; Prasad, Rajendra; Wilson, Samuel H.; Dogliotti, Eugenia

    2009-01-01

    DNA repair polymerase β (Pol β) gene variants are frequently associated with tumor tissues. In this study a search for Pol β mutants and splice variants was conducted in matched normal and tumor gastric tissues and blood samples from healthy donors. No tumor-associated mutations were found, while a variety of alternative Pol β splicing variants were detected with high frequency in all the specimens analysed. Quantitative PCR of the Pol β variant lacking exon 2 (Ex2Δ) and the isoforms with exon 11 skipping showed that these variants are neither tumor- nor tissue-specific and that their levels vary greatly among different individuals. The most frequent Ex2Δ variant was further characterized. We clearly demonstrated that this variant does not encode protein, as detected by both western blotting and immunofluorescence analysis of human AGS cells expressing HA-tagged Ex2Δ. The lack of translation was confirmed by comparing the DNA gap-filling capacity and alkylation sensitivity of wild type and Pol β null murine fibroblasts expressing the human Ex2Δ variant. We showed that the Ex2Δ transcript is polyadenylated and its half-life is significantly longer than that of the wild type mRNA as inferred by treating AGS cells with actinomycin D. Moreover, we found that it localizes to polyribosomes, suggesting a role as post-transcriptional regulator. This study identifies a new type of DNA repair variants that do not give rise to functional proteins but to non coding RNAs that could either modulate target mRNAs or represent unproductive splicing events. PMID:19635489

  12. Accuracy/diversity and ensemble MLP classifier design.

    PubMed

    Windeatt, Terry

    2006-09-01

    The difficulties of tuning parameters of multilayer perceptrons (MLP) classifiers are well known. In this paper, a measure is described that is capable of predicting the number of classifier training epochs for achieving optimal performance in an ensemble of MLP classifiers. The measure is computed between pairs of patterns on the training data and is based on a spectral representation of a Boolean function. This representation characterizes the mapping from classifier decisions to target label and allows accuracy and diversity to be incorporated within a single measure. Results on many benchmark problems, including the Olivetti Research Laboratory (ORL) face database demonstrate that the measure is well correlated with base-classifier test error, and may be used to predict the optimal number of training epochs. While correlation with ensemble test error is not quite as strong, it is shown in this paper that the measure may be used to predict number of epochs for optimal ensemble performance. Although the technique is only applicable to two-class problems, it is extended here to multiclass through output coding. For the output-coding technique, a random code matrix is shown to give better performance than one-per-class code, even when the base classifier is well-tuned.
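    The output-coding idea in this abstract, which extends two-class base classifiers to a multiclass problem through a code matrix, can be sketched as follows. The function names, the rejection sampling used to keep codewords distinct, and the nearest-codeword (Hamming distance) decoding rule are assumptions for illustration. The abstract does not state how the paper builds or decodes its random code matrix.

```python
import random


def one_per_class(k):
    """One-per-class code: class i is represented by the i-th unit vector."""
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]


def random_code_matrix(k, n_bits, seed=0):
    """Draw k distinct random binary codewords of length n_bits."""
    if 2 ** n_bits < k:
        raise ValueError("n_bits too small for k distinct codewords")
    rng = random.Random(seed)
    rows = set()
    while len(rows) < k:
        rows.add(tuple(rng.randint(0, 1) for _ in range(n_bits)))
    return [list(r) for r in sorted(rows)]


def decode(outputs, matrix):
    """Return the class whose codeword is closest in Hamming distance.

    `outputs` holds one binary decision per base classifier (one per column).
    Ties are resolved in favour of the lowest class index.
    """
    dists = [sum(o != b for o, b in zip(outputs, row)) for row in matrix]
    return dists.index(min(dists))


if __name__ == "__main__":
    matrix = random_code_matrix(k=4, n_bits=8, seed=1)
    for label, row in enumerate(matrix):
        print(label, row, decode(row, matrix))
```

    Each column of the matrix defines one two-class problem for a base classifier, so a longer random code adds redundancy that can absorb some individual classifier errors. A practical implementation would also reject columns that are constant across classes, since they define no two-class split.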

  13. The landscape of DNA methylation-mediated regulation of long non-coding RNAs in breast cancer

    PubMed Central

    Li, Xuecang; Zhao, Ning; Wang, Yihan; Han, Xiaole; Ci, Ce; Zhang, Jian; Li, Meng; Zhang, Yan

    2017-01-01

    Although systematic studies have identified a host of long non-coding RNAs (lncRNAs) which are involved in breast cancer, the knowledge about the methylation-mediated dysregulation of those lncRNAs remains limited. Here, we integrated multi-omics data to analyze the methylated alteration of lncRNAs in breast invasive carcinoma (BRCA). We found that lncRNAs showed diverse methylation patterns on promoter regions in BRCA. LncRNAs were divided into two categories and four subcategories based on their promoter methylation patterns and expression levels between tumor and normal samples. Through cis-regulatory analysis and gene ontology network, abnormally methylated lncRNAs were identified to be associated with cancer regulation, proliferation or expression of transcription factors. Competing endogenous RNA network and functional enrichment analysis of abnormally methylated lncRNAs showed that lncRNAs with different methylation patterns were involved in several hallmarks and KEGG pathways of cancers significantly. Finally, survival analysis based on mRNA modules in networks revealed that lncRNAs silenced by high methylation were associated with prognosis significantly in BRCA. This study enhances the understanding of aberrantly methylated patterns of lncRNAs and provides a novel insight for identifying cancer biomarkers and potential therapeutic targets in breast cancer. PMID:28881636

  14. Emergent behaviors of classifier systems

    SciTech Connect

    Forrest, S.; Miller, J.H.

    1989-01-01

    This paper discusses some examples of emergent behavior in classifier systems, describes some recently developed methods for studying them based on dynamical systems theory, and presents some initial results produced by the methodology. The goal of this work is to find techniques for noticing when interesting emergent behaviors of classifier systems emerge, to study how such behaviors might emerge over time, and make suggestions for designing classifier systems that exhibit preferred behaviors. 20 refs., 1 fig.

  15. Lichenase and coding sequences

    SciTech Connect

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-15

    The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-.beta.-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.

  16. Sequence analysis of coding DNA fragments of pfcrt and pfmdr-1 genes in Plasmodium falciparum isolates from Odisha, India.

    PubMed

    Sutar, Sasmita Kumari Das; Gupta, Bhavna; Ranjit, Manoranjan; Kar, Shantanu Kumar; Das, Aparup

    2011-02-01

    The global emergence and spread of malaria parasites resistant to antimalarial drugs is the major problem in malaria control. The genetic basis of the parasite's resistance to the antimalarial drug chloroquine (CQ) is well-documented, allowing for the analysis of field isolates of malaria parasites to address evolutionary questions concerning the origin and spread of CQ-resistance. Here, we present DNA sequence analyses of both the second exon of the Plasmodium falciparum CQ-resistance transporter (pfcrt) gene and the 5' end of the P. falciparum multidrug-resistance 1 (pfmdr-1) gene in 40 P. falciparum field isolates collected from eight different localities of Odisha, India. First, we genotyped the samples for the pfcrt K76T and pfmdr-1 N86Y mutations in these two genes, which are the mutations primarily implicated in CQ-resistance. We further analyzed amino acid changes in codons 72-76 of the pfcrt haplotypes. Interestingly, both the K76T and N86Y mutations were found to co-exist in 32 out of the total 40 isolates, which were of either the CVIET or SVMNT haplotype, while the remaining eight isolates were of the CVMNK haplotype. In total, eight nonsynonymous single nucleotide polymorphisms (SNPs) were observed, six in the pfcrt gene and two in the pfmdr-1 gene. One poorly studied SNP in the pfcrt gene (A97T) was found at a high frequency in many P. falciparum samples. Using population genetics to analyze these two gene fragments, we revealed comparatively higher nucleotide diversity in the pfcrt gene than in the pfmdr-1 gene. Furthermore, linkage disequilibrium was found to be tight between closely spaced SNPs of the pfcrt gene. Finally, both the pfcrt and the pfmdr-1 genes were found to evolve under the standard neutral model of molecular evolution.

  17. Feature Selection and Effective Classifiers.

    ERIC Educational Resources Information Center

    Deogun, Jitender S.; Choubey, Suresh K.; Raghavan, Vijay V.; Sever, Hayri

    1998-01-01

    Develops and analyzes four algorithms for feature selection in the context of rough set methodology. Experimental results confirm the expected relationship between the time complexity of these algorithms and the classification accuracy of the resulting upper classifiers. When compared, results of upper classifiers perform better than lower…

  19. Testing the use of ITS rDNA and protein-coding genes in the generic and species delimitation of the lichen genus Usnea (Parmeliaceae, Ascomycota).

    PubMed

    Truong, Camille; Divakar, Pradeep K; Yahr, Rebecca; Crespo, Ana; Clerc, Philippe

    2013-08-01

    In lichen-forming fungi, traditional taxonomical concepts are frequently in conflict with molecular data, and identifying appropriate taxonomic characters to describe phylogenetic clades remains challenging in many groups. The selection of suitable markers for the reconstruction of solid phylogenetic hypotheses is therefore fundamental. The lichen genus Usnea is highly diverse, with more than 350 estimated species, distributed in polar, temperate and tropical regions. The phylogeny and classification of Usnea have been a matter of debate, given the lack of phenotypic characters to describe phylogenetic clades and the low degree of resolution of phylogenetic trees. In this study, we investigated the phylogenetic relationships of 52 Usnea species from across the genus, based on ITS rDNA, nuLSU, and two protein-coding genes RPB1 and MCM7. ITS comprised several highly variable regions, containing substantial genetic signal, but also susceptible to causing bias in the generation of the alignment. We compared several methods of alignment of ITS and found that a simultaneous optimization of alignment and phylogeny (using BAli-phy) improved significantly both the topology and the resolution of the phylogenetic tree. However the resolution was even better when using protein-coding genes, especially RPB1 although it is less variable. The phylogeny based on the concatenated dataset revealed that the genus Usnea is subdivided into four highly-supported clades, corresponding to the traditionally circumscribed subgenera Eumitria, Dolichousnea, Neuropogon and Usnea. However, characters that have been used to describe these clades are often homoplasious within the phylogeny and their parallel evolution is suggested. On the other hand, most of the species were reconstructed as monophyletic, indicating that combinations of phenotypic characters are suitable discriminators for delimitating species, but are inadequate to describe generic subdivisions.

  20. The effect of non-coding DNA variations on P53 and cMYC competitive inhibition at cis-overlapping motifs.

    PubMed

    Kin, Katherine; Chen, Xi; Gonzalez-Garay, Manuel; Fakhouri, Walid D

    2016-04-15

    Non-coding DNA variations play a critical role in increasing the risk for development of common complex diseases, and account for the majority of SNPs highly associated with cancer. However, it remains a challenge to identify etiologic variants and to predict their pathological effects on target gene expression for clinical purposes. Cis-overlapping motifs (COMs) are elements of enhancer regions that impact gene expression by enabling competitive binding and switching between transcription factors. Mutations within COMs are especially important when the involved transcription factors have opposing effects on gene regulation, like P53 tumor suppressor and cMYC proto-oncogene. In this study, genome-wide analysis of ChIP-seq data from human cancer and mouse embryonic cells identified a significant number of putative regulatory elements with signals for both P53 and cMYC. Each co-occupied element contains, on average, two COMs, and one common SNP every two COMs. Gene ontology of predicted target genes for COMs showed that the majority are involved in DNA damage, apoptosis, cell cycle regulation, and RNA processing. EMSA results showed that both cMYC and P53 bind to cis-overlapping motifs within a ChIP-seq co-occupied region in Chr12. In vitro functional analysis of selected co-occupied elements verified enhancer activity, and also showed that the occurrence of SNPs within three COMs significantly altered enhancer activity. We identified a list of COM-associated functional SNPs that are in close proximity to SNPs associated with common diseases in large population studies. These results suggest a potential molecular mechanism to identify etiologic regulatory mutations associated with common diseases.

  1. Cloning and characterization of a cDNA coding for Astacus embryonic astacin, a member of the astacin family of metalloproteases from the crayfish Astacus astacus.

    PubMed

    Geier, G; Zwilling, R

    1998-05-01

    The astacin family of zinc endopeptidases was named after the digestive enzyme astacin isolated from the crayfish Astacus astacus. Employing a reverse transcription/PCR strategy with degenerate oligonucleotide primers specific for two signature sequences of the astacin family, we have isolated a 1602-bp cDNA from embryos of developing A. astacus eggs, which was designated Astacus embryonic astacin (AEA). This cDNA was found to code for an astacin-like protease domain which accounts for the N-terminal half of the predicted protein. The C-terminal half mainly consists of two complement subcomponent C1r/C1s/embryonic sea urchin protein Uegf/bone morphogenetic protein 1 (CUB) domains. The metalloprotease domain displays an amino acid sequence identity of 42% with astacin. A higher sequence similarity was found to astacin family members that act as hatching enzymes in different species, e.g. chorioallantoic membrane protein 1 (CAM-1; from quail) and Xenopus hatching enzyme (formerly UVS.2), both of which show 54% identity, and high and low choriolytic enzymes (HCE and LCE) from the teleost Oryzias latipes (52% and 48% identity, respectively). A relationship to astacin-like hatching enzymes is further supported by a phylogenetic analysis of the protease domains. Expression of AEA mRNA in developing embryos was found to be restricted to unhatched juveniles (larvae) during the last 8 days before hatching. AEA transcripts could not be detected in various tissues of adult animals or in eggs and embryos from an earlier developmental stage. AEA expression starts about 8 days prior to hatching, followed by a strong (18-fold) induction with a maximum at day 4 before hatching. Newly hatched juveniles were found not to express the AEA mRNA.

  2. MScanner: a classifier for retrieving Medline citations

    PubMed Central

    Poulter, Graham L; Rubin, Daniel L; Altman, Russ B; Seoighe, Cathal

    2008-01-01

    retrieving topics for which many features may indicate relevance. Its web interface simplifies the task of classifying Medline citations, compared to building a pre-filter and classifier specific to the topic. The data sets and open source code used to obtain the results in this paper are available on-line and as supplementary material, and the web interface may be accessed at . PMID:18284683

  3. Coding for surgical audit.

    PubMed

    Pettigrew, R A; van Rij, A M

    1990-05-01

    A simple system of codes for operations, diagnoses and complications, developed specifically for computerized surgical audit, is described. This arose following a review of our established surgical audit in which problems in the retrieval of data from the database were identified. Evaluation of current methods of classification of surgical data highlighted the need for a dedicated coding system that was suitable for classifying surgical audit data, enabling rapid retrieval from large databases. After 2 years of use, the coding system has been found to fulfil the criteria of being sufficiently flexible and specific for computerized surgical audit, yet simple enough for medical staff to use.

  4. Isolation and sequencing of cDNA clones coding for the catalytic unit of glucose-6-phosphatase from two haplochromine cichlid fishes.

    PubMed

    Nagl, S; Mayer, W E; Klein, J

    1999-01-01

    Complementary DNA clones coding for the catalytic unit of the enzyme glucose-6-phosphatase (G6Pase) were obtained from Haplochromis nubilus and Haplochromis xenognathus, two cichlid fish species from Lake Victoria. The translated sequence of these two cDNAs identifies a polypeptide consisting of 352 amino acid residues and showing a 54.4% similarity to the human form of G6Pase. The amino acid sequences of the two fish species are identical. The comparison of the fish amino acid sequence with the corresponding sequences of rat, mouse, and human G6Pase revealed that the amino acid residues, which are involved in G6Pase catalysis in humans, are also conserved in fish G6Pase. Northern blot analysis showed that G6Pase is expressed at the same level in 6- and 10-day-old fish. A three base pair insertion/deletion polymorphism was found in the 3'-untranslated region of the fish G6Pase gene. The polymorphism will be a useful marker in a phylogenetic study of Lake Victoria cichlids.

  5. Classifying Chondrules Based on Cathodoluminesence

    NASA Astrophysics Data System (ADS)

    Cristarela, T. C.; Sears, D. W.

    2011-03-01

    Sears et al. (1991) proposed a scheme to classify chondrules based on cathodoluminescence color and electron microprobe analysis. This research evaluates that scheme and criticisms received from Grossman and Brearley (2005).

  6. How Is Childhood Leukemia Classified?

    MedlinePlus

    ... Classification based on how the leukemia cells look (morphology) In the past, doctors used the French-American- ... of AML are classified mainly based on their morphology (how they look under the microscope). There are ...

  7. The Challenge of Classifying Polyhedra.

    ERIC Educational Resources Information Center

    Pedersen, Jean J.

    1980-01-01

    A question posed by Euler is considered: How can polyhedra be classified so that the result is in some way analogous to the simple classification of polygons according to the number of their sides? (MK)

  8. IAEA safeguards and classified materials

    SciTech Connect

    Pilat, J.F.; Eccleston, G.W.; Fearey, B.L.; Nicholas, N.J.; Tape, J.W.; Kratzer, M.

    1997-11-01

    The international community in the post-Cold War period has suggested that the International Atomic Energy Agency (IAEA) utilize its expertise in support of the arms control and disarmament process in unprecedented ways. The pledges of the US and Russian presidents to place excess defense materials, some of which are classified, under some type of international inspections raise the prospect of using IAEA safeguards approaches for monitoring classified materials. A traditional safeguards approach, based on nuclear material accountancy, would seem unavoidably to reveal classified information. However, further analysis of the IAEA's safeguards approaches is warranted in order to understand fully the scope and nature of any problems. The issues are complex and difficult, and it is expected that common technical understandings will be essential for their resolution. Accordingly, this paper examines and compares traditional safeguards item accounting of fuel at a nuclear power station (especially spent fuel) with the challenges presented by inspections of classified materials. This analysis is intended to delineate more clearly the problems as well as reveal possible approaches, techniques, and technologies that could allow the adaptation of safeguards to the unprecedented task of inspecting classified materials. It is also hoped that a discussion of these issues can advance ongoing political-technical debates on international inspections of excess classified materials.

  9. Clinical coding. Code breakers.

    PubMed

    Mathieson, Steve

    2005-02-24

    --The advent of payment by results has seen the role of the clinical coder pushed to the fore in England. --Examinations for a clinical coding qualification began in 1999. In 2004, approximately 200 people took the qualification. --Trusts are attracting people to the role by offering training from scratch or through modern apprenticeships.

  10. A region of the polyoma virus genome between the replication origin and late protein coding sequences is required in cis for both early gene expression and viral DNA replication.

    PubMed Central

    Tyndall, C; La Mantia, G; Thacker, C M; Favaloro, J; Kamen, R

    1981-01-01

    Deletion mutants within the Py DNA region between the replication origin and the beginning of late protein coding sequences have been constructed and analysed for viability, early gene expression and viral DNA replication. Assay of replicative competence was facilitated by the use of Py transformed mouse cells (COP lines) which express functional large T-protein but contain no free viral DNA. Viable mutants defined three new nonessential regions of the genome. Certain deletions spanning the PvuII site at nt 5130 (67.4 mu) were unable to express early genes and had a cis-acting defect in DNA replication. Other mutants had intermediate phenotypes. Relevance of these results to eucaryotic "enhancer" elements is discussed. PMID:6275353

  11. DNA Dynamics.

    ERIC Educational Resources Information Center

    Warren, Michael D.

    1997-01-01

    Explains a method to enable students to understand DNA and protein synthesis using model-building and role-playing. Acquaints students with the triplet code and transcription. Includes copies of the charts used in this technique. (DDR)

  12. Building classifiers using Bayesian networks

    SciTech Connect

    Friedman, N.; Goldszmidt, M.

    1996-12-31

    Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state of the art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we examine and evaluate approaches for inducing classifiers from data, based on recent results in the theory of learning Bayesian networks. Bayesian networks are factored representations of probability distributions that generalize the naive Bayes classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness which are characteristic of naive Bayes. We experimentally tested these approaches using benchmark problems from the U. C. Irvine repository, and compared them against C4.5, naive Bayes, and wrapper-based feature selection methods.

  13. Explosive Formulation Code Naming SOP

    SciTech Connect

    Martz, H. E.

    2014-09-19

    The purpose of this SOP is to provide a procedure for giving individual HME formulations code names. A code name for an individual HME formulation consists of an explosive family code, given by the classified guide, followed by a dash, -, and a number. If the formulation requires preparation such as packing or aging, these add additional groups of symbols to the X-ray specimen name.

  14. Translator, Traitor, Source of Data: Classifying Translations of "Foreign Phrases" as an Awareness-Raising Exercise.

    ERIC Educational Resources Information Center

    Parkinson, Brian

    1998-01-01

    A system for classifying (coding) translations of sentence-length or similar material is presented and illustrated with codings of entries in the "Dictionary of Foreign Phrases and Classical Quotations." Problems in coding are discussed, relating especially to intertextuality, intention, and ownership. The system is intended for pedagogic use, and…

  15. Classifying Cereal Data (Earlier Methods)

    Cancer.gov

    The DSQ includes questions about cereal intake and allows respondents up to two responses on which cereals they consume. We classified each cereal reported first by hot or cold, and then along four dimensions: density of added sugars, whole grains, fiber, and calcium.

  16. Maximum margin Bayesian network classifiers.

    PubMed

    Pernkopf, Franz; Wohlmayr, Michael; Tschiatschek, Sebastian

    2012-03-01

    We present a maximum margin parameter learning algorithm for Bayesian network classifiers using a conjugate gradient (CG) method for optimization. In contrast to previous approaches, we maintain the normalization constraints on the parameters of the Bayesian network during optimization, i.e., the probabilistic interpretation of the model is not lost. This enables us to handle missing features in discriminatively optimized Bayesian networks. In experiments, we compare the classification performance of maximum margin parameter learning to conditional likelihood and maximum likelihood learning approaches. Discriminative parameter learning significantly outperforms generative maximum likelihood estimation for naive Bayes and tree augmented naive Bayes structures on all considered data sets. Furthermore, maximizing the margin dominates the conditional likelihood approach in terms of classification performance in most cases. We provide results for a recently proposed maximum margin optimization approach based on convex relaxation. While the classification results are highly similar, our CG-based optimization is computationally up to orders of magnitude faster. Margin-optimized Bayesian network classifiers achieve classification performance comparable to support vector machines (SVMs) using fewer parameters. Moreover, we show that unanticipated missing feature values during classification can be easily processed by discriminatively optimized Bayesian network classifiers, a case where discriminative classifiers usually require mechanisms to complete unknown feature values in the data first.

  17. QtClassify: IFS data emission line candidates classifier

    NASA Astrophysics Data System (ADS)

    Kerutt, Josephine

    2017-03-01

    QtClassify is a GUI that helps classify emission lines found in integral field spectroscopic data. Input needed is a datacube as well as a catalog with emission lines and a signal-to-noise cube, such as that created by LSDCat (ascl:1612.002). The main idea is to take each detected line and guess what line it could be (and thus the redshift of the object). You would expect to see other lines that might not have been detected but are visible in the cube if you know where to look, which is why parts of the spectrum are shown where other lines are expected. In addition, monochromatic layers of the datacube are displayed, making it easy to spot additional emission lines.

  18. A Framework for Classifying Decision Support Systems

    PubMed Central

    Sim, Ida; Berlin, Amy

    2003-01-01

    Background Computer-based clinical decision support systems (CDSSs) vary greatly in design and function. A taxonomy for classifying CDSS structure and function would help efforts to describe and understand the variety of CDSSs in the literature, and to explore predictors of CDSS effectiveness and generalizability. Objective To define and test a taxonomy for characterizing the contextual, technical, and workflow features of CDSSs. Methods We retrieved and analyzed 150 English language articles published between 1975 and 2002 that described computer systems designed to assist physicians and/or patients with clinical decision making. We identified aspects of CDSS structure or function and iterated our taxonomy until additional article reviews did not result in any new descriptors or taxonomic modifications. Results Our taxonomy comprises 95 descriptors along 24 descriptive axes. These axes are in 5 categories: Context, Knowledge and Data Source, Decision Support, Information Delivery, and Workflow. The axes had an average of 3.96 coded choices each. 75% of the descriptors had an inter-rater agreement kappa of greater than 0.6. Conclusions We have defined and tested a comprehensive, multi-faceted taxonomy of CDSSs that shows promising reliability for classifying CDSSs reported in the literature. PMID:14728243
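
    The inter-rater agreement statistic reported in the record above can be made concrete with a short worked sketch. The abstract does not say which kappa variant was used; the code below assumes Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The ratings are invented purely for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of one descriptor across ten articles.
rater_1 = ["y", "y", "y", "y", "y", "n", "n", "n", "n", "n"]
rater_2 = ["y", "y", "y", "y", "n", "y", "n", "n", "n", "n"]

# Observed agreement is 8/10 and chance agreement is 0.5, so kappa is about 0.6.
print(round(cohens_kappa(rater_1, rater_2), 3))
```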

  19. An enhanced MITOMAP with a global mtDNA mutational phylogeny

    PubMed Central

    Ruiz-Pesini, Eduardo; Lott, Marie T.; Procaccio, Vincent; Poole, Jason C.; Brandon, Marty C.; Mishmar, Dan; Yi, Christina; Kreuziger, James; Baldi, Pierre; Wallace, Douglas C.

    2007-01-01

    The MITOMAP data system for the human mitochondrial genome has been greatly enhanced by the addition of a navigable mutational mitochondrial DNA (mtDNA) phylogenetic tree of ∼3000 mtDNA coding region sequences plus expanded pathogenic mutation tables and a nuclear-mtDNA pseudogene (NUMT) data base. The phylogeny reconstructs the entire mutational history of the human mtDNA, thus defining the mtDNA haplogroups and differentiating ancient from recent mtDNA mutations. Pathogenic mutations are classified by both genotype and phenotype, and the NUMT sequences permit detection of spurious inclusion of pseudogene variants during mutation analysis. These additions position MITOMAP for the implementation of our automated mtDNA sequence analysis system, Mitomaster. PMID:17178747

  20. Energy-Efficient Neuromorphic Classifiers.

    PubMed

    Martí, Daniel; Rigotti, Mattia; Seok, Mingoo; Fusi, Stefano

    2016-10-01

    Neuromorphic engineering combines the architectural and computational principles of systems neuroscience with semiconductor electronics, with the aim of building efficient and compact devices that mimic the synaptic and neural machinery of the brain. The energy consumptions promised by neuromorphic engineering are extremely low, comparable to those of the nervous system. Until now, however, the neuromorphic approach has been restricted to relatively simple circuits and specialized functions, thereby obfuscating a direct comparison of their energy consumption to that used by conventional von Neumann digital machines solving real-world tasks. Here we show that a recent technology developed by IBM can be leveraged to realize neuromorphic circuits that operate as classifiers of complex real-world stimuli. Specifically, we provide a set of general prescriptions to enable the practical implementation of neural architectures that compete with state-of-the-art classifiers. We also show that the energy consumption of these architectures, realized on the IBM chip, is typically two or more orders of magnitude lower than that of conventional digital machines implementing classifiers with comparable performance. Moreover, the spike-based dynamics display a trade-off between integration time and accuracy, which naturally translates into algorithms that can be flexibly deployed for either fast and approximate classifications, or more accurate classifications at the mere expense of longer running times and higher energy costs. This work finally proves that the neuromorphic approach can be efficiently used in real-world applications and has significant advantages over conventional digital devices when energy consumption is considered.

  1. Learning to classify species with barcodes

    PubMed Central

    Bertolazzi, Paola; Felici, Giovanni; Weitschek, Emanuel

    2009-01-01

    Background According to many field experts, specimen classification based on morphological keys needs to be supported with automated techniques based on the analysis of DNA fragments. The most successful results in this area are those obtained from a particular fragment of mitochondrial DNA, the gene cytochrome c oxidase I (COI) (the "barcode"). Since 2004 the Consortium for the Barcode of Life (CBOL) has promoted the collection of barcode specimens and the development of methods to analyze the barcode for several tasks, among which is the identification of rules to correctly classify an individual into its species by reading its barcode. Results We adopt a Logic Mining method based on two optimization models and present the results obtained on two datasets where a number of COI fragments are used to describe the individuals that belong to different species. The method proposed exhibits high correct recognition rates on a training-testing split of the available data using a small proportion of the information available (e.g., correct recognition approx. 97% when only 20 sites of the 648 available are used). The method is able to provide compact formulas on the values (A, C, G, T) at the selected sites that synthesize the characteristics of each species, which is relevant information for taxonomists. Conclusion We have presented a Logic Mining technique designed to analyze barcode data and to provide detailed output of interest to the taxonomists and the barcode community represented in the CBOL Consortium. The method has proven to be effective, efficient and precise. PMID:19900303
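
    The "compact formulas on the values (A, C, G, T) at the selected sites" described in the record above can be pictured with a small sketch. It assumes that each species is characterized by a conjunction of (site, nucleotide) conditions; the rules, site indices, species names and sequences below are hypothetical, and the optimization models that learn such rules in the paper are not reproduced here.

```python
# Hypothetical rules: a barcode matches a species when every listed site
# (0-based position in the barcode) carries the listed nucleotide.
RULES = {
    "species_1": [(3, "A"), (7, "G")],
    "species_2": [(3, "C"), (10, "T")],
}


def matches(barcode, conditions):
    """True if the barcode satisfies every (site, nucleotide) condition."""
    return all(
        site < len(barcode) and barcode[site] == base
        for site, base in conditions
    )


def classify(barcode, rules):
    """Return the first species whose rule the barcode satisfies, else None."""
    for species, conditions in rules.items():
        if matches(barcode, conditions):
            return species
    return None


print(classify("ACGATTTGCCA", RULES))  # species_1 (A at site 3, G at site 7)
print(classify("ACGCTTTGCCT", RULES))  # species_2 (C at site 3, T at site 10)
print(classify("ACGTTTTGCCT", RULES))  # None (no rule is satisfied)
```

    Only the sites named in the rules are inspected, which mirrors the abstract's point that a small subset of the 648 available sites can be enough for recognition.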

  2. A 269-amino-acid segment with a pseudo-leucine zipper and a helix-turn-helix motif codes for the sequence-specific DNA-binding domain of herpes simplex virus type 1 origin-binding protein.

    PubMed Central

    Deb, S; Deb, S P

    1991-01-01

    The UL9 gene of herpes simplex virus (HSV) codes for a DNA-binding protein (OBP) that interacts sequence specifically with the origin of replication. This protein is essential for HSV DNA replication in cultured cells. The UL9 gene was cloned into a plasmid vector downstream of the SP6 RNA polymerase promoter. By using in vitro transcription and translation systems, a full-length OBP was synthesized. This synthetic protein is recognized by an antiserum generated against the C-terminal decapeptide of OBP and is functionally active in binding to OriS sequence specifically. The in vitro-synthesized protein has sequence specificity for binding similar to that found for the in vivo-generated OBP. A total of 14 in-frame deletion and insertion mutants of the UL9 gene were generated and expressed in vitro. Using these deletion mutants, we determined that the 269-amino-acid stretch defined by amino acids 564 to 832 localizes the OriS-specific DNA-binding domain. The N-terminal boundary is between amino acids 565 and 596, while the C terminus lies between amino acids 833 and 805. This segment contains a helix-turn-helix moiety and a pseudo-leucine zipper, neither of which alone can support DNA binding. The other leucine zipper from amino acids 150 to 173 is not required for the in vitro sequence-specific DNA-binding activity of OBP. PMID:1851856

  3. Laser vehicle detector/classifier

    NASA Astrophysics Data System (ADS)

    Schwartz, William C.

    1995-01-01

    This paper describes a diode-laser-based vehicle detector/classifier (VDC) presently being developed by Schwartz Electro-Optics (SEO) under an IVHS-IDEA program for the National Academy of Sciences. The VDC employs a scanning laser rangefinder to measure three- dimensional vehicle profiles that can be used for very accurate vehicle classification. The narrow laser beam width permits the detection of closely spaced vehicles moving at high speed; even a two-inch-wide tow bar can be detected. The VDC shows great promise for applications involving electronic toll collection from vehicles at freeway speeds, where very high detection and classification accuracy is mandatory.

  4. Ethical coding.

    PubMed

    Resnik, Barry I

    2009-01-01

    It is ethical, legal, and proper for a dermatologist to maximize income through proper coding of patient encounters and procedures. The overzealous physician can misinterpret reimbursement requirements or receive bad advice from other physicians and cross the line from aggressive coding to coding fraud. Several of the more common problem areas are discussed.

  5. A system for classifying wood-using industries and recording statistics for automatic data processing.

    Treesearch

    E.W. Fobes; R.W. Rowe

    1968-01-01

    A system for classifying wood-using industries and recording pertinent statistics for automatic data processing is described. Forms and coding instructions for recording data of primary processing plants are included.

  6. Dimensionality Reduction Through Classifier Ensembles

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.; Tumer, Kagan; Norwig, Peter (Technical Monitor)

    1999-01-01

    In data mining, one often needs to analyze datasets with a very large number of attributes. Performing machine learning directly on such data sets is often impractical because of extensive run times, excessive complexity of the fitted model (often leading to overfitting), and the well-known "curse of dimensionality." In practice, to avoid such problems, feature selection and/or extraction are often used to reduce data dimensionality prior to the learning step. However, existing feature selection/extraction algorithms either evaluate features by their effectiveness across the entire data set or simply disregard class information altogether (e.g., principal component analysis). Furthermore, feature extraction algorithms such as principal components analysis create new features that are often meaningless to human users. In this article, we present input decimation, a method that provides "feature subsets" that are selected for their ability to discriminate among the classes. These features are subsequently used in ensembles of classifiers, yielding results superior to single classifiers, ensembles that use the full set of features, and ensembles based on principal component analysis on both real and synthetic datasets.
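    The idea of class-specific feature subsets feeding an ensemble can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the selection criterion (absolute Pearson correlation with a one-vs-rest class indicator), the nearest-centroid base classifier, the majority vote, and the toy data are all hypothetical choices.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5

def decimate(X, y, k):
    """For each class, keep the k features most correlated with its one-vs-rest indicator."""
    n_features = len(X[0])
    subsets = {}
    for label in sorted(set(y)):
        indicator = [1.0 if t == label else 0.0 for t in y]
        scores = [abs(pearson([row[j] for row in X], indicator)) for j in range(n_features)]
        ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        subsets[label] = sorted(ranked[:k])
    return subsets

def fit_centroids(X, y, features):
    """Nearest-centroid base classifier restricted to the given feature indices."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [row for row, t in zip(X, y) if t == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    return centroids

def predict(x, subsets, models):
    """Majority vote over the per-class ensemble members."""
    votes = {}
    for label, features in subsets.items():
        cents = models[label]
        guess = min(cents, key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(features, cents[c])))
        votes[guess] = votes.get(guess, 0) + 1
    return max(sorted(votes), key=votes.get)

# Hypothetical toy data: features 0-2 each mark one class, feature 3 is noise.
X = [[5.0, 0.1, 0.2, 0.3], [5.2, 0.0, 0.1, 0.9],
     [0.1, 4.9, 0.0, 0.5], [0.0, 5.1, 0.2, 0.2],
     [0.2, 0.1, 5.0, 0.8], [0.1, 0.0, 4.8, 0.4]]
y = ["a", "a", "b", "b", "c", "c"]

subsets = decimate(X, y, k=1)
models = {label: fit_centroids(X, y, feats) for label, feats in subsets.items()}
```

    In this sketch each ensemble member sees only the features that separate "its" class from the rest, so the noise feature is never used.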

  7. 28 CFR 701.14 - Classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 2 2013-07-01 2013-07-01 false Classified information. 701.14 Section... UNDER THE FREEDOM OF INFORMATION ACT § 701.14 Classified information. In processing a request for information that is classified or classifiable under Executive Order 12356 or any other Executive Order...

  8. Uplink Coding

    NASA Technical Reports Server (NTRS)

    Pollara, Fabrizio; Hamkins, Jon; Dolinar, Sam; Andrews, Ken; Divsalar, Dariush

    2006-01-01

    This viewgraph presentation reviews uplink coding. The purpose and goals of the briefing are to: (1) show a plan for using uplink coding and describe its benefits; (2) define possible solutions and their applicability to different types of uplink, including emergency uplink; (3) concur with our conclusions so we can embark on a plan to use the proposed uplink system; (4) identify the need for the development of appropriate technology and its infusion in the DSN; and (5) gain advocacy to implement uplink coding in flight projects. Action Item EMB04-1-14: show a plan for using uplink coding, including showing where it is useful or not (include discussion of emergency uplink coding).

  9. A probabilistic classifier for olfactory receptor pseudogenes

    PubMed Central

    Menashe, Idan; Aloni, Ronny; Lancet, Doron

    2006-01-01

    Background Olfactory receptors (ORs), the largest mammalian gene superfamily (900–1400 genes), have >50% pseudogenes in humans. While most of these inactive genes are identified via coding frame (nonsense) disruptions, seemingly intact genes may also be inactive due to other deleterious (missense) mutations. An ultimate assessment of the actual size of the functional human OR repertoire thus requires an accurate distinction between genes and pseudogenes. Results To characterize inactive ORs with an intact open reading frame, we have developed a probabilistic Classifier for Olfactory Receptor Pseudogenes (CORP). This algorithm is based on deviations from a functionally crucial consensus, constituting sixty highly conserved positions identified by a comparison of two evolutionarily-constrained OR repertoires (mouse and dog) with a small pseudogene fraction. We used a logistic regression analysis to assign appropriate coefficients to the conserved positions, thus achieving maximal separation between active and inactive ORs. Consequently, the algorithm identified only 5% of the mouse functional ORs as pseudogenes, setting an upper limit of 0.05 to the false positive detection. Finally we used this algorithm to classify the 384 purportedly intact human OR genes. Of these, 135 were predicted as likely encoding non-functional proteins, and 38 were segregating between active and inactive forms due to missense polymorphisms. Conclusion We demonstrated that the CORP algorithm is capable of distinguishing between functional and non-functional OR genes with high precision even when the encoded protein would differ by a single amino acid. Using the CORP algorithm, we predict that ~70% of human OR genes are likely non-functional pseudogenes, a much higher number than hitherto suspected. The method we present may be employed for better annotation of inactive members in other gene families as well. CORP algorithm is available at: PMID:16939646

  10. Classifying sex biased congenital anomalies

    SciTech Connect

    Lubinsky, M.S.

    1997-03-31

    The reasons for sex biases in congenital anomalies that arise before structural or hormonal dimorphisms are established have long been unclear. A review of such disorders shows that patterning and tissue anomalies are female biased, and structural findings are more common in males. This suggests different gender dependent susceptibilities to developmental disturbances, with female vulnerabilities focused on early blastogenesis/determination, while males are more likely to involve later organogenesis/morphogenesis. A dual origin for some anomalies explains paradoxical reductions of sex biases with greater severity (i.e., multiple rather than single malformations), presumably as more severe events increase the involvement of an otherwise minor process with opposite biases to those of the primary mechanism. The cause for these sex differences is unknown, but early dimorphisms, such as differences in growth or presence of H-Y antigen, may be responsible. This model provides a useful rationale for understanding and classifying sex-biased congenital anomalies. 42 refs., 7 tabs.

  11. Sharing code.

    PubMed

    Kubilius, Jonas

    2014-01-01

    Sharing code is becoming increasingly important in the wake of Open Science. In this review I describe and compare two popular code-sharing utilities, GitHub and Open Science Framework (OSF). GitHub is a mature, industry-standard tool but lacks focus towards researchers. In comparison, OSF offers a one-stop solution for researchers but a lot of functionality is still under development. I conclude by listing alternative lesser-known tools for code and materials sharing.

  12. The information capacity of the genetic code: Is the natural code optimal?

    PubMed

    Kuruoglu, Ercan E; Arndt, Peter F

    2017-04-21

    We envision the molecular evolution process as an information transfer process and provide a quantitative measure for information preservation in terms of the channel capacity according to the channel coding theorem of Shannon. We calculate information capacities of DNA on the nucleotide (for non-coding DNA) and the amino acid (for coding DNA) level using various substitution models. We extend our results on coding DNA to a discussion about the optimality of the natural codon-amino acid code. We provide the results of an adaptive search algorithm in the code domain and demonstrate the existence of a large number of genetic codes with higher information capacity. Our results support the hypothesis of an ancient extension from a 2-nucleotide codon to the current 3-nucleotide codon code to encode the various amino acids. Copyright © 2017 Elsevier Ltd. All rights reserved.
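    As a rough illustration of what a nucleotide-level channel capacity looks like, the sketch below treats substitution as a symmetric four-letter channel. This is an assumed, simplified stand-in (each nucleotide is kept with probability 1 - p and replaced by each of the other three with probability p/3), not one of the substitution models used in the record. For a symmetric channel the capacity is log2 of the alphabet size minus the entropy of one row of the transition matrix.

```python
from math import log2

def symmetric_capacity(p, alphabet=4):
    """Capacity (bits per symbol) of a symmetric substitution channel.

    A symbol is kept with probability 1 - p and replaced by each of the
    other symbols with probability p / (alphabet - 1).
    """
    row = [1.0 - p] + [p / (alphabet - 1)] * (alphabet - 1)
    entropy = -sum(q * log2(q) for q in row if q > 0)
    return log2(alphabet) - entropy

no_noise = symmetric_capacity(0.0)     # error-free copying
some_noise = symmetric_capacity(0.1)   # 10% substitutions (hypothetical value)
all_noise = symmetric_capacity(0.75)   # output independent of input
```

    With no substitutions the channel carries the full 2 bits per nucleotide; when every output letter is equally likely regardless of the input, the capacity falls to zero.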

  13. Analytical performance of a bronchial genomic classifier.

    PubMed

    Hu, Zhanzhi; Whitney, Duncan; Anderson, Jessica R; Cao, Manqiu; Ho, Christine; Choi, Yoonha; Huang, Jing; Frink, Robert; Smith, Kate Porta; Monroe, Robert; Kennedy, Giulia C; Walsh, P Sean

    2016-02-26

    The current standard practice of lung lesion diagnosis often leads to inconclusive results, requiring additional diagnostic follow up procedures that are invasive and often unnecessary due to the high benign rate in such lesions (Chest 143:e78S-e92, 2013). The Percepta bronchial genomic classifier was developed and clinically validated to provide more accurate classification of lung nodules and lesions that are inconclusive by bronchoscopy, using bronchial brushing specimens (N Engl J Med 373:243-51, 2015, BMC Med Genomics 8:18, 2015). The analytical performance of the Percepta test is reported here. Analytical performance studies were designed to characterize the stability of RNA in bronchial brushing specimens during collection and shipment; analytical sensitivity defined as input RNA mass; analytical specificity (i.e. potentially interfering substances) as tested on blood and genomic DNA; and assay performance studies including intra-run, inter-run, and inter-laboratory reproducibility. RNA content within bronchial brushing specimens preserved in RNAprotect is stable for up to 20 days at 4 °C with no changes in RNA yield or integrity. Analytical sensitivity studies demonstrated tolerance to variation in RNA input (157 ng to 243 ng). Analytical specificity studies utilizing cancer positive and cancer negative samples mixed with either blood (up to 10 % input mass) or genomic DNA (up to 10 % input mass) demonstrated no assay interference. The test is reproducible from RNA extraction through to Percepta test result, including variation across operators, runs, reagent lots, and laboratories (standard deviation of 0.26 for scores on > 6 unit scale). Analytical sensitivity, analytical specificity and robustness of the Percepta test were successfully verified, supporting its suitability for clinical use.

  14. Diagnosis code assignment: models and evaluation metrics.

    PubMed

    Perotte, Adler; Pivovarov, Rimma; Natarajan, Karthik; Weiskopf, Nicole; Wood, Frank; Elhadad, Noémie

    2014-01-01

    The volume of healthcare data is growing rapidly with the adoption of health information technology. We focus on automated ICD9 code assignment from discharge summary content and methods for evaluating such assignments. We study ICD9 diagnosis codes and discharge summaries from the publicly available Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC II) repository. We experiment with two coding approaches: one that treats each ICD9 code independently of each other (flat classifier), and one that leverages the hierarchical nature of ICD9 codes into its modeling (hierarchy-based classifier). We propose novel evaluation metrics, which reflect the distances among gold-standard and predicted codes and their locations in the ICD9 tree. Experimental setup, code for modeling, and evaluation scripts are made available to the research community. The hierarchy-based classifier outperforms the flat classifier with F-measures of 39.5% and 27.6%, respectively, when trained on 20,533 documents and tested on 2282 documents. While recall is improved at the expense of precision, our novel evaluation metrics show a more refined assessment: for instance, the hierarchy-based classifier identifies the correct sub-tree of gold-standard codes more often than the flat classifier. Error analysis reveals that gold-standard codes are not perfect, and as such the recall and precision are likely underestimated. Hierarchy-based classification yields better ICD9 coding than flat classification for MIMIC patients. Automated ICD9 coding is an example of a task for which data and tools can be shared and for which the research community can work together to build on shared models and advance the state of the art.
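    A distance between a gold-standard code and a predicted code in a code tree can be sketched as the number of edges on the path through their lowest common ancestor. This is only one plausible reading of "distances among gold-standard and predicted codes"; the record does not define its metrics here, and the tiny hierarchy and node names below are hypothetical placeholders, not real ICD9 codes.

```python
# Hypothetical toy hierarchy: child -> parent.
PARENT = {
    "A": "root", "B": "root",
    "A.1": "A", "A.2": "A",
    "A.1.1": "A.1", "B.1": "B",
}

def ancestors(code):
    """Return the path from a code up to the root, starting with the code itself."""
    path = [code]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def tree_distance(a, b):
    """Number of edges between two codes via their lowest common ancestor."""
    up_a = ancestors(a)
    steps_in_b = {node: i for i, node in enumerate(ancestors(b))}
    for i, node in enumerate(up_a):
        if node in steps_in_b:
            return i + steps_in_b[node]
    raise ValueError("codes are not in the same tree")
```

    Under such a measure, predicting a sibling of the correct code is penalized less than predicting a code from a different top-level branch, which a flat exact-match score cannot express.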

  15. Diagnosis code assignment: models and evaluation metrics

    PubMed Central

    Perotte, Adler; Pivovarov, Rimma; Natarajan, Karthik; Weiskopf, Nicole; Wood, Frank; Elhadad, Noémie

    2014-01-01

    Background and objective The volume of healthcare data is growing rapidly with the adoption of health information technology. We focus on automated ICD9 code assignment from discharge summary content and methods for evaluating such assignments. Methods We study ICD9 diagnosis codes and discharge summaries from the publicly available Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC II) repository. We experiment with two coding approaches: one that treats each ICD9 code independently of each other (flat classifier), and one that leverages the hierarchical nature of ICD9 codes into its modeling (hierarchy-based classifier). We propose novel evaluation metrics, which reflect the distances among gold-standard and predicted codes and their locations in the ICD9 tree. Experimental setup, code for modeling, and evaluation scripts are made available to the research community. Results The hierarchy-based classifier outperforms the flat classifier with F-measures of 39.5% and 27.6%, respectively, when trained on 20 533 documents and tested on 2282 documents. While recall is improved at the expense of precision, our novel evaluation metrics show a more refined assessment: for instance, the hierarchy-based classifier identifies the correct sub-tree of gold-standard codes more often than the flat classifier. Error analysis reveals that gold-standard codes are not perfect, and as such the recall and precision are likely underestimated. Conclusions Hierarchy-based classification yields better ICD9 coding than flat classification for MIMIC patients. Automated ICD9 coding is an example of a task for which data and tools can be shared and for which the research community can work together to build on shared models and advance the state of the art. PMID:24296907

  16. Efficient DNA barcode regions for classifying Piper species (Piperaceae).

    PubMed

    Chaveerach, Arunrat; Tanee, Tawatchai; Sanubol, Arisa; Monkheang, Pansa; Sudmoon, Runglawan

    2016-01-01

    Piper species are used for spices, in traditional and processed forms of medicines, in cosmetic compounds, in cultural activities and as insecticides. Here barcode analysis was performed for identification of plant parts, young plants and modified forms of plants. Thirty-six Piper species were collected and the three barcode regions, matK, rbcL and psbA-trnH spacer, were amplified, sequenced and aligned to determine their genetic distances. For intraspecific genetic distances, the most effective values for the species identification ranged from no difference to very low distance values. However, Piper betle had the highest values at 0.386 for the matK region. This finding may be due to Piper betle being an economic and cultivated species, and thus is supported with growth factors, which may have affected its genetic distance. The interspecific genetic distances that were most effective for identification of different species were from the matK region and ranged from a low of 0.002 in 27 paired species to a high of 0.486. Eight species pairs (Piper kraense and Piper dominantinervium, Piper magnibaccum and Piper kraense, Piper phuwuaense and Piper dominantinervium, Piper phuwuaense and Piper kraense, Piper pilobracteatum and Piper dominantinervium, Piper pilobracteatum and Piper kraense, Piper pilobracteatum and Piper phuwuaense, and Piper sylvestre and Piper polysyphonum) presented a genetic distance of 0.000 and were identified by independently using each of the other two regions. Concisely, these three barcode regions are powerful for further efficient identification of the 36 Piper species.
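    To make the notion of a genetic distance between aligned barcode sequences concrete, the sketch below computes a simple p-distance: the fraction of compared alignment columns at which two sequences differ. The record does not say which distance measure was used, so the p-distance, the gap handling, and the short example sequences are assumptions for illustration only.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatched = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        mismatched += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatched / compared

# Hypothetical aligned fragments, not real matK sequences.
identical = p_distance("ACGTACGT", "ACGTACGT")
two_changes = p_distance("ACGTACGT", "ACGAACGA")
with_gap = p_distance("ACG-ACGT", "ACGTACGA")
```

    A distance of 0.000 for a species pair in one region, as reported above, means that region alone cannot separate the pair and another region is needed.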

  17. Efficient DNA barcode regions for classifying Piper species (Piperaceae)

    PubMed Central

    Chaveerach, Arunrat; Tanee, Tawatchai; Sanubol, Arisa; Monkheang, Pansa; Sudmoon, Runglawan

    2016-01-01

    Abstract Piper species are used for spices, in traditional and processed forms of medicines, in cosmetic compounds, in cultural activities and as insecticides. Here barcode analysis was performed for identification of plant parts, young plants and modified forms of plants. Thirty-six Piper species were collected and the three barcode regions, matK, rbcL and psbA-trnH spacer, were amplified, sequenced and aligned to determine their genetic distances. For intraspecific genetic distances, the most effective values for the species identification ranged from no difference to very low distance values. However, Piper betle had the highest values at 0.386 for the matK region. This finding may be due to Piper betle being an economic and cultivated species, and thus is supported with growth factors, which may have affected its genetic distance. The interspecific genetic distances that were most effective for identification of different species were from the matK region and ranged from a low of 0.002 in 27 paired species to a high of 0.486. Eight species pairs (Piper kraense and Piper dominantinervium, Piper magnibaccum and Piper kraense, Piper phuwuaense and Piper dominantinervium, Piper phuwuaense and Piper kraense, Piper pilobracteatum and Piper dominantinervium, Piper pilobracteatum and Piper kraense, Piper pilobracteatum and Piper phuwuaense, and Piper sylvestre and Piper polysyphonum) presented a genetic distance of 0.000 and were identified by independently using each of the other two regions. Concisely, these three barcode regions are powerful for further efficient identification of the 36 Piper species. PMID:27829794

  18. DNA polymorphism in morels: complete sequences of the internal transcribed spacer of genes coding for rRNA in Morchella esculenta (yellow morel) and Morchella conica (black morel).

    PubMed

    Wipf, D; Munch, J C; Botton, B; Buscot, F

    1996-09-01

    The internal transcribed spacer (ITS) of the gene coding for rRNA was sequenced in both directions with the gene walking technique in a black morel (Morchella conica) and a yellow morel (M. esculenta) to elucidate the ITS length discrepancy between the two species groups (750-bp ITS in black morels and 1,150-bp ITS in yellow morels).

  19. Hybrid k-Nearest Neighbor Classifier.

    PubMed

    Yu, Zhiwen; Chen, Hantao; Liuxs, Jiming; You, Jane; Leung, Hareton; Han, Guoqiang

    2016-06-01

    Conventional k-nearest neighbor (KNN) classification approaches have several limitations when dealing with some problems caused by special datasets, such as the sparse problem, the imbalance problem, and the noise problem. In this paper, we first perform a brief survey on the recent progress of the KNN classification approaches. Then, the hybrid KNN (HBKNN) classification approach, which takes into account the local and global information of the query sample, is designed to address the problems raised by the special datasets. Next, the random subspace ensemble framework based on the HBKNN (RS-HBKNN) classifier is proposed to perform classification on datasets with noisy attributes in the high-dimensional space. Finally, nonparametric tests are adopted to compare the proposed method with other classification approaches over multiple datasets. The experiments on the real-world datasets from the Knowledge Extraction based on Evolutionary Learning dataset repository demonstrate that RS-HBKNN works well on real datasets, and outperforms most of the state-of-the-art classification approaches.
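    For reference, the conventional KNN baseline that the record sets out to improve can be sketched in a few lines. This is the plain majority-vote rule with Euclidean distance, not the HBKNN or RS-HBKNN methods, whose details are not given here; the training points and labels are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Label a query by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    counts = Counter(label for _, label in neighbors)
    return counts.most_common(1)[0][0]

# Hypothetical two-cluster training set: (point, label) pairs.
train = [((0, 0), "neg"), ((0, 1), "neg"), ((1, 0), "neg"),
         ((5, 5), "pos"), ((5, 6), "pos"), ((6, 5), "pos")]

near_origin = knn_predict(train, (0.5, 0.5))
near_far_cluster = knn_predict(train, (5.5, 5.5))
```

    Because every neighbor counts equally and every attribute enters the distance, this baseline is sensitive to sparse, imbalanced, or noisy data, which are the limitations the record names.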

  20. Measuring Diagnoses: ICD Code Accuracy

    PubMed Central

    O'Malley, Kimberly J; Cook, Karon F; Price, Matt D; Wildes, Kimberly Raiford; Hurdle, John F; Ashton, Carol M

    2005-01-01

    Objective To examine potential sources of errors at each step of the described inpatient International Classification of Diseases (ICD) coding process. Data Sources/Study Setting The use of disease codes from the ICD has expanded from classifying morbidity and mortality information for statistical purposes to diverse sets of applications in research, health care policy, and health care finance. By describing a brief history of ICD coding, detailing the process for assigning codes, identifying where errors can be introduced into the process, and reviewing methods for examining code accuracy, we help code users more systematically evaluate code accuracy for their particular applications. Study Design/Methods We summarize the inpatient ICD diagnostic coding process from patient admission to diagnostic code assignment. We examine potential sources of errors at each step and offer code users a tool for systematically evaluating code accuracy. Principal Findings Main error sources along the “patient trajectory” include amount and quality of information at admission, communication among patients and providers, the clinician's knowledge and experience with the illness, and the clinician's attention to detail. Main error sources along the “paper trail” include variance in the electronic and written records, coder training and experience, facility quality-control efforts, and unintentional and intentional coder errors, such as misspecification, unbundling, and upcoding. Conclusions By clearly specifying the code assignment process and heightening their awareness of potential error sources, code users can better evaluate the applicability and limitations of codes for their particular situations. ICD codes can then be used in the most appropriate ways. PMID:16178999

  1. Isolation and expression of a novel chick G-protein cDNA coding for a G alpha i3 protein with a G alpha 0 N-terminus.

    PubMed Central

    Kilbourne, E J; Galper, J B

    1994-01-01

    We have cloned cDNAs coding for G-protein alpha subunits from a chick brain cDNA library. Based on sequence similarity to G-protein alpha subunits from other eukaryotes, one clone was designated G alpha i3. A second clone, G alpha i3-o, was identical to the G alpha i3 clone over 932 bases on the 3' end. The 5' end of G alpha i3-o, however, contained an alternative sequence in which the first 45 amino acids coded for are 100% identical to the conserved N-terminus of G alpha o from species such as rat, mouse, human, bovine and hamster. Both clones were found to be expressed in all tissues studied. The unusual alpha o-alpha i3-like G-protein chimera, G alpha i3-o, was found to be expressed at significantly lower levels than G alpha i3. In vitro transcription and translation of the G alpha i3-o cDNA clone gave a protein of approx. 41 kDa which stably bound guanosine 5'-[gamma-thio]triphosphate. G alpha i3-o appears to be the first G-protein alpha subunit cloned which contains ends that are homologous to two different alpha subunit isoforms, G alpha o and G alpha i3. PMID:8297335

  2. Efficient Study of the 214 Classifiers ("Radicals")

    ERIC Educational Resources Information Center

    Cohen, Alvin P.

    1976-01-01

    The problem of looking up Chinese characters in dictionaries indexed by classifiers is simplified by identifying the 68 high frequency classifiers for memorization and leaving the least frequent to be identified from a chart. (CHK)

  3. 76 FR 34761 - Classified National Security Information

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-06-14

    ... From the Federal Register Online via the Government Publishing Office MARINE MAMMAL COMMISSION Classified National Security Information AGENCY: Marine Mammal Commission. ACTION: Notice. SUMMARY: This notice sets out the establishment of the Marine Mammal Commission's (MMC) policy on classified...

  4. Isolation and characterization of cDNA clones for rat ribophorin I: complete coding sequence and in vitro synthesis and insertion of the encoded product into endoplasmic reticulum membranes

    PubMed Central

    1987-01-01

    Ribophorins I and II are two transmembrane glycoproteins that are characteristic of the rough endoplasmic reticulum and are thought to be part of the apparatus that affects the co-translational translocation of polypeptides synthesized on membrane-bound polysomes. A ribophorin I cDNA clone containing a 0.6-kb insert was isolated from a rat liver lambda gt11 cDNA library by immunoscreening with specific antibodies. This cDNA was used to isolate a clone (2.3 kb) from a rat brain lambda gt11 cDNA library that contains the entire ribophorin I coding sequence. SP6 RNA transcripts of the insert in this clone directed the in vitro synthesis of a polypeptide of the expected size that was immunoprecipitated with anti-ribophorin I antibodies. When synthesized in the presence of microsomes, this polypeptide, like the translation product of the natural ribophorin I mRNA, underwent membrane insertion, signal cleavage, and co-translational glycosylation. The complete amino acid sequence of the polypeptide encoded in the cDNA insert was derived from the nucleotide sequence and found to contain a segment that corresponds to a partial amino terminal sequence of ribophorin I that was obtained by Edman degradation. This confirmed the identity of the cDNA clone and established that ribophorin I contains 583 amino acids and is synthesized with a cleavable amino terminal insertion signal of 22 residues. Analysis of the amino acid sequence of ribophorin I suggested that the polypeptide has a simple transmembrane disposition with a rather hydrophilic carboxy terminal segment of 150 amino acids exposed on the cytoplasmic face of the membrane, and a luminal domain of 414 amino acids containing three potential N-glycosylation sites. Hybridization measurements using the cloned cDNA as a probe showed that ribophorin I mRNA levels increase fourfold 15 h after partial hepatectomy, in confirmation of measurements made by in vitro translation of liver mRNA. Southern blot analysis of rat genomic

  5. Local classifier weighting by quadratic programming.

    PubMed

    Cevikalp, Hakan; Polikar, Robi

    2008-10-01

    It has been widely accepted that the classification accuracy can be improved by combining outputs of multiple classifiers. However, how to combine multiple classifiers with various (potentially conflicting) decisions is still an open problem. A rich collection of classifier combination procedures -- many of which are heuristic in nature -- have been developed for this goal. In this brief, we describe a dynamic approach to combine classifiers that have expertise in different regions of the input space. To this end, we use local classifier accuracy estimates to weight classifier outputs. Specifically, we estimate local recognition accuracies of classifiers near a query sample by utilizing its nearest neighbors, and then use these estimates to find the best weights of classifiers to label the query. The problem is formulated as a convex quadratic optimization problem, which returns optimal nonnegative classifier weights with respect to the chosen objective function, and the weights ensure that locally most accurate classifiers are weighted more heavily for labeling the query sample. Experimental results on several data sets indicate that the proposed weighting scheme outperforms other popular classifier combination schemes, particularly on problems with complex decision boundaries. Hence, the results indicate that local classification-accuracy-based combination techniques are well suited for decision making when the classifiers are trained by focusing on different regions of the input space.
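    The first step described above, estimating each classifier's accuracy on the validation samples nearest the query, can be sketched as follows. Note the swap: the record obtains the weights from a convex quadratic program, whereas this sketch simply normalizes the local accuracies, which is a much cruder heuristic used here only for illustration. The two "expert" classifiers, the validation set, and all names are hypothetical.

```python
from math import dist

def local_weights(validation, classifiers, query, k=3):
    """Weight each classifier by its accuracy on the k validation points nearest the query."""
    neighbors = sorted(validation, key=lambda item: dist(item[0], query))[:k]
    accuracies = [
        sum(clf(x) == label for x, label in neighbors) / len(neighbors)
        for clf in classifiers
    ]
    total = sum(accuracies)
    if total == 0:  # no classifier is locally right: fall back to equal weights
        return [1.0 / len(classifiers)] * len(classifiers)
    return [a / total for a in accuracies]

def combine(classifiers, weights, query):
    """Weighted vote of the classifiers' labels for the query."""
    scores = {}
    for clf, w in zip(classifiers, weights):
        label = clf(query)
        scores[label] = scores.get(label, 0.0) + w
    return max(sorted(scores), key=scores.get)

# Hypothetical experts, each reliable on one side of the 1-D input space.
def left_expert(x):
    return "a" if x[0] < -2 else "b"

def right_expert(x):
    return "a" if x[0] > 2 else "b"

experts = [left_expert, right_expert]
validation = [((-4.0,), "a"), ((-3.0,), "a"), ((-2.5,), "a"),
              ((2.5,), "a"), ((3.0,), "a"), ((4.0,), "a")]

weights_left = local_weights(validation, experts, (-3.2,))
label_left = combine(experts, weights_left, (-3.2,))
```

    Near the query at -3.2 only the left expert labels the neighboring validation points correctly, so it receives all of the weight and its label wins.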

  6. 15 CFR 4.8 - Classified Information.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 15 Commerce and Foreign Trade 1 2010-01-01 2010-01-01 false Classified Information. 4.8 Section 4... INFORMATION Freedom of Information Act § 4.8 Classified Information. In processing a request for information..., the information shall be reviewed to determine whether it should remain classified. Ordinarily the...

  7. 32 CFR 775.5 - Classified actions.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Air Act (42 U.S.C. 7609 et seq.). (b) It should be noted that a classified EA/EIS serves the same “informed decisionmaking” purpose as does a published unclassified EA/EIS. Even though the classified EA/EIS... be considered by the decisionmaker for the proposed action. The content of a classified EA/EIS...

  8. 32 CFR 1602.8 - Classifying authority.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Classifying authority. 1602.8 Section 1602.8 National Defense Other Regulations Relating to National Defense SELECTIVE SERVICE SYSTEM DEFINITIONS § 1602.8 Classifying authority. The term classifying authority refers to any official or board who...

  9. Lesch-Nyhan syndrome: mRNA expression of HPRT in patients with enzyme proven deficiency of HPRT and normal HPRT coding region of the DNA.

    PubMed

    Nguyen, Khue Vu; Naviaux, Robert K; Paik, Kacie K; Nyhan, William L

    2012-08-01

    Inherited mutation of the purine salvage enzyme, hypoxanthine guanine phosphoribosyltransferase (HPRT) gives rise to Lesch-Nyhan syndrome (LNS) or Lesch-Nyhan variants (LNV). We report a case of two LNS affected members of a family with deficiency of activity of HPRT in intact cultured fibroblasts in whom mutation could not be found in the HPRT coding sequence but there was markedly decreased HPRT expression of mRNA. Published by Elsevier Inc.

  10. 76 FR 19707 - Classified Information: Classification/Declassification/Access; Authority To Classify Information

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-04-08

    ... originally classify information as SECRET or CONFIDENTIAL to the Administrator of the Federal Aviation... 13526 confers upon the Secretary the authority to originally classify information as SECRET...

  11. Sharing code

    PubMed Central

    Kubilius, Jonas

    2014-01-01

    Sharing code is becoming increasingly important in the wake of Open Science. In this review I describe and compare two popular code-sharing utilities, GitHub and Open Science Framework (OSF). GitHub is a mature, industry-standard tool but lacks focus towards researchers. In comparison, OSF offers a one-stop solution for researchers but a lot of functionality is still under development. I conclude by listing alternative lesser-known tools for code and materials sharing. PMID:25165519

  12. Small non-coding RNA and cancer.

    PubMed

    Romano, Giulia; Veneziano, Dario; Acunzo, Mario; Croce, Carlo M

    2017-05-01

    The ENCODE project has reported that at least 80% of the human genome is biologically active, yet only a small part of human DNA encodes for protein. The massive amount of RNA transcribed but not translated into protein can be classified as housekeeping RNA (such as rRNA, tRNA) and regulatory RNA (such as miRNA, piRNA, lncRNA). Small non-coding RNAs, in particular, have been the focus of many studies in the last 20 years and their fundamental role in many human diseases is currently well established. Inter alia, their role in cancer development and progression, as well as in drug resistance, is being increasingly investigated. In this review, focusing our attention on recent research results, we provide an overview of the four large classes of small non-coding RNAs, namely, miRNAs, piRNAs, snoRNA and the new class of tRNA-derived fragments, highlighting their fundamental role in cancer and their potential as diagnostic and prognostic biomarkers. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  13. Large tandem repeats make up the chromosome bar code: a hypothesis.

    PubMed

    Podgornaya, Olga; Gavrilova, Ekaterina; Stephanova, Vera; Demin, Sergey; Komissarov, Aleksey

    2013-01-01

    Much of tandem repeats' functional nature in any genome remains enigmatic because there are only few tools available for dissecting and elucidating the functions of repeated DNA. The large tandem repeat arrays (satellite DNA) found in two mouse whole-genome shotgun assemblies were classified into 4 superfamilies, 8 families, and 62 subfamilies. With the simplified variant of chromosome positioning of different tandem repeats, we noticed the nonuniform distribution instead of the positions reported for mouse major and minor satellites. It is visible that each chromosome possesses a kind of unique code made up of different large tandem repeats. The reference genomes allow marking only internal tandem repeats, and even with such limited data, the colored "bar code" made up of tandem repeats is visible. We suppose that tandem repeats bear the mechanism for chromosomes to recognize the regions to be associated. The associations, initially established via RNA, become fixed by histone modifications (the histone or chromatin code) and specific proteins. In such a way, associations, being at the beginning flexible and regulated, that is, adjustable, appear as irreversible and inheritable in cell generations. Tandem repeat multiformity tunes the developed nuclei 3D pattern by sequential steps of associations. Tandem repeats-based chromosome bar code could be the carrier of the genome structural information; that is, the order of precise tandem repeat association is the DNA morphogenetic program. Tandem repeats are the cores of the distinct 3D structures postulated in "gene gating" hypothesis. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. 22 CFR 125.3 - Exports of classified technical data and classified defense articles.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 22 Foreign Relations 1 2010-04-01 2010-04-01 false Exports of classified technical data and... IN ARMS REGULATIONS LICENSES FOR THE EXPORT OF TECHNICAL DATA AND CLASSIFIED DEFENSE ARTICLES § 125.3 Exports of classified technical data and classified defense articles. (a) A request for authority...

  15. Error minimizing algorithms for nearest neighbor classifiers

    SciTech Connect

    Porter, Reid B; Hush, Don; Zimmer, G. Beate

    2011-01-03

    Stack Filters define a large class of discrete nonlinear filters first introduced in image and signal processing for noise removal. In recent years we have suggested their application to classification problems, and investigated their relationship to other types of discrete classifiers such as Decision Trees. In this paper we focus on a continuous domain version of Stack Filter Classifiers which we call Ordered Hypothesis Machines (OHM), and investigate their relationship to Nearest Neighbor classifiers. We show that OHM classifiers provide a novel framework in which to train Nearest Neighbor type classifiers by minimizing empirical error based loss functions. We use the framework to investigate a new cost sensitive loss function that allows us to train a Nearest Neighbor type classifier for low false alarm rate applications. We report results on both synthetic data and real-world image data.

  16. Monitoring tool wear using classifier fusion

    NASA Astrophysics Data System (ADS)

    Kannatey-Asibu, Elijah; Yum, Juil; Kim, T. H.

    2017-02-01

    Real-time monitoring of manufacturing processes using a single sensor often poses a significant challenge. Sensor fusion has thus been extensively investigated in recent years for process monitoring, with significant improvement in performance. This paper presents the results for a monitoring system based on the concept of classifier fusion, and class-weighted voting is investigated to further enhance the system performance. Classifier weights are based on the overall performances of individual classifiers, and majority voting is used in decision making. Acoustic emission monitoring of tool wear during the coroning process is used to illustrate the concept. A classification rate of 87.7% was obtained for classifier fusion with unity weighting. When weighting was based on overall performance of the respective classifiers, the classification rate improved to 95.6%. Further using state performance weighting resulted in a 98.5% classification rate. Finally, the classifier fusion performance further increased to 99.7% when a penalty vote was applied on the weighting factor.
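
The weighted voting described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `weighted_vote`, the labels "worn" and "sharp", and the weight values are all hypothetical and chosen only to show how unity weighting and performance-based weighting can lead to different decisions.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the most total weight.

    predictions: one predicted label per classifier.
    weights: one non-negative weight per classifier (hypothetical values,
             e.g. each classifier's overall accuracy on validation data).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # The label with the largest summed weight wins the vote.
    return max(totals, key=totals.get)


# Three hypothetical classifiers disagree about the tool state.
predictions = ["worn", "sharp", "sharp"]

# Unity weighting is plain majority voting: two votes beat one.
unity_decision = weighted_vote(predictions, [1.0, 1.0, 1.0])

# Performance weighting lets one reliable classifier outvote two weak ones
# (0.95 for "worn" against 0.40 + 0.45 for "sharp").
weighted_decision = weighted_vote(predictions, [0.95, 0.40, 0.45])

print(unity_decision, weighted_decision)
```

With these made-up numbers the two schemes disagree, which is the situation in which the choice of weighting matters.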

  17. The changing epitome of species identification – DNA barcoding

    PubMed Central

    Ajmal Ali, M.; Gyulai, Gábor; Hidvégi, Norbert; Kerti, Balázs; Al Hemaid, Fahad M.A.; Pandey, Arun K.; Lee, Joongku

    2014-01-01

    The discipline taxonomy (the science of naming and classifying organisms, the original bioinformatics and a basis for all biology) is fundamentally important in ensuring the quality of life of future human generation on the earth; yet over the past few decades, the teaching and research funding in taxonomy have declined because of its classical way of practice which led the discipline many times to a subject of opinion, and this ultimately gave birth to several problems and challenges, and therefore the taxonomist became an endangered race in the era of genomics. Now taxonomy suddenly became fashionable again due to revolutionary approaches in taxonomy called DNA barcoding (a novel technology to provide rapid, accurate, and automated species identifications using short orthologous DNA sequences). In DNA barcoding, a complete data set can be obtained from a single specimen irrespective of morphological or life stage characters. The core idea of DNA barcoding is based on the fact that the highly conserved stretches of DNA, either coding or non-coding regions, vary to a very minor degree during the evolution within the species. Sequences suggested to be useful in DNA barcoding include cytoplasmic mitochondrial DNA (e.g. cox1) and chloroplast DNA (e.g. rbcL, trnL-F, matK, ndhF, and atpB rbcL), and nuclear DNA (ITS, and housekeeping genes e.g. gapdh). The plant DNA barcoding is now transitioning the epitome of species identification; and thus, ultimately helping in the molecularization of taxonomy, a need of the hour. The ‘DNA barcodes’ show promise in providing a practical, standardized, species-level identification tool that can be used for biodiversity assessment, life history and ecological studies, forensic analysis, and many more. PMID:24955007

  18. The changing epitome of species identification - DNA barcoding.

    PubMed

    Ajmal Ali, M; Gyulai, Gábor; Hidvégi, Norbert; Kerti, Balázs; Al Hemaid, Fahad M A; Pandey, Arun K; Lee, Joongku

    2014-07-01

    The discipline taxonomy (the science of naming and classifying organisms, the original bioinformatics and a basis for all biology) is fundamentally important in ensuring the quality of life of future human generation on the earth; yet over the past few decades, the teaching and research funding in taxonomy have declined because of its classical way of practice which led the discipline many times to a subject of opinion, and this ultimately gave birth to several problems and challenges, and therefore the taxonomist became an endangered race in the era of genomics. Now taxonomy suddenly became fashionable again due to revolutionary approaches in taxonomy called DNA barcoding (a novel technology to provide rapid, accurate, and automated species identifications using short orthologous DNA sequences). In DNA barcoding, a complete data set can be obtained from a single specimen irrespective of morphological or life stage characters. The core idea of DNA barcoding is based on the fact that the highly conserved stretches of DNA, either coding or non-coding regions, vary to a very minor degree during the evolution within the species. Sequences suggested to be useful in DNA barcoding include cytoplasmic mitochondrial DNA (e.g. cox1) and chloroplast DNA (e.g. rbcL, trnL-F, matK, ndhF, and atpB rbcL), and nuclear DNA (ITS, and housekeeping genes e.g. gapdh). The plant DNA barcoding is now transitioning the epitome of species identification; and thus, ultimately helping in the molecularization of taxonomy, a need of the hour. The 'DNA barcodes' show promise in providing a practical, standardized, species-level identification tool that can be used for biodiversity assessment, life history and ecological studies, forensic analysis, and many more.

  19. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. Hence, from a transmission point of view, digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding is often used to refer to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques that is often interchangeably used with speech coding is the term voice coding. This term is more generic in the sense that the

  20. Molecular cloning of a cDNA coding biliary glycoprotein I: Primary structure of a glycoprotein immunologically crossreactive with carcinoembryonic antigen

    SciTech Connect

    Hinoda, Y.; Neumaier, M.; Hefta, S.A.; Drzeniek, Z.; Wagener, C.; Shively, L.; Hefta, L.J.F.; Shively, J.E.; Paxton, R.J.

    1988-09-01

    The authors have isolated and sequenced four overlapping cDNA clones from a normal adult human colon library, which together gave the entire nucleotide sequence for biliary glycoprotein I (BGPI). BGPI is a member of the carcinoembryonic antigen (CEA) gene family, which is a subfamily in the immunoglobulin gene superfamily. The deduced amino acid sequence of the combined clones for BGP I revealed a 34-residue leader sequence followed by a 108-residue N-terminal domain, a 178-residue immunoglobulin-like domain, a 108-residue region specific to BGP I, a 24-residue transmembrane domain, and a 35-residue cytoplasmic domain. The nucleotide sequence of BGP I exhibited greater than 80% identity with CEA and nonspecific crossreacting antigen (NCA) in the leader peptide, N-terminal domain, and immunoglobulin-like domain. They propose that BGP I diverged from NCA by acquiring an immunoglobulin-like domain substantially different from the domains found in NCA or CEA and also a new cytoplasmic domain. The latter feature should result in a substantially different membrane anchorage mechanism of BGP I compared to CEA, which lacks the cytoplasmic domain and is anchored via a phosphatidylinositol-glycan structure. Protein structural analysis of BGP I isolated from human bile revealed a blocked N terminus, 129 amino acids of internal sequence that are in agreement with the translated cDNA sequence, and five glycosylation sites in the peptides sequenced.

  1. The coding region of the UFGT gene is a source of diagnostic SNP markers that allow single-locus DNA genotyping for the assessment of cultivar identity and ancestry in grapevine (Vitis vinifera L.)

    PubMed Central

    2013-01-01

    Background: Vitis vinifera L. is one of society’s most important agricultural crops with a broad genetic variability. The difficulty in recognizing grapevine genotypes based on ampelographic traits and secondary metabolites prompted the development of molecular markers suitable for achieving variety genetic identification. Findings: Here, we propose a comparison between a multi-locus barcoding approach based on six chloroplast markers and a single-copy nuclear gene sequencing method using five coding regions combined with a character-based system with the aim of reconstructing cultivar-specific haplotypes and genotypes to be exploited for the molecular characterization of 157 V. vinifera accessions. The analysis of the chloroplast target regions proved the inadequacy of the DNA barcoding approach at the subspecies level, and hence further DNA genotyping analyses were targeted on the sequences of five nuclear single-copy genes amplified across all of the accessions. The sequencing of the coding region of the UFGT nuclear gene (UDP-glucose: flavonoid 3-O-glucosyltransferase, the key enzyme for the accumulation of anthocyanins in berry skins) enabled the discovery of discriminant SNPs (1/34 bp) and the reconstruction of 130 V. vinifera distinct genotypes. Most of the genotypes proved to be cultivar-specific, and only a few genotypes were shared by more, although strictly related, cultivars. Conclusion: On the whole, this technique was successful for inferring SNP-based genotypes of grapevine accessions suitable for assessing the genetic identity and ancestry of international cultivars and also useful for corroborating some hypotheses regarding the origin of local varieties, suggesting several issues of misidentification (synonymy/homonymy). PMID:24298902

  2. Coding design for error correcting output codes based on perceptron

    NASA Astrophysics Data System (ADS)

    Zhou, Jin-Deng; Wang, Xiao-Dan; Zhou, Hong-Jian; Cui, Yong-Hua; Jing, Sun

    2012-05-01

    Error-correcting output codes (ECOC) are a common way to model multiclass classification problems, and data-driven encoding is attracting more and more attention. We propose a method for learning ECOC with the help of a single-layered perceptron neural network. To achieve this goal, the code elements of ECOC are mapped to the weights of the network for the given decoding strategy, and an objective function with the constrained weights is used as the cost function of the network. After training, we obtain a coding matrix that includes many class subgroups. Experimental results on artificial data and University of California Irvine data, with a logistic linear classifier and a support vector machine as the binary learners, show that our scheme provides better classification performance with a shorter coding matrix than other state-of-the-art encoding strategies.
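
For readers unfamiliar with ECOC, the following minimal sketch shows only the generic decoding step, not the perceptron-based code design proposed in the abstract. The coding matrix, the class names, and the helper names `hamming` and `ecoc_decode` are hypothetical; Hamming-distance decoding is assumed here as one common decoding strategy.

```python
def hamming(a, b):
    """Count the positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(bits, coding_matrix):
    """Return the class whose codeword is closest to the observed bits."""
    return min(coding_matrix, key=lambda label: hamming(bits, coding_matrix[label]))


# Hypothetical coding matrix: one row (codeword) per class, one column per
# binary learner. Every pair of rows differs in at least 3 positions, so a
# single wrong binary learner can be corrected.
CODING_MATRIX = {
    "class_a": (1, 1, 0, 0, 1),
    "class_b": (0, 1, 1, 1, 0),
    "class_c": (1, 0, 1, 0, 0),
}

# Outputs of the five binary learners for one sample. The last bit is wrong
# for class_b, yet the nearest codeword is still class_b.
observed = (0, 1, 1, 1, 1)
print(ecoc_decode(observed, CODING_MATRIX))
```

A shorter coding matrix, as reported in the abstract, means fewer binary learners to train, at the cost of less room for separating the codewords.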

  3. Nature's Code

    NASA Astrophysics Data System (ADS)

    Hill, Vanessa J.; Rowlands, Peter

    2008-10-01

    We propose that the mathematical structures related to the 'universal rewrite system' define a universal process applicable to Nature, which we may describe as 'Nature's code'. We draw attention here to such concepts as 4 basic units, 64- and 20-unit structures, symmetry-breaking and 5-fold symmetry, chirality, double 3-dimensionality, the double helix, the Van der Waals force and the harmonic oscillator mechanism, and our explanation of how they necessarily lead to self-aggregation, complexity and emergence in higher-order systems. Biological concepts, such as translation, transcription, replication, the genetic code and the grouping of amino acids appear to be driven by fundamental processes of this kind, and it would seem that the Platonic solids, pentagonal symmetry and Fibonacci numbers have significant roles in organizing 'Nature's code'.

  4. Show Code.

    PubMed

    Shalev, Daniel

    2017-01-01

    "Let's get one thing straight: there is no such thing as a show code," my attending asserted, pausing for effect. "You either try to resuscitate, or you don't. None of this halfway junk." He spoke so loudly that the two off-service consultants huddled at computers at the end of the unit looked up… We did four rounds of compressions and pushed epinephrine twice. It was not a long code. We did good, strong compressions and coded this man in earnest until the end. Toward the final round, though, as I stepped up to do compressions, my attending looked at me in a deep way. It was a look in between willing me as some object under his command and revealing to me everything that lay within his brash, confident surface but could not be spoken. © 2017 The Hastings Center.

  5. Epigenetic DNA-methylation regulation of genes coding for lipid raft-associated components: a role for raft proteins in cell transformation and cancer progression (review).

    PubMed

    Patra, Samir K; Bettuzzi, Saverio

    2007-06-01

    Metastatic progression is the cause of most cancer deaths. Host tumour cell separation (fission) is accompanied by simultaneous acquisition of migrating capability of cancer cells, remodeling of cellular architecture and effective 'homing' in body host environment. Cell remodeling involves cytoskeletal protein-protein and lipid-protein interaction together with altered signaling. Alteration of signaling in tumour cells may affect expression of many genes also by DNA-methylation/demethylation. This would alter the steady-state intracellular level of structural proteins or metabolic enzymes, and notably enzymes involved in the biosynthesis of lipids, affecting the composition of membranes. Lipid rafts are small, heterogeneous, highly dynamic, sterol- and sphingolipid-enriched domains that compartmentalize cellular processes. Small rafts can be stabilized to form larger platforms through protein-protein and protein-lipid interactions. Lipid rafts play an important role in intracellular protein transport, membrane fusion and trans-cytosis, also being platforms for cell surface antigens and adhesion molecules which are crucial for cell activation, polarization and signaling. Detachment of individual tumour cells from the host tumour lump requires lipid-protein-lipid raft (LPLR) reordering. Lipid rafts are also involved in angiogenesis and local invasion, which occurs within the host tumour vicinity by exchange of enzymes, cytokines and motility factors that modify the surrounding extracellular matrix (ECM). Many cell surface adhesion, ECM, and signaling proteins (such as E-cadherin, catenin, CD44, MMP-9 and caveolin-1) are known to be absent or reduced following gene promoter-CpG-island hypermethylation in mid-stage growing tumours, but re-expressed (by gene promoter-mCpG-DNA demethylation) in carcinomas such as metastasized lung, prostate and sarcomas. The recent research acquisitions on lipid rafts have tremendous implications in understanding the genetic and

  6. Phylogenetic footprinting of non-coding RNA: hammerhead ribozyme sequences in a satellite DNA family of Dolichopoda cave crickets (Orthoptera, Rhaphidophoridae)

    PubMed Central

    2010-01-01

    Background: The great variety in sequence, length, complexity, and abundance of satellite DNA has made it difficult to ascribe any function to this genome component. Recent studies have shown that satellite DNA can be transcribed and be involved in regulation of chromatin structure and gene expression. Some satellite DNAs, such as the pDo500 sequence family in Dolichopoda cave crickets, have a catalytic hammerhead (HH) ribozyme structure and activity embedded within each repeat. Results: We assessed the phylogenetic footprints of the HH ribozyme within the pDo500 sequences from 38 different populations representing 12 species of Dolichopoda. The HH region was significantly more conserved than the non-hammerhead (NHH) region of the pDo500 repeat. In addition, stems were more conserved than loops. In stems, several compensatory mutations were detected that maintain base pairing. The core region of the HH ribozyme was affected by very few nucleotide substitutions and the cleavage position was altered only once among 198 sequences. RNA folding of the HH sequences revealed that a potentially active HH ribozyme can be found in most of the Dolichopoda populations and species. Conclusions: The phylogenetic footprints suggest that the HH region of the pDo500 sequence family is selected for function in Dolichopoda cave crickets. However, the functional role of HH ribozymes in eukaryotic organisms is unclear. The possible functions have been related to trans cleavage of an RNA target by a ribonucleoprotein and regulation of gene expression. Whether the HH ribozyme in Dolichopoda is involved in similar functions remains to be investigated. Future studies need to demonstrate how the observed nucleotide changes and evolutionary constraint have affected the catalytic efficiency of the hammerhead. PMID:20047671

  7. 28 CFR 61.8 - Classified proposals.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 28 Judicial Administration 2 2010-07-01 2010-07-01 false Classified proposals. 61.8 Section 61.8 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR IMPLEMENTING THE NATIONAL ENVIRONMENTAL POLICY ACT Implementing Procedures § 61.8 Classified proposals. If an environmental document...

  8. 28 CFR 61.8 - Classified proposals.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 28 Judicial Administration 2 2014-07-01 2014-07-01 false Classified proposals. 61.8 Section 61.8 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR IMPLEMENTING THE NATIONAL ENVIRONMENTAL POLICY ACT Implementing Procedures § 61.8 Classified proposals. If an environmental document...

  9. 28 CFR 61.8 - Classified proposals.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 28 Judicial Administration 2 2012-07-01 2012-07-01 false Classified proposals. 61.8 Section 61.8 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR IMPLEMENTING THE NATIONAL ENVIRONMENTAL POLICY ACT Implementing Procedures § 61.8 Classified proposals. If an environmental document...

  10. 28 CFR 61.8 - Classified proposals.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 28 Judicial Administration 2 2011-07-01 2011-07-01 false Classified proposals. 61.8 Section 61.8 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR IMPLEMENTING THE NATIONAL ENVIRONMENTAL POLICY ACT Implementing Procedures § 61.8 Classified proposals. If an environmental document...

  11. 28 CFR 61.8 - Classified proposals.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 2 2013-07-01 2013-07-01 false Classified proposals. 61.8 Section 61.8 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR IMPLEMENTING THE NATIONAL ENVIRONMENTAL POLICY ACT Implementing Procedures § 61.8 Classified proposals. If an environmental document...

  12. A fuzzy classifier system for process control

    NASA Technical Reports Server (NTRS)

    Karr, C. L.; Phillips, J. C.

    1994-01-01

    A fuzzy classifier system that discovers rules for controlling a mathematical model of a pH titration system was developed by researchers at the U.S. Bureau of Mines (USBM). Fuzzy classifier systems successfully combine the strengths of learning classifier systems and fuzzy logic controllers. Learning classifier systems resemble familiar production rule-based systems, but they represent their IF-THEN rules by strings of characters rather than in the traditional linguistic terms. Fuzzy logic is a tool that allows for the incorporation of abstract concepts into rule-based systems, thereby allowing the rules to resemble the familiar 'rules-of-thumb' commonly used by humans when solving difficult process control and reasoning problems. Like learning classifier systems, fuzzy classifier systems employ a genetic algorithm to explore and sample new rules for manipulating the problem environment. Like fuzzy logic controllers, fuzzy classifier systems encapsulate knowledge in the form of production rules. The results presented in this paper demonstrate the ability of fuzzy classifier systems to generate a fuzzy logic-based process control system.

  13. 28 CFR 16.7 - Classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 1 2013-07-01 2013-07-01 false Classified information. 16.7 Section 16.7 Judicial Administration DEPARTMENT OF JUSTICE PRODUCTION OR DISCLOSURE OF MATERIAL OR INFORMATION Procedures for Disclosure of Records Under the Freedom of Information Act § 16.7 Classified information. In...

  14. 28 CFR 16.44 - Classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 1 2013-07-01 2013-07-01 false Classified information. 16.44 Section 16.44 Judicial Administration DEPARTMENT OF JUSTICE PRODUCTION OR DISCLOSURE OF MATERIAL OR INFORMATION... information. In processing a request for access to a record containing information that is classified under...

  15. 28 CFR 700.14 - Classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 2 2013-07-01 2013-07-01 false Classified information. 700.14 Section... INFORMATION OF THE OFFICE OF INDEPENDENT COUNSEL Protection of Privacy and Access to Individual Records Under the Privacy Act of 1974 § 700.14 Classified information. In processing a request for access to a...

  16. 48 CFR 927.207 - Classified contracts.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 48 Federal Acquisition Regulations System 5 2014-10-01 2014-10-01 false Classified contracts. 927.207 Section 927.207 Federal Acquisition Regulations System DEPARTMENT OF ENERGY GENERAL CONTRACTING REQUIREMENTS PATENTS, DATA, AND COPYRIGHTS Patents 927.207 Classified contracts....

  17. 32 CFR 1633.1 - Classifying authority.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... reclassify a registrant other than a volunteer for induction, into Class 1-A out of another class prior to... issuing an induction order to a registrant, appropriately classify him if the Secretary of Defense has... service. (f) Compensated employees of an area office may in accord with § 1633.2 may classify a registrant...

  18. 32 CFR 1633.1 - Classifying authority.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... reclassify a registrant other than a volunteer for induction, into Class 1-A out of another class prior to... issuing an induction order to a registrant, appropriately classify him if the Secretary of Defense has... service. (f) Compensated employees of an area office may in accord with § 1633.2 may classify a registrant...

  19. QR Codes

    ERIC Educational Resources Information Center

    Lai, Hsin-Chih; Chang, Chun-Yen; Li, Wen-Shiane; Fan, Yu-Lin; Wu, Ying-Tien

    2013-01-01

    This study presents an m-learning method that incorporates Integrated Quick Response (QR) codes. This learning method not only achieves the objectives of outdoor education, but it also increases applications of Cognitive Theory of Multimedia Learning (CTML) (Mayer, 2001) in m-learning for practical use in a diverse range of outdoor locations. When…

  1. Uplink Coding

    NASA Technical Reports Server (NTRS)

    Andrews, Ken; Divsalar, Dariush; Dolinar, Sam; Moision, Bruce; Hamkins, Jon; Pollara, Fabrizio

    2007-01-01

    This slide presentation reviews the objectives, meeting goals and overall NASA goals for the NASA Data Standards Working Group. The presentation includes information on the technical progress surrounding the objective, short LDPC codes, and the general results on the Pu-Pw tradeoff.

  2. Modelling partially cross-classified multilevel data.

    PubMed

    Luo, Wen; Cappaert, Kevin J; Ning, Ling

    2015-05-01

    This article proposes an approach to modelling partially cross-classified multilevel data where some of the level-1 observations are nested in one random factor and some are cross-classified by two random factors. A simulation study compares the proposed approach with two commonly used approaches that treat the partially cross-classified data as either fully nested or fully cross-classified. Results show that the proposed approach demonstrates desirable performance in terms of parameter estimates and statistical inferences. Both the fully nested model and the fully cross-classified model suffer from biased estimates of some variance components and statistical inferences of some fixed effects. Results also indicate that the proposed model is robust against cluster size imbalance. © 2015 The British Psychological Society.

  3. DNA methylation mediated up-regulation of TERRA non-coding RNA is coincident with elongated telomeres in the human placenta.

    PubMed

    Novakovic, Boris; Napier, Christine E; Vryer, Regan; Dimitriadis, Eva; Manuelpillai, Ursula; Sharkey, Andrew; Craig, Jeffrey M; Reddel, Roger R; Saffery, Richard

    2016-11-01

    What factors regulate elongated telomere length in the human placenta? Hypomethylation of TERRA promoters in the human placenta is associated with high TERRA expression, however, no clear mechanistic link between these phenomena and elongated telomere length in the human placenta was found. Human placenta tissue and trophoblasts show longer telomere lengths compared to gestational age-matched somatic cells. However, telomerase (hTERT) expression and activity in the placenta is low, suggesting a role for an alternative lengthening of telomeres (ALT). While ALT is observed in 10-15% of human cancers and in some mouse stem cells, ALT has never been reported in non-cancerous human tissues. Human term placental tissue and matched cord blood mononuclear cells (CBMCs) were collected as part of the Peri/Postnatal Epigenetic Twins study (PETS). In addition, first trimester placental villi, purified cytotrophoblasts, choriocarcinoma cell lines and a panel of ALT-positive cancer cell lines were tested. Telomere length was determined using the Terminal Restriction Fragment (TRF) assay and a relative quantitative PCR method. DNA methylation levels at several CpG rich subtelomeric TERRA promoters were determined using bisulfite conversion and the SEQUENOM EpiTYPER platform. Expression of TERRA and hTERT was determined using quantitative RT-PCR. ALT was assessed using the C-circle assay (CCA). The human placenta tissue and purified first trimester trophoblasts showed low subtelomeric (TERRA) DNA methylation compared to matched CBMCs and other somatic cells. Interestingly placental TERRA methylation was lower than ALT-cancer cell lines, previously reported to be hypomethylated at these loci. Low TERRA methylation was associated with higher expression of TERRA RNA in placenta compared to matched CBMCs. Detectable levels of C-circles were observed in first trimester placental villi, but not term placenta, suggesting that the ALT mechanism may be active in specific placental cells in

  4. Orthopedics coding and funding.

    PubMed

    Baron, S; Duclos, C; Thoreux, P

    2014-02-01

    The French tarification à l'activité (T2A) prospective payment system is a financial system in which a health-care institution's resources are based on performed activity. Activity is described via the PMSI medical information system (programme de médicalisation du système d'information). The PMSI classifies hospital cases by clinical and economic categories known as diagnosis-related groups (DRG), each with an associated price tag. Coding a hospital case involves giving as realistic a description as possible so as to categorize it in the right DRG and thus ensure appropriate payment. For this, it is essential to understand what determines the pricing of inpatient stay: namely, the code for the surgical procedure, the patient's principal diagnosis (reason for admission), codes for comorbidities (everything that adds to management burden), and the management of the length of inpatient stay. The PMSI is used to analyze the institution's activity and dynamism: change on previous year, relation to target, and comparison with competing institutions based on indicators such as the mean length of stay performance indicator (MLS PI). The T2A system improves overall care efficiency. Quality of care, however, is not presently taken account of in the payment made to the institution, as there are no indicators for this; work needs to be done on this topic. Copyright © 2014. Published by Elsevier Masson SAS.

  5. Schrödinger's code-script: not a genetic cipher but a code of development.

    PubMed

    Walsby, A E; Hodge, M J S

    2017-06-01

    In his book What is Life? Erwin Schrödinger coined the term 'code-script', thought by some to be the first published suggestion of a hereditary code and perhaps a forerunner of the genetic code. The etymology of 'code' suggests three meanings relevant to 'code-script', which we distinguish as 'cipher-code', 'word-code' and 'rule-code'. Cipher-codes and word-codes entail translation of one set of characters into another. The genetic code comprises not one but two cipher-codes: the first is the DNA 'base-pairing cipher'; the second is the 'nucleotide-amino-acid cipher', which involves the translation of DNA base sequences into amino-acid sequences. We suggest that Schrödinger's code-script is a form of 'rule-code', a set of rules that, like the 'highway code' or 'penal code', requires no translation of a message. Schrödinger first relates his code-script to chromosomal genes made of protein. Ignorant of its properties, however, he later abandons 'protein' and adopts in its place a hypothetical, isomeric 'aperiodic solid' whose atoms he imagines rearranged in countless different conformations, which together are responsible for the patterns of ontogenetic development. In an attempt to explain the large number of combinations required, Schrödinger referred to the Morse code (a cipher) but in doing so unwittingly misled readers into believing that he intended a cipher-code resembling the genetic code. We argue that the modern equivalent of Schrödinger's code-script is a rule-code of organismal development based largely on the synthesis, folding, properties and interactions of numerous proteins, each performing a specific task. Copyright © 2016. Published by Elsevier Ltd.

  6. Breaking the DNA-binding code of Ralstonia solanacearum TAL effectors provides new possibilities to generate plant resistance genes against bacterial wilt disease.

    PubMed

    de Lange, Orlando; Schreiber, Tom; Schandry, Niklas; Radeck, Jara; Braun, Karl Heinz; Koszinowski, Julia; Heuer, Holger; Strauß, Annett; Lahaye, Thomas

    2013-08-01

    Ralstonia solanacearum is a devastating bacterial phytopathogen with a broad host range. Ralstonia solanacearum injected effector proteins (Rips) are key to the successful invasion of host plants. We have characterized Brg11(hrpB-regulated 11), the first identified member of a class of Rips with high sequence similarity to the transcription activator-like (TAL) effectors of Xanthomonas spp., collectively termed RipTALs. Fluorescence microscopy of in planta expressed RipTALs showed nuclear localization. Domain swaps between Brg11 and Xanthomonas TAL effector (TALE) AvrBs3 (avirulence protein triggering Bs3 resistance) showed the functional interchangeability of DNA-binding and transcriptional activation domains. PCR was used to determine the sequence of brg11 homologs from strains infecting phylogenetically diverse host plants. Brg11 localizes to the nucleus and activates promoters containing a matching effector-binding element (EBE). Brg11 and homologs preferentially activate promoters containing EBEs with a 5' terminal guanine, contrasting with the TALE preference for a 5' thymine. Brg11 and other RipTALs probably promote disease through the transcriptional activation of host genes. Brg11 and the majority of homologs identified in this study were shown to activate similar or identical target sequences, in contrast to TALEs, which generally show highly diverse target preferences. This information provides new options for the engineering of plants resistant to R. solanacearum. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.

  7. Isolation and characterization of an atypical LEA protein coding cDNA and its promoter from drought-tolerant plant Prosopis juliflora.

    PubMed

    George, Suja; Usha, B; Parida, Ajay

    2009-05-01

    Plant growth and productivity are adversely affected by various abiotic and biotic stress factors. Despite the wealth of information on abiotic stress and stress tolerance in plants, many aspects still remain unclear. Prosopis juliflora is a hardy plant reported to be tolerant to drought, salinity, extremes of soil pH, and heavy metal stress. In this paper, we report the isolation and characterization of the complementary DNA clone for an atypical late embryogenesis abundant (LEA) protein (Pj LEA3) and its putative promoter sequence from P. juliflora. Unlike typical LEA proteins, rich in glycine, Pj LEA3 has alanine as the most abundant amino acid followed by serine and shows an average negative hydropathy. Pj LEA3 is significantly different from other LEA proteins in the NCBI database and shows high similarity to indole-3 acetic-acid-induced protein ARG2 from Vigna radiata. Northern analysis for Pj LEA3 in P. juliflora leaves under 90 mM H2O2 stress revealed up-regulation of transcript at 24 and 48 h. A 1.5-kb fragment upstream the 5' UTR of this gene (putative promoter) was isolated and analyzed in silico. The possible reasons for changes in gene expression during stress in relation to the host plant's stress tolerance mechanisms are discussed.

  8. Double-coding nucleic acids: introduction of a nucleobase sequence in the major groove of the DNA duplex using double-headed nucleotides.

    PubMed

    Kumar, Pawan; Sorinas, Antoni Figueras; Nielsen, Lise J; Slot, Maria; Skytte, Kirstine; Nielsen, Annie S; Jensen, Michael D; Sharma, Pawan K; Vester, Birte; Petersen, Michael; Nielsen, Poul

    2014-09-05

    A series of double-headed nucleosides were synthesized using the Sonogashira cross-coupling reaction. In the reactions, additional nucleobases (thymine, cytosine, adenine, or guanine) were attached to the 5-position of 2'-deoxyuridine or 2'-deoxycytidine through a propyne linker. The modified nucleosides were incorporated into oligonucleotides, and these were combined in different duplexes that were analyzed by thermal denaturation studies. All of the monomers were well tolerated in the DNA duplexes and induced only small changes in the thermal stability. Consecutive incorporations of the monomers led to increases in duplex stability owing to increased stacking interactions. The modified nucleotide monomers maintained the Watson-Crick base pair fidelity. Stable duplexes were observed with heavily modified oligonucleotides featuring 14 consecutive incorporations of different double-headed nucleotide monomers. Thus, modified duplexes with an array of nucleobases on the exterior of the duplex were designed. Molecular dynamics simulations demonstrated that the additional nucleobases could expose their Watson-Crick and/or Hoogsteen faces for recognition in the major groove. This presentation of nucleobases may find applications in providing molecular information without unwinding the duplex.

  9. Sequencing of the coding exons of the LRP1 and LDLR genes on individual DNA samples reveals novel mutations in both genes.

    PubMed

    Van Leuven, F; Thiry, E; Lambrechts, M; Stas, L; Boon, T; Bruynseels, K; Muls, E; Descamps, O

    2001-02-15

    Five coding polymorphisms in the LRP1 gene, i.e. A217V, A775P, D2080N, D2632E and G4379S were discovered by sequencing its 89 exons in three test-groups of 22 healthy individuals, 29 Alzheimer patients and 18 individuals with different clinical and molecularly uncharacterized lipid metabolism problems. No genetic defect was evident in the LRP1 gene of any of the Alzheimer's disease (AD) patients, further excluding LRP1 as a major genetic problem in AD. Lipoprotein receptor related protein (LRP) A217V (exon 6) was clearly present in all groups as a polymorphism, while D2632E was observed only once in a healthy volunteer. On the other hand, LRP1 alleles A775P, D2080N, and G4379S were encountered only in patients with FH or with undefined problems of lipid metabolism. This finding forced one to also analyze the LDL receptor (LDLR) gene, for which a method was devised to sequence the entire region comprising LDLR exons 2-18. The resulting sequence contig of 33567 nucleotides finally yielded an exact physical map that corrects published and listed LDLR gene maps in many positions. In addition, next to known mutations in LDLR that cause FH, four novel LDLR defects were defined, i.e. del e7-10, exon 9 mutation N407T, a 20 bp insertion in exon 4, and a double mutation C292W/K290R in exon 6. No evidence for pathology connected to the LRP1 'mutations' was obtained by subsequent screening for the five LRP1 variants in larger groups of 110 FH patients and 118 patients with molecularly undefined, clinical problems of cholesterol and/or lipid metabolism. In three individuals with a mutant LDLR gene a variant LRP1 allele was also present, but without direct, obvious clinical compound effects, indicating that the variant LRP1 alleles must, for the present, be considered polymorphisms.

  10. Haplogrouping mitochondrial DNA sequences in Legal Medicine/Forensic Genetics.

    PubMed

    Bandelt, Hans-Jürgen; van Oven, Mannis; Salas, Antonio

    2012-11-01

    Haplogrouping refers to the classification of (partial) mitochondrial DNA (mtDNA) sequences into haplogroups using the current knowledge of the worldwide mtDNA phylogeny. Haplogroup assignment of mtDNA control-region sequences assists in the focused comparison with closely related complete mtDNA sequences and thus serves two main goals in forensic genetics: first is the a posteriori quality analysis of sequencing results and second is the prediction of relevant coding-region sites for confirmation or further refinement of haplogroup status. The latter may be important in forensic casework where discrimination power needs to be as high as possible. However, most articles published in forensic genetics perform haplogrouping only in a rudimentary or incorrect way. The present study features PhyloTree as the key tool for assigning control-region sequences to haplogroups and elaborates on additional Web-based searches for finding near-matches with complete mtDNA genomes in the databases. In contrast, none of the automated haplogrouping tools available can yet compete with manual haplogrouping using PhyloTree plus additional Web-based searches, especially when confronted with artificial recombinants still present in forensic mtDNA datasets. We review and classify the various attempts at haplogrouping by using a multiplex approach or relying on automated haplogrouping. Furthermore, we re-examine a few articles in forensic journals providing mtDNA population data where appropriate haplogrouping following PhyloTree immediately highlights several kinds of sequence errors.

  11. Classification and Coding, An Introduction and Review of Classification and Coding Systems. Management Guide No. 1.

    ERIC Educational Resources Information Center

    MacConnell, W.

    Nearly all organizations are faced with problems of classifying and coding financial data, management and technical information, components, stores, etc. and need to apply some logical and meaningful system of identification. This report examines the objectives and applications of classification and coding systems and reviews eight systems…

  12. Expanding context against weighted voting of classifiers

    NASA Astrophysics Data System (ADS)

    Terziyan, Vagan; Omelayenko, Boris; Puuronen, Seppo J.

    2000-04-01

    In the paper we propose a new method to integrate the predictions of multiple classifiers for Data Mining and Machine Learning tasks. The method assumes that each classifier stands in its own context, and the contexts are partially ordered. The order is defined by a monotonic quality function that maps each context to a value from the interval [0,1]. The classifier that has the context with better quality is supposed to predict better than the classifier from a context of worse quality. The objective is to generate the opinion of a 'virtual' classifier that stands in the context with quality equal to 1. This virtual classifier must have the best accuracy of predictions due to the best context. To do this we build the regression where each prediction is put with the weight, equal to the quality evaluation of the context of the corresponding classifier. This regression will give us the best opinion at the point 1. Some experiments on the vowel recognition tasks showed validity of the approach.
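
    The weighting idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes numeric predictions, a straight-line regression of prediction against context quality, and weights equal to the quality values. The function name `extrapolate_to_ideal_context` is hypothetical.

```python
def extrapolate_to_ideal_context(qualities, predictions):
    """Fit a quality-weighted least-squares line through the
    (quality, prediction) points and evaluate it at quality = 1.

    Assumptions (not taken from the paper): predictions are numbers,
    the regression is linear, and each point's weight is its quality.
    """
    if len(qualities) != len(predictions) or len(qualities) < 2:
        raise ValueError("need at least two (quality, prediction) pairs")
    weight_sum = sum(qualities)
    if weight_sum <= 0:
        raise ValueError("at least one context must have positive quality")

    # Weighted means of quality and prediction (weight = quality).
    q_mean = sum(q * q for q in qualities) / weight_sum
    p_mean = sum(q * p for q, p in zip(qualities, predictions)) / weight_sum

    # Weighted variance of quality and weighted covariance with prediction.
    s_qq = sum(q * (q - q_mean) ** 2 for q in qualities)
    s_qp = sum(
        q * (q - q_mean) * (p - p_mean)
        for q, p in zip(qualities, predictions)
    )
    if s_qq == 0:
        # All weighted contexts share one quality: no slope can be estimated,
        # so fall back to the weighted mean prediction.
        return p_mean

    slope = s_qp / s_qq
    intercept = p_mean - slope * q_mean
    return intercept + slope * 1.0  # opinion of the 'virtual' classifier


if __name__ == "__main__":
    # Three classifiers whose outputs happen to lie on p = 2q + 1.
    print(extrapolate_to_ideal_context([0.2, 0.5, 0.8], [1.4, 2.0, 2.6]))
```

    Weighting by quality means that classifiers in poor contexts still inform the slope of the trend but pull less on the extrapolated value.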

  13. Logarithmic learning for generalized classifier neural network.

    PubMed

    Ozyildirim, Buse Melis; Avci, Mutlu

    2014-12-01

    Generalized classifier neural network is introduced as an efficient classifier among the others. Unless the initial smoothing parameter value is close to the optimal one, generalized classifier neural network suffers from convergence problem and requires quite a long time to converge. In this work, to overcome this problem, a logarithmic learning approach is proposed. The proposed method uses logarithmic cost function instead of squared error. Minimization of this cost function reduces the number of iterations used for reaching the minima. The proposed method is tested on 15 different data sets and performance of logarithmic learning generalized classifier neural network is compared with that of standard one. Thanks to operation range of radial basis function included by generalized classifier neural network, proposed logarithmic approach and its derivative has continuous values. This makes it possible to adopt the advantage of logarithmic fast convergence by the proposed learning method. Due to fast convergence ability of logarithmic cost function, training time is maximally decreased to 99.2%. In addition to decrease in training time, classification performance may also be improved till 60%. According to the test results, while the proposed method provides a solution for time requirement problem of generalized classifier neural network, it may also improve the classification accuracy. The proposed method can be considered as an efficient way for reducing the time requirement problem of generalized classifier neural network.

  14. A proposed system for classifying psychotropic drugs.

    PubMed

    Howland, Robert H

    2014-12-01

    Most drugs used in psychiatry are classified according to their initial or main therapeutic indications rather than by their pharmacological profiles. A proposed multi-axial, pharmacologically driven nomenclature system that would reclassify existing psychotropic drugs and provide a framework for classifying new drug compounds is described. The five axes of this system would describe a drug's primary pharmacological target and relevant mechanism; relevant neurotransmitter and mechanism; neurobiological activities; efficacy and side effects; and approved indications. The proposed multi-axial system is a common sense but scientifically informed approach for classifying psychotropic drugs that would be practically useful for prescribers, clinicians, and patients. Copyright 2014, SLACK Incorporated.

  15. Iterative least squares functional networks classifier.

    PubMed

    El-Sebakhy, Emad A; Hadi, Ali S; Faisal, Kanaan A

    2007-05-01

    This paper proposes unconstrained functional networks as a new classifier to deal with the pattern recognition problems. Both methodology and learning algorithm for this kind of computational intelligence classifier using the iterative least squares optimization criterion are derived. The performance of this new intelligent systems scheme is demonstrated and examined using real-world applications. A comparative study with the most common classification algorithms in both machine learning and statistics communities is carried out. The study was achieved with only sets of second-order linearly independent polynomial functions to approximate the neuron functions. The results show that this new framework classifier is reliable, flexible, stable, and achieves a high-quality performance.

  16. A Method for the Annotation of Functional Similarities of Coding DNA Sequences: the Case of a Populated Cluster of Transmembrane Proteins.

    PubMed

    Fuertes, Miguel Angel; Rodrigo, José Ramón; Alonso, Carlos

    2017-01-01

    The analysis of a large number of human and mouse genes codifying for a populated cluster of transmembrane proteins revealed that some of the genes significantly vary in their primary nucleotide sequence inter-species and also intra-species. In spite of that divergence and of the fact that all these genes share a common parental function we asked the question of whether at DNA level they have some kind of common compositional structure, not evident from the analysis of their primary nucleotide sequence. To reveal the existence of gene clusters not based on primary sequence relationships we have analyzed 13574 human and 14047 mouse genes by the composon-clustering methodology. The data presented show that most of the genes from each one of the samples are distributed in 18 clusters sharing the common compositional features between the particular human and mouse clusters. It was observed, in addition, that between particular human and mouse clusters having similar composon-profiles large variations in gene population were detected as an indication that a significant amount of orthologs between both species differs in compositional features. A gene cluster containing exclusively genes codifying for transmembrane proteins, an important fraction of which belongs to the Rhodopsin G-protein coupled receptor superfamily, was also detected. This indicates that even though some of them display low sequence similarity, all of them, in both species, participate with similar compositional features in terms of composons. We conclude that in this family of transmembrane proteins in general and in the Rhodopsin G-protein coupled receptor in particular, the composon-clustering reveals the existence of a type of common compositional structure underlying the primary nucleotide sequence closely correlated to function.

  17. Viroids: the minimal non-coding RNAs with autonomous replication.

    PubMed

    Flores, Ricardo; Delgado, Sonia; Gas, María-Eugenia; Carbonell, Alberto; Molina, Diego; Gago, Selma; De la Peña, Marcos

    2004-06-01

    Viroids are small (246-401 nucleotides), non-coding, circular RNAs able to replicate autonomously in certain plants. Viroids are classified into the families Pospiviroidae and Avsunviroidae, whose members replicate in the nucleus and chloroplast, respectively. Replication occurs by an RNA-based rolling-circle mechanism in three steps: (1). synthesis of longer-than-unit strands catalyzed by host DNA-dependent RNA polymerases forced to transcribe RNA templates, (2). processing to unit-length, which in family Avsunviroidae is mediated by hammerhead ribozymes, and (3). circularization either through an RNA ligase or autocatalytically. Disease induction might result from the accumulation of viroid-specific small interfering RNAs that, via RNA silencing, could interfere with normal developmental pathways.

  18. FY05 LDRD Final Report Investigation of AAA+ protein machines that participate in DNA replication, recombination, and in response to DNA damage LDRD Project Tracking Code: 04-LW-049

    SciTech Connect

    Sawicka, D; de Carvalho-Kavanagh, M S; Barsky, D; Venclovas, C

    2006-12-04

    The AAA+ proteins are remarkable macromolecules that are able to self-assemble into nanoscale machines. These protein machines play critical roles in many cellular processes, including the processes that manage a cell's genetic material, but the mechanism at the molecular level has remained elusive. We applied computational molecular modeling, combined with advanced sequence analysis and available biochemical and genetic data, to structurally characterize eukaryotic AAA+ proteins and the protein machines they form. With these models we have examined intermolecular interactions in three-dimensions (3D), including both interactions between the components of the AAA+ complexes and the interactions of these protein machines with their partners. These computational studies have provided new insights into the molecular structure and the mechanism of action for AAA+ protein machines, thereby facilitating a deeper understanding of processes involved in DNA metabolism.

  19. Genetic fuzzy classifier for sleep stage identification.

    PubMed

    Jo, Han G; Park, Jin Y; Lee, Chung K; An, Suk K; Yoo, Sun K

    2010-07-01

    Soft-computing techniques are commonly used to detect medical phenomena and help with clinical diagnoses and treatment. In this work, we propose a design for a computerized sleep scoring method, which is based on a fuzzy classifier and a genetic algorithm (GA). We design the fuzzy classifier based on the GA using a single electroencephalogram (EEG) signal that detects differences in spectral features. Polysomnography was performed on four healthy young adults (males with a mean age of 27.5 years). The sleep classifier was designed using a sleep record and tested on the sleep records of the subjects. Our results show that the genetic fuzzy classifier (GFC) agreed with visual sleep staging approximately 84.6% of the time in detection of wakefulness (WA), shallow sleep (SS), deep sleep (DS), and rapid eye movement (REM) stages.

  20. How Is Acute Lymphocytic Leukemia Classified?

    MedlinePlus

    ... Adults Early Detection, Diagnosis, and Types How Is Acute Lymphocytic Leukemia Classified? Most types of cancers are assigned numbered ... ALL are now named as follows: B-cell ALL Early pre-B ALL (also called pro-B ...

  1. Robust C-Loss Kernel Classifiers.

    PubMed

    Xu, Guibiao; Hu, Bao-Gang; Principe, Jose C

    2016-12-29

    The correntropy-induced loss (C-loss) function has the nice property of being robust to outliers. In this paper, we study the C-loss kernel classifier with the Tikhonov regularization term, which is used to avoid overfitting. After using the half-quadratic optimization algorithm, which converges much faster than the gradient optimization algorithm, we find out that the resulting C-loss kernel classifier is equivalent to an iterative weighted least square support vector machine (LS-SVM). This relationship helps explain the robustness of iterative weighted LS-SVM from the correntropy and density estimation perspectives. On the large-scale data sets which have low-rank Gram matrices, we suggest to use incomplete Cholesky decomposition to speed up the training process. Moreover, we use the representer theorem to improve the sparseness of the resulting C-loss kernel classifier. Experimental results confirm that our methods are more robust to outliers than the existing common classifiers.
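
    The robustness to outliers mentioned in this abstract comes from the loss being bounded. The sketch below uses one form of the C-loss found in the correntropy literature, written in terms of the classification margin y·f(x); the exact definition and parameterization used in this paper may differ, and the names `c_loss` and `sigma` are assumptions made for illustration.

```python
import math


def c_loss(margin, sigma=1.0):
    """C-loss as a function of the margin y * f(x).

    Assumed form: beta * (1 - exp(-(1 - margin)^2 / (2 * sigma^2))),
    with beta chosen so that the loss equals 1 at margin 0.
    """
    beta = 1.0 / (1.0 - math.exp(-1.0 / (2.0 * sigma ** 2)))
    error = 1.0 - margin
    return beta * (1.0 - math.exp(-error ** 2 / (2.0 * sigma ** 2)))


def squared_loss(margin):
    """Least-squares loss on the same margin error, for comparison."""
    return (1.0 - margin) ** 2


if __name__ == "__main__":
    for margin in (1.0, 0.0, -2.0, -100.0):
        print(margin, round(c_loss(margin), 4), squared_loss(margin))
```

    For a badly misclassified point (a large negative margin) the squared loss grows without bound, while this form of the C-loss saturates at the constant beta, so a single outlier cannot dominate the objective.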

  2. 14 CFR 1216.310 - Classified actions.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... actions. (a) Classification does not relieve NASA of the requirement to assess, document, and consider the environmental impacts of a proposed action. (b) When classified information can reasonably be separated...

  3. Cascaded multiple classifiers for secondary structure prediction.

    PubMed Central

    Ouali, M.; King, R. D.

    2000-01-01

    We describe a new classifier for protein secondary structure prediction that is formed by cascading together different types of classifiers using neural networks and linear discrimination. The new classifier achieves an accuracy of 76.7% (assessed by a rigorous full Jack-knife procedure) on a new nonredundant dataset of 496 nonhomologous sequences (obtained from G.J. Barton and J.A. Cuff). This database was especially designed to train and test protein secondary structure prediction methods, and it uses a more stringent definition of homologous sequence than in previous studies. We show that it is possible to design classifiers that can highly discriminate the three classes (H, E, C) with an accuracy of up to 78% for beta-strands, using only a local window and resampling techniques. This indicates that the importance of long-range interactions for the prediction of beta-strands has been probably previously overestimated. PMID:10892809

  4. Adaptive Bayes classifiers for remotely sensed data

    NASA Technical Reports Server (NTRS)

    Raulston, H. S.; Pace, M. O.; Gonzalez, R. C.

    1975-01-01

    An algorithm is developed for a learning, adaptive, statistical pattern classifier for remotely sensed data. The estimation procedure consists of two steps: (1) an optimal stochastic approximation of the parameters of interest, and (2) a projection of the parameters in time and space. The results reported are for Gaussian data in which the mean vector of each class may vary with time or position after the classifier is trained.

  5. Discrete Ramanujan transform for distinguishing the protein coding regions from other regions.

    PubMed

    Hua, Wei; Wang, Jiasong; Zhao, Jian

    2014-01-01

    Based on the study of Ramanujan sum and Ramanujan coefficient, this paper suggests the concepts of discrete Ramanujan transform and spectrum. Using Voss numerical representation, one maps a symbolic DNA strand as a numerical DNA sequence, and deduces the discrete Ramanujan spectrum of the numerical DNA sequence. It is well known that the discrete Fourier power spectrum of a protein coding sequence has an important feature of 3-base periodicity, which is widely used for DNA sequence analysis by the technique of discrete Fourier transform. It is performed by testing the signal-to-noise ratio at frequency N/3 as a criterion for the analysis, where N is the length of the sequence. The results presented in this paper show that the property of 3-base periodicity can be only identified as a prominent spike of the discrete Ramanujan spectrum at period 3 for the protein coding regions. The signal-to-noise ratio for discrete Ramanujan spectrum is defined for numerical measurement. Therefore, the discrete Ramanujan spectrum and the signal-to-noise ratio of a DNA sequence can be used for distinguishing the protein coding regions from the noncoding regions. All the exon and intron sequences in whole chromosomes 1, 2, 3 and 4 of Caenorhabditis elegans have been tested and the histograms and tables from the computational results illustrate the reliability of our method. In addition, our theoretical analysis concludes that the algorithm for calculating the discrete Ramanujan spectrum has lower computational complexity and higher computational accuracy. The computational experiments show that the technique of using the discrete Ramanujan spectrum for classifying different DNA sequences is a fast and effective method.
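
    The Fourier-based baseline that this abstract refers to can be sketched in a few lines. The example below is not the paper's Ramanujan transform; it only illustrates the Voss representation and the signal-to-noise ratio of the discrete Fourier power spectrum at frequency N/3. The function names, and the choice to average the power over k = 1..N-1, are assumptions made for illustration.

```python
import cmath


def voss_indicators(seq):
    """Voss representation: one 0/1 indicator sequence per base A, C, G, T."""
    seq = seq.upper()
    return [[1 if ch == base else 0 for ch in seq] for base in "ACGT"]


def power_spectrum(seq):
    """Total DFT power S[k], summed over the four indicator sequences.

    Uses a direct O(N^2) DFT so the sketch needs only the standard library.
    """
    n = len(seq)
    indicators = voss_indicators(seq)
    spectrum = []
    for k in range(n):
        total = 0.0
        for indicator in indicators:
            coeff = sum(
                x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(indicator)
            )
            total += abs(coeff) ** 2
        spectrum.append(total)
    return spectrum


def period3_snr(seq):
    """Power at frequency N/3 divided by the mean power over k = 1..N-1."""
    n = len(seq)
    if n == 0 or n % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    spectrum = power_spectrum(seq)
    mean_power = sum(spectrum[1:]) / (n - 1)  # assumption: k = 0 is excluded
    if mean_power == 0.0:
        return 0.0
    return spectrum[n // 3] / mean_power


if __name__ == "__main__":
    print(period3_snr("ATG" * 20))   # strong period-3 signal
    print(period3_snr("ACGT" * 15))  # period 4, no spike at N/3
```

    A sequence with a strict 3-base repeat concentrates its power at k = N/3 and 2N/3, which gives a large ratio; a sequence with a different period gives a ratio close to zero.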

  6. The Rocchio classifier and second generation wavelets

    NASA Astrophysics Data System (ADS)

    Carter, Patricia H.

    2007-04-01

    Classification and characterization of text is of ever growing importance in defense and national security. The text classification task is an instance of classification using sparse features residing in a high dimensional feature space. Two standard (of a wide selection of available) algorithms for this task are the naive Bayes classifier and the Rocchio linear classifier. Naive Bayes classifiers are widely applied; the Rocchio algorithm is primarily used in document classification and information retrieval. Both these classifiers are popular because of their simplicity and ease of application, computational speed and reasonable performance. One aspect of the Rocchio approach, inherited from its information retrieval origin, is that it explicitly uses both positive and negative models. Parameters have been introduced which make it adaptive to the particulars of the corpora of interest and thereby improve its performance. The ideas inherent in these classifiers and in second generation wavelets can be recombined into new algorithms for classification. An example is a classifier using second generation wavelet-like functions for class probes that mimic the Rocchio positive template - negative template approach.
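
    The positive-template and negative-template idea of the Rocchio approach can be sketched as below. This is a minimal bag-of-words illustration and not the wavelet-based classifier proposed in the record; the weights `beta` and `gamma`, the toy documents, and the function names are assumptions.

```python
import math
from collections import Counter


def rocchio_prototype(positive_docs, negative_docs, beta=1.0, gamma=0.25):
    """Class prototype: beta * mean(positive) - gamma * mean(negative)."""
    prototype = Counter()
    for doc in positive_docs:
        for term, count in Counter(doc.split()).items():
            prototype[term] += beta * count / len(positive_docs)
    for doc in negative_docs:
        for term, count in Counter(doc.split()).items():
            prototype[term] -= gamma * count / len(negative_docs)
    return dict(prototype)


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, prototypes):
    """Return the label whose prototype is most similar to the text."""
    vector = dict(Counter(text.split()))
    return max(prototypes, key=lambda label: cosine(vector, prototypes[label]))


# Toy training data (hypothetical).
TRAINING = {
    "dna": ["gene sequence codon", "codon gene exon"],
    "signal": ["carrier phase modulation", "phase signal carrier"],
}
PROTOTYPES = {
    label: rocchio_prototype(
        docs,
        [d for other, ds in TRAINING.items() if other != label for d in ds],
    )
    for label, docs in TRAINING.items()
}

if __name__ == "__main__":
    print(classify("exon codon", PROTOTYPES))
    print(classify("modulation phase", PROTOTYPES))
```

    Subtracting the negative template pushes terms typical of other classes below zero in the prototype, so a document that shares vocabulary with a competing class is penalized rather than merely ignored.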

  7. matK-QR classifier: a patterns based approach for plant species identification.

    PubMed

    More, Ravi Prabhakar; Mane, Rupali Chandrashekhar; Purohit, Hemant J

    2016-01-01

    DNA barcoding is a widely used and highly efficient approach that facilitates rapid and accurate identification of plant species based on the short standardized segment of the genome. The nucleotide sequences of maturaseK (matK) and ribulose-1, 5-bisphosphate carboxylase (rbcL) marker loci are commonly used in plant species identification. Here, we present a new and highly efficient approach for identifying a unique set of discriminating nucleotide patterns to generate a signature (i.e. regular expression) for plant species identification. In order to generate molecular signatures, we used matK and rbcL loci datasets, which encompass 125 plant species in 52 genera reported by the CBOL plant working group. Initially, we performed Multiple Sequence Alignment (MSA) of all species followed by Position Specific Scoring Matrix (PSSM) for both loci to achieve a percentage of discrimination among species. Further, we detected Discriminating Patterns (DP) at genus and species level using PSSM for the matK dataset. Combining DP and consecutive pattern distances, we generated molecular signatures for each species. Finally, we performed a comparative assessment of these signatures with the existing methods including BLASTn, Support Vector Machines (SVM), Jrip-RIPPER, J48 (C4.5 algorithm), and the Naïve Bayes (NB) methods against NCBI-GenBank matK dataset. Due to the higher discrimination success obtained with the matK as compared to the rbcL, we selected matK gene for signature generation. We generated signatures for 60 species based on identified discriminating patterns at genus and species level. Our comparative assessment results suggest that a total of 46 out of 60 species could be correctly identified using generated signatures, followed by BLASTn (34 species), SVM (18 species), C4.5 (7 species), NB (4 species) and RIPPER (3 species) methods. As a final outcome of this study, we converted signatures into QR codes and developed the software matK-QR Classifier (http://www.neeri.res.in/matk_classifier
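
    A signature built from discriminating patterns and the distances between them can be expressed as a regular expression, as in the sketch below. The patterns, gap lengths and species labels are invented for illustration, and the interpretation of "consecutive pattern distances" as fixed-length gaps is an assumption; none of this is taken from the matK-QR Classifier software.

```python
import re


def build_signature(patterns, gaps):
    """Join patterns with fixed-length nucleotide gaps into one regex.

    patterns: list of nucleotide strings (hypothetical discriminating patterns)
    gaps: number of bases between each consecutive pair of patterns
    """
    if len(gaps) != len(patterns) - 1:
        raise ValueError("need exactly one gap between consecutive patterns")
    parts = [re.escape(patterns[0])]
    for gap, pattern in zip(gaps, patterns[1:]):
        parts.append("[ACGT]{%d}" % gap)  # any bases, exact distance
        parts.append(re.escape(pattern))
    return re.compile("".join(parts))


def identify(sequence, signatures):
    """Return the labels whose signature occurs anywhere in the sequence."""
    sequence = sequence.upper()
    return [name for name, sig in signatures.items() if sig.search(sequence)]


# Hypothetical signatures for two made-up species labels.
SIGNATURES = {
    "species_A": build_signature(["ATGGC", "TTAC"], [3]),
    "species_B": build_signature(["GGGTT", "CCA"], [2]),
}

if __name__ == "__main__":
    print(identify("ccATGGCaaaTTACgg", SIGNATURES))
```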

  8. Less-Complex Method of Classifying MPSK

    NASA Technical Reports Server (NTRS)

    Hamkins, Jon

    2006-01-01

    An alternative to an optimal method of automated classification of signals modulated with M-ary phase-shift-keying (M-ary PSK or MPSK) has been derived. The alternative method is approximate, but it offers nearly optimal performance and entails much less complexity, which translates to much less computation time. Modulation classification is becoming increasingly important in radio-communication systems that utilize multiple data modulation schemes and include software-defined or software-controlled receivers. Such a receiver may "know" little a priori about an incoming signal but may be required to correctly classify its data rate, modulation type, and forward error-correction code before properly configuring itself to acquire and track the symbol timing, carrier frequency, and phase, and ultimately produce decoded bits. Modulation classification has long been an important component of military interception of initially unknown radio signals transmitted by adversaries. Modulation classification may also be useful for enabling cellular telephones to automatically recognize different signal types and configure themselves accordingly. The concept of modulation classification as outlined in the preceding paragraph is quite general. However, at the present early stage of development, and for the purpose of describing the present alternative method, the term "modulation classification" or simply "classification" signifies, more specifically, a distinction between M-ary and M'-ary PSK, where M and M' represent two different integer multiples of 2. Both the prior optimal method and the present alternative method require the acquisition of magnitude and phase values of a number (N) of consecutive baseband samples of the incoming signal + noise. The prior optimal method is based on a maximum-likelihood (ML) classification rule that requires a calculation of likelihood functions for the M and M' hypotheses: Each likelihood function is an integral, over a full cycle of

  9. The nearest subclass classifier: a compromise between the nearest mean and nearest neighbor classifier.

    PubMed

    Veenman, Cor J; Reinders, Marcel J T

    2005-09-01

    We present the Nearest Subclass Classifier (NSC), which is a classification algorithm that unifies the flexibility of the nearest neighbor classifier with the robustness of the nearest mean classifier. The algorithm is based on the Maximum Variance Cluster algorithm and, as such, it belongs to the class of prototype-based classifiers. The variance constraint parameter of the cluster algorithm serves to regularize the classifier, that is, to prevent overfitting. With a low variance constraint value, the classifier turns into the nearest neighbor classifier and, with a high variance parameter, it becomes the nearest mean classifier with the respective properties. In other words, the number of prototypes ranges from the whole training set to only one per class. In the experiments, we compared the NSC with regard to its performance and data set compression ratio to several other prototype-based methods. On several data sets, the NSC performed similarly to the k-nearest neighbor classifier, which is a well-established classifier in many domains. Also concerning storage requirements and classification speed, the NSC has favorable properties, so it gives a good compromise between classification performance and efficiency.
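
    The two extremes named in this abstract can be illustrated with a nearest-prototype rule. The sketch below is not the NSC itself: it omits the Maximum Variance Cluster step and only contrasts "every training point is a prototype" (nearest neighbor) with "one prototype per class" (nearest mean). The toy data and all function names are hypothetical.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def class_means(points, labels):
    """Return one prototype per class: the mean of that class's points."""
    groups = defaultdict(list)
    for p, y in zip(points, labels):
        groups[y].append(p)
    protos, proto_labels = [], []
    for y in sorted(groups):
        members = groups[y]
        dim = len(members[0])
        protos.append(tuple(sum(m[d] for m in members) / len(members)
                            for d in range(dim)))
        proto_labels.append(y)
    return protos, proto_labels

def nearest_prototype(x, protos, proto_labels):
    """Label x with the class of its closest prototype."""
    best = min(range(len(protos)), key=lambda i: euclidean(x, protos[i]))
    return proto_labels[best]

# Hypothetical toy data: class "a" forms two separate clumps,
# class "b" sits between them.
points = [(0.0, 0.0), (0.2, 0.0), (4.0, 0.0), (4.4, 0.0), (2.0, 0.0), (2.2, 0.0)]
labels = ["a", "a", "a", "a", "b", "b"]
query = (2.14, 0.0)

# Extreme 1: the whole training set serves as prototypes (nearest neighbor).
nn_label = nearest_prototype(query, points, labels)

# Extreme 2: one prototype per class (nearest mean).
means, mean_labels = class_means(points, labels)
nm_label = nearest_prototype(query, means, mean_labels)
```

    On this data the two rules disagree, because the mean of the two-clump class falls in the gap between its clumps. A subclass-based classifier would be expected to sit between these extremes by keeping a few prototypes per class.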

  10. A model of the cell nucleus for DNA damage calculations.

    PubMed

    Nikjoo, Hooshang; Girard, Peter

    2012-01-01

    Development of a computer model of genomic deoxyribonucleic acid (DNA) in the human cell nucleus for DNA damage and repair calculations. The model comprises the human genomic DNA, chromosomal domains, and loops attached to factories. A model of canonical B-DNA was used to build the nucleosomes and the 30-nanometer solenoidal chromatin. In turn the chromatin was used to form the loops of factories in chromosome domains. The entire human genome was placed in a spherical nucleus of 10 micrometers diameter. To test the new target model, tracks of protons and alpha-particles were generated using Monte Carlo track structure codes PITS99 (Positive Ion Track Structure) and KURBUC. Damage sites induced in the genome were located and classified according to type and complexity. The three-dimensional structure of the genome starting with a canonical B-DNA model, nucleosomes, and chromatin loops in chromosomal domains are presented. The model was used to obtain frequencies of DNA damage induced by protons and alpha-particles by direct energy deposition, including single- and double-strand breaks, base damage, and clustered lesions. This three-dimensional model of the genome is the first such model using the full human genome for the next generation of more comprehensive modelling of DNA damage and repair. The model combines simple geometrical structures at the level of domains and factories with potentially full detail at the level of atoms in particular genes, allowing damage patterns in the latter to be simulated.

  11. DNA rearrangements located over 100 kb 5' of the Steel (Sl)-coding region in Steel-panda and Steel-contrasted mice deregulate Sl expression and cause female sterility by disrupting ovarian follicle development.

    PubMed

    Bedell, M A; Brannan, C I; Evans, E P; Copeland, N G; Jenkins, N A; Donovan, P J

    1995-02-15

    The Steel (Sl) locus is essential for the development of germ cells, hematopoietic cells, and melanocytes and encodes a growth factor (Mgf) that is the ligand for c-kit, a receptor tyrosine kinase encoded by the W locus. We have identified the molecular and germ cell defects in two mutant Sl alleles, Steel-panda (Slpan) and Steel-contrasted (Slcon), that cause sterility only in females. Unexpectedly, both mutant alleles are shown to contain DNA rearrangements, located > 100 kb 5' of Mgf-coding sequences, that lead to tissue-specific effects on Mgf mRNA expression. In Slpan embryos, decreased Mgf mRNA expression in the gonads causes a reduced number of primordial germ cells in both sexes. However, Mgf expression and spermatogenesis in the postnatal mutant testis is normal, and spermatogonial proliferation compensates for deficiencies in germ cell numbers. In Slpan and Slcon homozygous females, decreased Mgf mRNA expression causes sterility by affecting the initiation and maintenance of ovarian follicle development. Thus, regulated expression of Mgf is required for multiple stages of embryonic and postnatal germ cell development. Surprisingly, other areas of the Slcon female reproductive tract displayed ectopic expression of Mgf mRNA. We propose that the Slpan and Slcon rearrangements alter Mgf mRNA abundance through position effects on expression that act at a distance from the Sl gene.

  12. Rudolph Focke and the Theory of the Classified Catalog. Occasional Paper No. 145.

    ERIC Educational Resources Information Center

    Stevenson, Gordon

    Between 1900 and 1905, Rudolph Focke published a series of papers on classification theory and a draft of a code for the construction of classified catalogs. His work was the direct result of the reform of librarianship during the last decades of the nineteenth century. The large number of classification systems used by German university and…

  13. Pairwise Classifier Ensemble with Adaptive Sub-Classifiers for fMRI Pattern Analysis.

    PubMed

    Kim, Eunwoo; Park, HyunWook

    2017-02-01

    The multi-voxel pattern analysis technique is applied to fMRI data for classification of high-level brain functions using pattern information distributed over multiple voxels. In this paper, we propose a classifier ensemble for multiclass classification in fMRI analysis, exploiting the fact that specific neighboring voxels can contain spatial pattern information. The proposed method converts the multiclass classification to a pairwise classifier ensemble, and each pairwise classifier consists of multiple sub-classifiers using an adaptive feature set for each class-pair. Simulated and real fMRI data were used to verify the proposed method. Intra- and inter-subject analyses were performed to compare the proposed method with several well-known classifiers, including single and ensemble classifiers. The comparison results showed that the proposed method can be generally applied to multiclass classification in both simulations and real fMRI analyses.

  14. Reinforcement learning based artificial immune classifier.

    PubMed

    Karakose, Mehmet

    2013-01-01

    Artificial immune systems are among the widely used methods for classification, which is a decision-making process. Modeled on the natural immune system, they can be applied successfully to classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibodies with immune operators. Compared with other methods in the literature, the proposed approach offers several advantages, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and on an FPGA. Benchmark data and remote image data are used for the experiments. Comparative results with supervised/unsupervised artificial immune systems, a negative selection classifier, and a resource-limited artificial immune classifier are given to demonstrate the effectiveness of the proposed method.

  15. Ranked Multi-Label Rules Associative Classifier

    NASA Astrophysics Data System (ADS)

    Thabtah, Fadi

    Associative classification is a promising approach in data mining, which integrates association rule discovery and classification. In this paper, we present a novel associative classification technique called Ranked Multilabel Rule (RMR) that derives rules with multiple class labels. Rules derived by current associative classification algorithms overlap in their training data records, resulting in many redundant and useless rules. However, RMR removes the overlapping between rules using a pruning heuristic and ensures that rules in the final classifier do not share training records, resulting in more accurate classifiers. Experimental results obtained on twenty data sets show that the classifiers produced by RMR are highly competitive if compared with those generated by decision trees and other popular associative techniques such as CBA, with respect to prediction accuracy.

  16. Role of classifiers in multimedia content management

    NASA Astrophysics Data System (ADS)

    Naphade, Milind R.; Smith, John R.

    2003-01-01

    Enabling semantic detection and indexing is an important task in multimedia content management. Learning and classification techniques are increasingly relevant to state-of-the-art content management systems. From relevance feedback to semantic detection, there is a shift in the amount of supervision that precedes retrieval, from lightweight classifiers to heavyweight classifiers. In this paper we compare the performance of some popular classifiers for semantic video indexing. Among other techniques, we mainly compare one technique for generative modeling and one for discriminant learning, and show how they behave depending on the number of examples that the user is willing to provide to the system. We report results using the NIST TREC Video Corpus.

  17. Can we classify medical data dictionaries?

    PubMed

    Bürkle, T

    2000-01-01

    Medical Data Dictionaries enable a clinical information system to maintain a controlled vocabulary, to store descriptive knowledge about terms, and to map between those terms and from those terms to external classifications. They support a variety of functions in the information system, ranging from structured documentation to knowledge-based functions. This paper derives a multi-axial classification for medical data dictionaries. Dictionaries are classified along four axes: a vocabulary axis defining vocabulary properties, an application axis which characterises the degree of linkage between dictionary and information system, a semantic axis defining the quality of inter-term relationships, and finally a language axis which classifies rules for inter-term relationships in semiotic theory. As an example, two existing dictionaries are classified in the model, and reference is made to the design of future dictionaries.

  18. On Asymmetric Classifier Training for Detector Cascades

    SciTech Connect

    Gee, Timothy Felix

    2006-01-01

    This paper examines the Asymmetric AdaBoost algorithm introduced by Viola and Jones for cascaded face detection. The Viola and Jones face detector uses cascaded classifiers to successively filter, or reject, non-faces. In this approach most non-faces are easily rejected by the earlier classifiers in the cascade, thus reducing the overall number of computations. This requires earlier cascade classifiers to very seldom reject true instances of faces. To reflect this training goal, Viola and Jones introduce a weighting parameter for AdaBoost iterations and show it enforces a desirable bound. During their implementation, a modification to the proposed weighting was introduced, while enforcing the same bound. The goal of this paper is to examine their asymmetric weighting by putting AdaBoost in the form of Additive Regression as was done by Friedman, Hastie, and Tibshirani. The author believes this helps to explain the approach and adds another connection between AdaBoost and Additive Regression.

  19. Classifier Fusion With Contextual Reliability Evaluation.

    PubMed

    Liu, Zhunga; Pan, Quan; Dezert, Jean; Han, Jun-Wei; He, You

    2017-06-08

    Classifier fusion is an efficient strategy to improve the classification performance for complex pattern recognition problems. In practice, the multiple classifiers to be combined can have different reliabilities, and proper reliability evaluation plays an important role in the fusion process for getting the best classification performance. We propose a new method for classifier fusion with contextual reliability evaluation (CF-CRE) based on inner reliability and relative reliability concepts. The inner reliability, represented by a matrix, characterizes the probability of the object belonging to one class when it is classified to another class. The elements of this matrix are estimated from the K-nearest neighbors of the object. A cautious discounting rule is developed under the belief functions framework to revise the classification result according to the inner reliability. The relative reliability is evaluated based on a new incompatibility measure, which makes it possible to reduce the level of conflict between the classifiers by applying the classical evidence discounting rule to each classifier before their combination. The inner reliability and relative reliability capture different aspects of the classification reliability. The discounted classification results are combined with Dempster-Shafer's rule to support the final class decision. The performance of CF-CRE has been evaluated and compared with that of the main classical fusion methods using real data sets. The experimental results show that CF-CRE can produce substantially higher accuracy than other fusion methods in general. Moreover, CF-CRE is robust to changes in the number of nearest neighbors chosen for estimating the reliability matrix, which is appealing for applications.
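
    Two building blocks named in this abstract, classical evidence discounting and Dempster's rule of combination, can be sketched as follows. This is not the CF-CRE method: the reliability factor below is picked by hand rather than estimated from nearest neighbors, the paper's cautious discounting rule is not reproduced, and the class names and mass values are hypothetical.

```python
from itertools import product

def discount(mass, alpha, frame):
    """Classical (Shafer) discounting: keep a fraction alpha of each mass
    and transfer the remainder to the whole frame (total ignorance)."""
    out = {focal: alpha * m for focal, m in mass.items() if focal != frame}
    out[frame] = alpha * mass.get(frame, 0.0) + (1.0 - alpha)
    return out

def dempster(m1, m2):
    """Dempster's rule: multiply masses, intersect focal sets, and
    renormalize by the mass that is not in conflict."""
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {focal: m / (1.0 - conflict) for focal, m in combined.items()}

# Hypothetical two-class frame and two classifier outputs as mass functions.
A, B = frozenset({"A"}), frozenset({"B"})
frame = A | B
m1 = {A: 0.9, frame: 0.1}   # classifier 1 favors class A
m2 = {B: 0.6, frame: 0.4}   # classifier 2 favors class B

# Assumption: classifier 2 is judged only half reliable (alpha = 0.5).
m2_discounted = discount(m2, 0.5, frame)
fused = dempster(m1, m2_discounted)
```

    Discounting moves part of the less reliable classifier's belief onto the whole frame, which lowers the conflict that Dempster's rule has to normalize away.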

  20. A survey of decision tree classifier methodology

    NASA Technical Reports Server (NTRS)

    Safavian, S. Rasoul; Landgrebe, David

    1990-01-01

    Decision Tree Classifiers (DTC's) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps, the most important feature of DTC's is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTC's over single stage classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.

  1. Use of robust estimators in parametric classifiers

    NASA Technical Reports Server (NTRS)

    Safavian, S. Rasoul; Landgrebe, David A.

    1989-01-01

    The parametric approach to density estimation and classifier design is a well studied subject. The parametric approach is desirable because basically it reduces the problem of classifier design to that of estimating a few parameters for each of the pattern classes. The class parameters are usually estimated using maximum-likelihood (ML) estimators. ML estimators are, however, very sensitive to the presence of outliers. Several robust estimators of mean and covariance matrix and their effect on the probability of error in classification are examined. Comments are made about alpha-ranked (alpha-trimmed) estimators.
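
    The outlier sensitivity mentioned in this abstract can be seen with an alpha-trimmed mean. The sketch below uses one common definition (drop the floor(alpha * n) smallest and largest values before averaging); the report may define its estimators differently, and the sample values are made up.

```python
def trimmed_mean(values, alpha):
    """Mean after discarding the floor(alpha * n) smallest and the
    floor(alpha * n) largest values; alpha = 0 gives the ordinary mean."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(alpha * len(ordered))
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical samples of one feature for one class, with a single outlier.
samples = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]

plain = trimmed_mean(samples, 0.0)    # ordinary sample mean, pulled by 55.0
robust = trimmed_mean(samples, 0.2)   # drops one value from each end
```

    With alpha = 0 the estimate is dragged toward the outlier, while trimming one value from each end keeps it near the bulk of the data.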

  2. A survey of decision tree classifier methodology

    NASA Technical Reports Server (NTRS)

    Safavian, S. R.; Landgrebe, David

    1991-01-01

    Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-state classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.

  4. Maximal dinucleotide comma-free codes.

    PubMed

    Fimmel, Elena; Strüngmann, Lutz

    2016-01-21

    The problem of retrieval and maintenance of the correct reading frame plays a significant role in RNA transcription. Circular codes, and especially comma-free codes, can help to understand the underlying mechanisms of error-detection in this process. In recent years much attention has been paid to the investigation of trinucleotide circular codes (see, for instance, Fimmel et al., 2014; Fimmel and Strüngmann, 2015a; Michel and Pirillo, 2012; Michel et al., 2012, 2008), while dinucleotide codes had been touched on only marginally, even though dinucleotides are associated to important biological functions. Recently, all maximal dinucleotide circular codes were classified (Fimmel et al., 2015; Michel and Pirillo, 2013). The present paper studies maximal dinucleotide comma-free codes and their close connection to maximal dinucleotide circular codes. We give a construction principle for such codes and provide a graphical representation that allows them to be visualized geometrically. Moreover, we compare the results for dinucleotide codes with the corresponding situation for trinucleotide maximal self-complementary C(3)-codes. Finally, the results obtained are discussed with respect to Crick's hypothesis about frame-shift-detecting codes without commas.
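
    For readers unfamiliar with the term, the comma-free property for dinucleotide codes can be checked in a few lines. The sketch uses the standard definition (no codeword may appear in the shifted frame of two concatenated codewords); the example codes are chosen for illustration and are not taken from the paper.

```python
from itertools import product

def is_comma_free(code):
    """A dinucleotide code X is comma-free if, for every pair of codewords
    x1x2 and y1y2 in X, the frame-shifted dinucleotide x2y1 is not in X."""
    words = set(code)
    return all(x[1] + y[0] not in words for x, y in product(words, repeat=2))

# Illustrative codes over the alphabet {A, C, G, T}.
ok_code = {"AC", "AG", "AT"}   # shifted frames give CA, GA, TA: none in the code
bad_code = {"AC", "CA"}        # AC followed by AC contains CA in the shifted frame
periodic = {"AA"}              # AA followed by AA contains AA in the shifted frame

results = (is_comma_free(ok_code), is_comma_free(bad_code), is_comma_free(periodic))
```

    Because a dinucleotide has only one shifted frame, a single membership test per ordered pair of codewords is enough.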

  5. DNA fingerprinting of Chinese melon provides evidentiary support of seed quality appraisal.

    PubMed

    Gao, Peng; Ma, Hongyan; Luan, Feishi; Song, Haibin

    2012-01-01

    Melon (Cucumis melo L.) is an important vegetable crop worldwide. At present, homonyms and synonyms are common in the melon seed markets of China, which can cause variety authenticity issues that affect melon breeding, production, marketing, and other aspects. Molecular markers, especially microsatellites or simple sequence repeats (SSRs), are playing increasingly important roles in cultivar identification. The aim of this study was to construct a DNA fingerprinting database of major melon cultivars, which could support the establishment of a technical standard system for purity and authenticity identification of melon seeds. To develop the core set of SSR markers, 470 polymorphic SSRs were selected as candidate markers from 1219 SSRs using 20 representative melon varieties (lines). Eighteen SSR markers, evenly distributed across the genome and with the highest polymorphism information content (PIC), were identified as the core marker set for melon DNA fingerprinting analysis. Fingerprint codes for 471 melon varieties (lines) were established. Fifty-one materials were classified into 17 groups based on sharing the same fingerprint code, and field trait surveys showed that the plants in each group were synonyms because of their identical or similar field characters. Furthermore, DNA fingerprinting quick response (QR) codes of 471 melon varieties (lines) were constructed. Owing to their fast readability and large storage capacity, QR-coded melon DNA fingerprints are convenient to read and suited to commercial applications.
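
    Two steps described in this abstract, ranking markers by PIC and grouping varieties that share a fingerprint code, can be sketched as below. The PIC formula is the commonly used one, PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2; the abstract does not state which formula the authors applied, and the variety names and fingerprint strings are invented.

```python
from collections import defaultdict

def pic(freqs):
    """Polymorphism information content of one marker from its allele
    frequencies (assumed to sum to 1)."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2.0 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs))
                for j in range(i + 1, len(freqs)))
    return 1.0 - homozygosity - cross

def group_by_fingerprint(fingerprints):
    """Return sorted groups of variety names that share the same
    fingerprint code; varieties with a unique code are left out."""
    groups = defaultdict(list)
    for name, code in fingerprints.items():
        groups[code].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)

# Hypothetical marker with two equally frequent alleles.
two_allele_pic = pic([0.5, 0.5])

# Hypothetical fingerprint codes for three varieties.
candidates = group_by_fingerprint({"v1": "0102", "v2": "0102", "v3": "0201"})
```

    Groups returned by the second function are only candidate synonyms; in the study, field trait surveys were used to confirm them.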

  6. DNA Fingerprinting of Chinese Melon Provides Evidentiary Support of Seed Quality Appraisal

    PubMed Central

    Gao, Peng; Ma, Hongyan; Luan, Feishi; Song, Haibin

    2012-01-01

    Melon (Cucumis melo L.) is an important vegetable crop worldwide. At present, homonyms and synonyms are common in the melon seed markets of China, which can cause variety authenticity issues that affect melon breeding, production, marketing, and other aspects. Molecular markers, especially microsatellites or simple sequence repeats (SSRs), are playing increasingly important roles in cultivar identification. The aim of this study was to construct a DNA fingerprinting database of major melon cultivars, which could support the establishment of a technical standard system for purity and authenticity identification of melon seeds. To develop the core set of SSR markers, 470 polymorphic SSRs were selected as candidate markers from 1219 SSRs using 20 representative melon varieties (lines). Eighteen SSR markers, evenly distributed across the genome and with the highest polymorphism information content (PIC), were identified as the core marker set for melon DNA fingerprinting analysis. Fingerprint codes for 471 melon varieties (lines) were established. Fifty-one materials were classified into 17 groups based on sharing the same fingerprint code, and field trait surveys showed that the plants in each group were synonyms because of their identical or similar field characters. Furthermore, DNA fingerprinting quick response (QR) codes of 471 melon varieties (lines) were constructed. Owing to their fast readability and large storage capacity, QR-coded melon DNA fingerprints are convenient to read and suited to commercial applications. PMID:23285039

  7. What Advances Are Being Made in DNA Sequencing?

    MedlinePlus

    ... of DNA building blocks (nucleotides) in an individual's genetic code, called DNA sequencing, has advanced the study of ... a breakthrough that helped scientists determine the human genetic code, but it is time-consuming and expensive. The ...

  8. Improved method for predicting protein fold patterns with ensemble classifiers.

    PubMed

    Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C

    2012-01-27

    Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.

  9. 32 CFR 148.2 - Classified programs.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 32 National Defense 1 2012-07-01 2012-07-01 false Classified programs. 148.2 Section 148.2 National Defense Department of Defense OFFICE OF THE SECRETARY OF DEFENSE PERSONNEL, MILITARY AND CIVILIAN NATIONAL POLICY AND IMPLEMENTATION OF RECIPROCITY OF FACILITIES National Policy on Reciprocity of Use and...

  10. 32 CFR 148.2 - Classified programs.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 32 National Defense 1 2014-07-01 2014-07-01 false Classified programs. 148.2 Section 148.2 National Defense Department of Defense OFFICE OF THE SECRETARY OF DEFENSE PERSONNEL, MILITARY AND CIVILIAN NATIONAL POLICY AND IMPLEMENTATION OF RECIPROCITY OF FACILITIES National Policy on Reciprocity of Use and...

  11. Large margin classifier-based ensemble tracking

    NASA Astrophysics Data System (ADS)

    Wang, Yuru; Liu, Qiaoyuan; Yin, Minghao; Wang, ShengSheng

    2016-07-01

    In recent years, many studies have treated visual tracking as a two-class classification problem. The key problem is to construct a classifier with sufficient accuracy in distinguishing the target from its background and sufficient generalization ability in handling new frames. However, variable tracking conditions challenge the existing methods. The difficulty mainly comes from the confused boundary between the foreground and background. This paper handles this difficulty by generalizing the classifier's learning step. By introducing the distribution data of samples, the classifier learns more essential characteristics in discriminating the two classes. Specifically, the samples are represented in a multiscale visual model. For features with different scales, several large margin distribution machines (LDMs) with adaptive kernels are combined in a Bayesian way as a strong classifier. To improve accuracy and generalization ability, not only the margin distance but also the sample distribution is optimized in the learning step. Comprehensive experiments are performed on several challenging video sequences. Through parameter analysis and field comparison, the proposed LDM-combined ensemble tracker is demonstrated to perform with sufficient accuracy and generalization ability in handling various typical tracking difficulties.

  12. The Front Line: Satisfaction of Classified Employees.

    ERIC Educational Resources Information Center

    Bauer, Karen W.

    2000-01-01

    Discusses job satisfaction in classified support staff (primarily clerical and secretarial) of colleges and universities. Notes that these staff are frequently the first representatives of the institution encountered by prospective students, parents, and others. Finds that rewards and recognition, opportunities for feedback, and help with…

  13. Performance Evaluation of a Semantic Perception Classifier

    DTIC Science & Technology

    2013-09-01

    Performance Evaluation of a Semantic Perception Classifier by Craig Lennon, Barry Bodt, Marshal Childers, Rick Camden, Arne Suppe, Luis...Camden and Nicoleta Florea Engility Corporation Luis Navarro-Serment and Arne Suppe Carnegie Mellon University...Lennon, Barry Bodt, Marshal Childers, Rick Camden,* Arne Suppe, † Luis Navarro-Serment, † and Nicoleta Florea* 5d. PROJECT NUMBER 5e. TASK

  14. Shape and Function in Hmong Classifier Choices

    ERIC Educational Resources Information Center

    Sakuragi, Toshiyuki; Fuller, Judith W.

    2013-01-01

    This study examined classifiers in the Hmong language with a particular focus on gaining insights into the underlying cognitive process of categorization. Forty-three Hmong speakers participated in three experiments. In the first experiment, designed to verify the previously postulated configurational (saliently one-dimensional, saliently…

  15. 5 CFR 1312.4 - Classified designations.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ..., DOWNGRADING, DECLASSIFICATION AND SAFEGUARDING OF NATIONAL SECURITY INFORMATION Classification and Declassification of National Security Information § 1312.4 Classified designations. (a) Except as provided by the Atomic Energy Act of 1954, as amended, (42 U.S.C. 2011) or the National Security Act of 1947, as...

  16. 5 CFR 1312.4 - Classified designations.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ..., DOWNGRADING, DECLASSIFICATION AND SAFEGUARDING OF NATIONAL SECURITY INFORMATION Classification and Declassification of National Security Information § 1312.4 Classified designations. (a) Except as provided by the Atomic Energy Act of 1954, as amended, (42 U.S.C. 2011) or the National Security Act of 1947, as...

  17. 5 CFR 1312.4 - Classified designations.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ..., DOWNGRADING, DECLASSIFICATION AND SAFEGUARDING OF NATIONAL SECURITY INFORMATION Classification and Declassification of National Security Information § 1312.4 Classified designations. (a) Except as provided by the Atomic Energy Act of 1954, as amended, (42 U.S.C. 2011) or the National Security Act of 1947, as...

  18. 5 CFR 1312.4 - Classified designations.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ..., DOWNGRADING, DECLASSIFICATION AND SAFEGUARDING OF NATIONAL SECURITY INFORMATION Classification and Declassification of National Security Information § 1312.4 Classified designations. (a) Except as provided by the Atomic Energy Act of 1954, as amended, (42 U.S.C. 2011) or the National Security Act of 1947, as...

  19. Learning for VMM + WTA Embedded Classifiers

    DTIC Science & Technology

    2016-03-31

    training, less than 30μW of operational power and lower with additional fabrication. Keywords: Embedded Machine Learning ICs, Neuromorphic... Learning for VMM + WTA Embedded Classifiers Jennifer Hasler and Sahil Shah Electrical and Computer Engineering Georgia Institute of Technology...infinite resources). Foundations of VMM+WTA Learning The fundamental question is enabling a working supervised learning technique for these systems

  20. The Community; A Classified, Annotated Bibliography.

    ERIC Educational Resources Information Center

    Payne, Raymond, Comp.; Bailey, Wilfrid C., Comp.

    This is a classified retrospective bibliography of 839 items on the community (about 140 are annotated) from rural sociology and agricultural economics departments and sections, agricultural experiment stations, extension services, and related agencies. Items are categorized as follows: bibliography and reference lists; location and delineation of…

  1. Visual Classifier Training for Text Document Retrieval.

    PubMed

    Heimerl, F; Koch, S; Bosch, H; Ertl, T

    2012-12-01

    Performing exhaustive searches over a large number of text documents can be tedious, since it is very hard to formulate search queries or define filter criteria that capture an analyst's information need adequately. Classification through machine learning has the potential to improve search and filter tasks encompassing either complex or very specific information needs, individually. Unfortunately, analysts who are knowledgeable in their field are typically not machine learning specialists. Most classification methods, however, require a certain expertise regarding their parametrization to achieve good results. Supervised machine learning algorithms, in contrast, rely on labeled data, which can be provided by analysts. However, the effort for labeling can be very high, which shifts the problem from composing complex queries or defining accurate filters to another laborious task, in addition to the need for judging the trained classifier's quality. We therefore compare three approaches for interactive classifier training in a user study. All of the approaches are potential candidates for the integration into a larger retrieval system. They incorporate active learning to various degrees in order to reduce the labeling effort as well as to increase effectiveness. Two of them encompass interactive visualization for letting users explore the status of the classifier in context of the labeled documents, as well as for judging the quality of the classifier in iterative feedback loops. We see our work as a step towards introducing user controlled classification methods in addition to text search and filtering for increasing recall in analytics scenarios involving large corpora.

  2. Classifying and quantifying basins of attraction

    SciTech Connect

    Sprott, J. C.; Xiong, Anda

    2015-08-15

    A scheme is proposed to classify the basins for attractors of dynamical systems in arbitrary dimensions. There are four basic classes depending on their size and extent, and each class can be further quantified to facilitate comparisons. The calculation uses a Monte Carlo method and is applied to numerous common dissipative chaotic maps and flows in various dimensions.

  3. Shape and Function in Hmong Classifier Choices

    ERIC Educational Resources Information Center

    Sakuragi, Toshiyuki; Fuller, Judith W.

    2013-01-01

    This study examined classifiers in the Hmong language with a particular focus on gaining insights into the underlying cognitive process of categorization. Forty-three Hmong speakers participated in three experiments. In the first experiment, designed to verify the previously postulated configurational (saliently one-dimensional, saliently…

  4. Performance of a 20-target MSE classifier

    NASA Astrophysics Data System (ADS)

    Novak, Leslie M.; Owirka, Gregory J.; Brower, William S.

    1998-08-01

    MIT Lincoln Laboratory is responsible for developing the ATR system for the DARPA/DARO/NIMA/OSD-sponsored SAIP program; the baseline ATR system recognizes 10 GOB targets; the enhanced version of SAIP requires the ATR system to recognize 20 GOB targets. This paper compares ATR performance results for 10- and 20-target MSE classifiers using high-resolution SAR imagery.

  5. Dynamic classifiers improve pulverizer performance and more

    SciTech Connect

    Sommerlad, R.E.; Dugdale, K.L.

    2007-07-15

    Keeping coal-fired steam plants running efficiently and cleanly is a daily struggle. An article in the February 2007 issue of Power explained that one way to improve the combustion and emissions performance of a plant is to optimize the performance of its coal pulverizers. By adding a dynamic classifier to the pulverizers, you can better control coal particle sizing and fineness, and increase pulverizer capacity to boot. A dynamic classifier has an inner rotating cage and outer stationary vanes which, acting in concert, provide centrifugal or impinging classification. Replacing or upgrading a pulverizer's classifier from static to dynamic improves grinding performance reducing the level of unburned carbon in the coal in the process. The article describes the project at E.ON's Ratcliffe-on-Soar Power station in the UK to retrofit Loesche LSKS dynamic classifiers. It also mentions other successful projects at Scholven Power Station in Germany, Tilbury Power Station in the UK and J.B. Sims Power Plant in Michigan, USA. 8 figs.

  6. 32 CFR 148.2 - Classified programs.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 32 National Defense 1 2011-07-01 2011-07-01 false Classified programs. 148.2 Section 148.2 National Defense Department of Defense OFFICE OF THE SECRETARY OF DEFENSE PERSONNEL, MILITARY AND CIVILIAN NATIONAL POLICY AND IMPLEMENTATION OF RECIPROCITY OF FACILITIES National Policy on Reciprocity of Use...

  7. 32 CFR 148.2 - Classified programs.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 32 National Defense 1 2010-07-01 2010-07-01 false Classified programs. 148.2 Section 148.2 National Defense Department of Defense OFFICE OF THE SECRETARY OF DEFENSE PERSONNEL, MILITARY AND CIVILIAN NATIONAL POLICY AND IMPLEMENTATION OF RECIPROCITY OF FACILITIES National Policy on Reciprocity of Use...

  8. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process..., AR 380-5 (Department of the Army Information Security Program) will be followed. (b) Classification... makers in accordance with AR 380-5. (d) When classified information is such an integral part of the...

  9. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process..., AR 380-5 (Department of the Army Information Security Program) will be followed. (b) Classification... makers in accordance with AR 380-5. (d) When classified information is such an integral part of the...

  10. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process..., AR 380-5 (Department of the Army Information Security Program) will be followed. (b) Classification... makers in accordance with AR 380-5. (d) When classified information is such an integral part of the...

  11. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process..., AR 380-5 (Department of the Army Information Security Program) will be followed. (b) Classification... makers in accordance with AR 380-5. (d) When classified information is such an integral part of...

  12. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process..., AR 380-5 (Department of the Army Information Security Program) will be followed. (b) Classification... makers in accordance with AR 380-5. (d) When classified information is such an integral part of...

  13. The classifier problem in Chinese aphasia.

    PubMed

    Tzeng, O J; Chen, S; Hung, D L

    1991-08-01

    In recent years, research on the relationship between brain organization and language processing has benefited tremendously from cross-linguistic comparisons of language disorders among different types of aphasic patients. Results from these cross-linguistic studies have shown that the same aphasic syndromes often look very different from one language to another, suggesting that language-specific knowledge is largely preserved in Broca's and Wernicke's aphasics. In this paper, Chinese aphasic patients were examined with respect to their (in)ability to use classifiers in a noun phrase. The Chinese language, in addition to its lack of verb conjugation and an absence of noun declension, is exceptional in yet another respect: articles, numerals, and other such modifiers cannot directly precede their associated nouns; there has to be an intervening morpheme called a classifier. The appropriate usage of nominal classifiers is considered to be one of the most difficult aspects of Chinese grammar. Our examination of Chinese aphasic patients revealed two essential points. First, Chinese aphasic patients experience difficulty in the production of nominal classifiers, committing a significant number of errors of omission and/or substitution. Second, two different kinds of substitution errors are observed in Broca's and Wernicke's patients, and the detailed analysis of the difference demands a rethinking of the distinction between agrammatism and paragrammatism. The result adds to a growing body of evidence suggesting that grammar is impaired in fluent as well as nonfluent aphasia.

  14. Error-correction coding

    NASA Technical Reports Server (NTRS)

    Hinds, Erold W. (Principal Investigator)

    1996-01-01

    This report describes the progress made towards the completion of a specific task on error-correcting coding. The proposed research consisted of investigating the use of modulation block codes as the inner code of a concatenated coding system in order to improve the overall space link communications performance. The study proposed to identify and analyze candidate codes that will complement the performance of the overall coding system which uses the interleaved RS (255,223) code as the outer code.
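
    As a small worked illustration of the outer code named above, the sketch below computes the basic parameters implied by the RS (255,223) notation. It relies only on the standard Reed-Solomon property that a code with n - k parity symbols can correct up to (n - k) / 2 symbol errors; the function name `rs_parameters` is hypothetical and nothing here comes from the report itself.

```python
def rs_parameters(n, k):
    """Basic parameters of an (n, k) Reed-Solomon code.

    n: codeword length in symbols, k: number of data symbols.
    The function name and dictionary keys are illustrative only.
    """
    parity = n - k
    return {
        "parity_symbols": parity,
        # A Reed-Solomon code corrects up to floor((n - k) / 2) symbol errors.
        "correctable_symbol_errors": parity // 2,
        "code_rate": k / n,
    }


outer = rs_parameters(255, 223)
```

    For (255,223) this gives 32 parity symbols, up to 16 correctable symbol errors per codeword, and a code rate of 223/255.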

  15. Evolving Coevolutionary Classifiers Under Large Attribute Spaces

    NASA Astrophysics Data System (ADS)

    Doucette, John; Lichodzijewski, Peter; Heywood, Malcolm

    Model-building under the supervised learning domain potentially faces a dual learning problem of identifying both the parameters of the model and the subset of (domain) attributes necessary to support the model, thus using an embedded as opposed to wrapper or filter based design. Genetic Programming (GP) has always addressed this dual problem; however, further implicit assumptions are made which potentially increase the complexity of the resulting solutions. In this work we are specifically interested in the case of classification under very large attribute spaces. As such it might be expected that multiple independent/overlapping attribute subspaces support the mapping to class labels, whereas GP approaches to classification generally assume a single binary classifier per class, forcing the model to provide a solution in terms of a single attribute subspace and single mapping to class labels. Supporting the more general goal is considered as a requirement for identifying a 'team' of classifiers with non-overlapping classifier behaviors, in which each classifier responds to different subsets of exemplars. Moreover, the subsets of attributes associated with each team member might utilize a unique 'subspace' of attributes. This work investigates the utility of coevolutionary model building for the case of classification problems with attribute vectors consisting of 650 to 100,000 dimensions. The resulting team-based coevolutionary method, Symbiotic Bid-based (SBB) GP, is compared to alternative embedded classifier approaches of C4.5 and Maximum Entropy Classification (MaxEnt). SBB solutions demonstrate up to an order of magnitude lower attribute count relative to C4.5 and up to two orders of magnitude lower attribute count than MaxEnt while retaining comparable or better classification performance. Moreover, relative to the attribute count of individual models participating within a team, no more than six attributes are ever utilized; adding a further

  16. Bayes Error Rate Estimation Using Classifier Ensembles

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Ghosh, Joydeep

    2003-01-01

    The Bayes error rate gives a statistical lower bound on the error achievable for a given classification problem and the associated choice of features. By reliably estimating this rate, one can assess the usefulness of the feature set that is being used for classification. Moreover, by comparing the accuracy achieved by a given classifier with the Bayes rate, one can quantify how effective that classifier is. Classical approaches for estimating or finding bounds for the Bayes error, in general, yield rather weak results for small sample sizes, unless the problem has some simple characteristics, such as Gaussian class-conditional likelihoods. This article shows how the outputs of a classifier ensemble can be used to provide reliable and easily obtainable estimates of the Bayes error with negligible extra computation. Three methods of varying sophistication are described. First, we present a framework that estimates the Bayes error when multiple classifiers, each providing an estimate of the a posteriori class probabilities, are combined through averaging. Second, we bolster this approach by adding an information theoretic measure of output correlation to the estimate. Finally, we discuss a more general method that just looks at the class labels indicated by ensemble members and provides error estimates based on the disagreements among classifiers. The methods are illustrated for artificial data, a difficult four-class problem involving underwater acoustic data, and two problems from the Proben1 benchmarks. For data sets with known Bayes error, the combiner-based methods introduced in this article outperform existing methods. The estimates obtained by the proposed methods also seem quite reliable for the real-life data sets for which the true Bayes rates are unknown.
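
    To make the idea of estimating the Bayes error from averaged posteriors concrete, the following is a minimal sketch and not the authors' framework. It assumes each ensemble member outputs class-posterior estimates per sample, averages them across members, and then applies the simple plug-in rule "one minus the largest averaged posterior, averaged over samples". The function names and toy numbers are hypothetical, and the correlation correction described in the abstract is not modeled.

```python
def averaged_posteriors(member_posteriors):
    """Average posterior estimates over ensemble members.

    member_posteriors[m][i][c] is member m's estimate of P(class c | sample i).
    """
    n_members = len(member_posteriors)
    n_samples = len(member_posteriors[0])
    n_classes = len(member_posteriors[0][0])
    return [
        [sum(m[i][c] for m in member_posteriors) / n_members
         for c in range(n_classes)]
        for i in range(n_samples)
    ]


def plugin_bayes_error(posteriors):
    """Plug-in estimate: mean over samples of 1 - max_c P(c | x)."""
    return sum(1.0 - max(row) for row in posteriors) / len(posteriors)


# Toy data (made up): two ensemble members, three samples, two classes.
members = [
    [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6], [0.4, 0.6]],
]
avg = averaged_posteriors(members)
estimate = plugin_bayes_error(avg)
```

    With these toy values the averaged posteriors are roughly (0.8, 0.2), (0.5, 0.5) and (0.3, 0.7), so the plug-in estimate is about one third. The estimate is only as good as the posterior estimates fed into it.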

  17. Classifying features in CT imagery: accuracy for some single- and multiple-species classifiers

    Treesearch

    Daniel L. Schmoldt; Jing He; A. Lynn Abbott

    1998-01-01

    Our current approach to automatically label features in CT images of hardwood logs classifies each pixel of an image individually. These feature classifiers use a back-propagation artificial neural network (ANN) and feature vectors that include a small, local neighborhood of pixels and the distance of the target pixel to the center of the log. Initially, this type of...

  18. 22 CFR 125.3 - Exports of classified technical data and classified defense articles.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 1 2011-04-01 2011-04-01 false Exports of classified technical data and classified defense articles. 125.3 Section 125.3 Foreign Relations DEPARTMENT OF STATE INTERNATIONAL TRAFFIC... in the Department of Defense National Industrial Security Program Operating Manual (unless such...

  19. Evolutionary design of a fuzzy classifier from data.

    PubMed

    Chang, Xiaoguang; Lilly, John H

    2004-08-01

    Genetic algorithms show powerful capabilities for automatically designing fuzzy systems from data, but many proposed methods must be subjected to some minimal structure assumptions, such as rule base size. In this paper, we also address the design of fuzzy systems from data. A new evolutionary approach is proposed for deriving a compact fuzzy classification system directly from data without any a priori knowledge or assumptions on the distribution of the data. At the beginning of the algorithm, the fuzzy classifier is empty with no rules in the rule base and no membership functions assigned to fuzzy variables. Then, rules and membership functions are automatically created and optimized in an evolutionary process. To accomplish this, parameters of the variable input spread inference training (VISIT) algorithm are used to code fuzzy systems on the training data set. Therefore, we can derive each individual fuzzy system via the VISIT algorithm, and then search the best one via genetic operations. To evaluate the fuzzy classifier, a fuzzy expert system acts as the fitness function. This fuzzy expert system can effectively evaluate the accuracy and compactness at the same time. In the application section, we consider four benchmark classification problems: the iris data, wine data, Wisconsin breast cancer data, and Pima Indian diabetes data. Comparisons of our method with others in the literature show the effectiveness of the proposed method.

  20. Perfect teleportation and superdense coding with W states

    SciTech Connect

    Agrawal, Pankaj; Pati, Arun

    2006-12-15

    True tripartite entanglement of the state of a system of three qubits can be classified on the basis of stochastic local operations and classical communications. Such states can be classified into two categories: GHZ states and W states. It is known that GHZ states can be used for teleportation and superdense coding, but the prototype W state cannot be. However, we show that there is a class of W states that can be used for perfect teleportation and superdense coding.

  1. Transcriptome-based functional classifiers for direct immunotoxicity.

    PubMed

    Shao, Jia; Berger, Laura F; Hendriksen, Peter J M; Peijnenburg, Ad A C M; van Loveren, Henk; Volger, Oscar L

    2014-03-01

    Current screening methods for direct immunotoxic chemicals are mainly based on general toxicity studies with rodents. The present study aimed to identify transcriptome-based functional classifiers that can eventually be exploited for the development of in vitro screening assays for direct immunotoxicity. To this end, a toxicogenomics approach was applied in which gene expression changes in human Jurkat lymphoblastic T cells were investigated in response to a wide range of compounds, including direct immunotoxicants, immunosuppressive drugs, and non-immunotoxic control chemicals. On the basis of DNA microarray data previously obtained by the exposure of Jurkat cells to 31 test compounds (Shao et al. in Toxicol Sci 135(2):328-346, 2013), we identified a set of 93 genes, of which 80 were significantly regulated (|numerical ratio| ≥1.62) by at least three compounds and the other 13 genes were significantly regulated by either one single compound or compound class. A total of 28 most differentially regulated genes were selected for qRT-PCR verification using a training set of 44 compounds consisting of the above-mentioned 31 compounds (23 immunotoxic and 8 non-immunotoxic) and 13 additional immunotoxicants. Good correlation between the results of microarray and qRT-PCR (Pearson's correlation, R ≥ 0.69) was found for 27 out of the 28 genes. Redundancy analysis of these 27 potential classifiers led to a final set of 25 genes. To assess the performance of these genes, Jurkat cells were exposed to 20 additional compounds (external verification set) followed by qRT-PCR. The classifier set of 25 genes gave a good performance in the external verification: accuracy 85 %, true positive rate (sensitivity) 88 %, and true negative rate (specificity) 67 %. Furthermore, on the basis of the gene ontology annotation of the 25 classifier genes, the immunotoxicants examined in this study could be categorized into distinct functional subclasses. In conclusion, we have identified and
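
    The three performance figures quoted above follow from a confusion matrix. The sketch below shows the arithmetic. The abstract does not report the underlying counts, so the counts used here (15 true positives, 2 false negatives, 2 true negatives, 1 false positive) are an assumption: they are one set of counts for 20 compounds that reproduces the rounded percentages, not data taken from the study. The function name is hypothetical.

```python
def classifier_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# ASSUMED counts for a 20-compound verification set (not from the abstract).
metrics = classifier_metrics(tp=15, fn=2, tn=2, fp=1)
rounded = {name: round(100 * value) for name, value in metrics.items()}
```

    Under these assumed counts the rounded values are 85, 88 and 67 percent, matching the reported accuracy, sensitivity and specificity.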

  2. Chilean Pitavia more closely related to Oceania and Old World Rutaceae than to Neotropical groups: evidence from two cpDNA non-coding regions, with a new subfamilial classification of the family

    PubMed Central

    Groppo, Milton; Kallunki, Jacquelyn A.; Pirani, José Rubens; Antonelli, Alexandre

    2012-01-01

    The position of the plant genus Pitavia within an infrafamilial phylogeny of Rutaceae (rue, or orange family) was investigated with the use of two non-coding regions from cpDNA, the trnL-trnF region and the rps16 intron. The only species of the genus, Pitavia punctata Molina, is restricted to the temperate forests of the Coastal Cordillera of Central-Southern Chile and threatened by loss of habitat. The genus traditionally has been treated as part of tribe Zanthoxyleae (subfamily Rutoideae) where it constitutes the monogeneric tribe Pitaviinae. This tribe and genus are characterized by fruits of 1 to 4 fleshy drupelets, unlike the dehiscent fruits typical of the subfamily. Fifty-five taxa of Rutaceae, representing 53 genera (nearly one-third of those in the family) and all subfamilies, tribes, and almost all subtribes of the family were included. Parsimony and Bayesian inference were used to infer the phylogeny; six taxa of Meliaceae, Sapindaceae, and Simaroubaceae, all members of Sapindales, were also used as out-groups. Results from both analyses were congruent and showed Pitavia as sister to Flindersia and Lunasia, both genera with species scattered through Australia, Philippines, Moluccas, New Guinea and the Malayan region, and phylogenetically far from other Neotropical Rutaceae, such as the Galipeinae (Galipeeae, Rutoideae) and Pteleinae (Toddalieae, former Toddalioideae). Additionally, a new circumscription of the subfamilies of Rutaceae is presented and discussed. Only two subfamilies (both monophyletic) are recognized: Cneoroideae (including Dictyolomatoideae, Spathelioideae, Cneoraceae, and Ptaeroxylaceae) and Rutoideae (including not only traditional Rutoideae but also Aurantioideae, Flindersioideae, and Toddalioideae). As a consequence, Aurantioideae (Citrus and allies) is reduced to tribal rank as Aurantieae. PMID:23717188

  3. TU-EF-304-10: Efficient Multiscale Simulation of the Proton Relative Biological Effectiveness (RBE) for DNA Double Strand Break (DSB) Induction and Bio-Effective Dose in the FLUKA Monte Carlo Radiation Transport Code

    SciTech Connect

    Moskvin, V; Tsiamas, P; Axente, M; Farr, J; Stewart, R

    2015-06-15

    Purpose: One of the more critical initiating events for reproductive cell death is the creation of a DNA double strand break (DSB). In this study, we present a computationally efficient way to determine spatial variations in the relative biological effectiveness (RBE) of proton therapy beams within the FLUKA Monte Carlo (MC) code. Methods: We used the independently tested Monte Carlo Damage Simulation (MCDS) developed by Stewart and colleagues (Radiat. Res. 176, 587–602, 2011) to estimate the RBE for DSB induction of monoenergetic protons, tritium, deuterium, helium-3, helium-4 ions and delta-electrons. The dose-weighted (RBE) coefficients were incorporated into FLUKA to determine the equivalent ⁶⁰Co γ-ray dose for representative proton beams incident on cells in an aerobic and anoxic environment. Results: We found that the proton beam RBE for DSB induction at the tip of the Bragg peak, including primary and secondary particles, is close to 1.2. Furthermore, the RBE increases laterally to the beam axis at the area of Bragg peak. At the distal edge, the RBE is in the range from 1.3–1.4 for cells irradiated under aerobic conditions and may be as large as 1.5–1.8 for cells irradiated under anoxic conditions. Across the plateau region, the recorded RBE for DSB induction is 1.02 for aerobic cells and 1.05 for cells irradiated under anoxic conditions. The contribution to total effective dose from secondary heavy ions decreases with depth and is higher at shallow depths (e.g., at the surface of the skin). Conclusion: Multiscale simulation of the RBE for DSB induction provides useful insights into spatial variations in proton RBE within pristine Bragg peaks. This methodology is potentially useful for the biological optimization of proton therapy for the treatment of cancer. The study highlights the need to incorporate spatial variations in proton RBE into proton therapy treatment plans.

  4. Semantic Features for Classifying Referring Search Terms

    SciTech Connect

    May, Chandler J.; Henry, Michael J.; McGrath, Liam R.; Bell, Eric B.; Marshall, Eric J.; Gregory, Michelle L.

    2012-05-11

    When an internet user clicks on a result in a search engine, a request is submitted to the destination web server that includes a referrer field containing the search terms given by the user. Using this information, website owners can analyze the search terms leading to their websites to better understand their visitors' needs. This work explores some of the features that can be used for classification-based analysis of such referring search terms. We present initial results for the example task of classifying HTTP requests' countries of origin. A system that can accurately predict the country of origin from query text may be a valuable complement to IP lookup methods, which are susceptible to the obfuscation of dereferrers or proxies. We suggest that the addition of semantic features improves classifier performance in this example application. We begin by looking at related work and presenting our approach. After describing initial experiments and results, we discuss paths forward for this work.

  5. Classifying Land Cover Using Spectral Signature

    NASA Astrophysics Data System (ADS)

    Alawiye, F. S.

    2012-12-01

    Studying land cover has become increasingly important as countries try to overcome the destruction of wetlands and its impact on local climate due to seasonal variation, radiation balance, and deteriorating environmental quality. In this investigation, we have been studying the spectral signatures of the Jamaica Bay wetland area based on remotely sensed satellite input data from LANDSAT TM and ASTER. We applied various remote sensing techniques to generate classified land cover output maps. Our classifiers relied on input from both the remote sensing and in-situ spectral field data. Based upon spectral separability and data collected in the field, supervised and unsupervised classifications were carried out. First results suggest good agreement between the land cover units mapped and those observed in the field.

  6. Classification Studies in an Advanced Air Classifier

    NASA Astrophysics Data System (ADS)

    Routray, Sunita; Bhima Rao, R.

    2016-10-01

    In the present paper, experiments are carried out using a VSK separator, an advanced air classifier, to recover heavy minerals from beach sand. In the classification experiments the cage wheel speed and the feed rate are set, and the material is fed to the air cyclone and split into fine and coarse particles, which are collected in separate bags. The size distribution of each fraction was measured by sieve analysis. A model is developed to predict the performance of the air classifier. The objective of the present model is to predict the grade efficiency curve for a given set of operating parameters such as cage wheel speed and feed rate. The overall experimental data, with all variables studied in this investigation, is fitted to several models. It is found that the present model fits the logistic model well.

  7. Comparing cosmic web classifiers using information theory

    NASA Astrophysics Data System (ADS)

    Leclercq, Florent; Lavaux, Guilhem; Jasche, Jens; Wandelt, Benjamin

    2016-08-01

    We introduce a decision scheme for optimally choosing a classifier, which segments the cosmic web into different structure types (voids, sheets, filaments, and clusters). Our framework, based on information theory, accounts for the design aims of different classes of possible applications: (i) parameter inference, (ii) model selection, and (iii) prediction of new observations. As an illustration, we use cosmographic maps of web-types in the Sloan Digital Sky Survey to assess the relative performance of the classifiers T-WEB, DIVA and ORIGAMI for: (i) analyzing the morphology of the cosmic web, (ii) discriminating dark energy models, and (iii) predicting galaxy colors. Our study substantiates a data-supported connection between cosmic web analysis and information theory, and paves the path towards principled design of analysis procedures for the next generation of galaxy surveys. We have made the cosmic web maps, galaxy catalog, and analysis scripts used in this work publicly available.

  8. Classifying objects in LWIR imagery via CNNs

    NASA Astrophysics Data System (ADS)

    Rodger, Iain; Connor, Barry; Robertson, Neil M.

    2016-10-01

    The aim of the presented work is to demonstrate enhanced target recognition and improved false alarm rates for a mid to long range detection system, utilising a Long Wave Infrared (LWIR) sensor. By exploiting high quality thermal image data and recent techniques in machine learning, the system can provide automatic target recognition capabilities. A Convolutional Neural Network (CNN) is trained and the classifier achieves an overall accuracy of > 95% for 6 object classes related to land defence. While the highly accurate CNN struggles to recognise long range target classes, due to low signal quality, robust target discrimination is achieved for challenging candidates. The overall performance of the methodology presented is assessed using human ground truth information, generating classifier evaluation metrics for thermal image sequences.

  9. Classifying Star Forming Cores through Chemical Anomalies

    NASA Astrophysics Data System (ADS)

    Hoq, Sadia; Jackson, J.; Foster, J.

    2011-05-01

    The chemical makeup of Infrared Dark Clouds may offer a method to classify star forming cores. This study uses the molecular line maps from the Millimetre Astronomy Legacy Team 90 GHz (MALT90) Survey, observed using the 22-m ATNF Mopra Telescope. The relative abundances of the four molecules, N2H+, HNC, HCN and HCO+ are calculated for each of 500 cores to determine the chemical signatures of star forming cores in their early evolutionary stages, as deduced from Spitzer data. Cores are classified as prestellar, protostellar, or HII regions. Initial findings indicate that sources with relatively strong N2H+ lines are prestellar, whereas weak N2H+ lines may designate protostellar or HII regions. These chemical anomalies, where the N2H+ lines are either very prominent or weak are rare, suggesting that these are short-lived chemical phases.

  10. Letter identification and the neural image classifier.

    PubMed

    Watson, Andrew B; Ahumada, Albert J

    2015-02-12

    Letter identification is an important visual task for both practical and theoretical reasons. To extend and test existing models, we have reviewed published data for contrast sensitivity for letter identification as a function of size and have also collected new data. Contrast sensitivity increases rapidly from the acuity limit but slows and asymptotes at a symbol size of about 1 degree. We recast these data in terms of contrast difference energy: the average of the squared distances between the letter images and the average letter image. In terms of sensitivity to contrast difference energy, and thus visual efficiency, there is a peak around ¼ degree, followed by a marked decline at larger sizes. These results are explained by a Neural Image Classifier model that includes optical filtering and retinal neural filtering, sampling, and noise, followed by an optimal classifier. As letters are enlarged, sensitivity declines because of the increasing size and spacing of the midget retinal ganglion cell receptive fields in the periphery.
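
    The definition of contrast difference energy given above (the average of the squared distances between the letter images and the average letter image) can be sketched directly. This is a minimal illustration under stated assumptions: images are flat lists of contrast values of equal length, distance is plain Euclidean distance over pixels, and any scaling by pixel area or duration that the authors may apply is omitted. The function names and the two toy "letters" are hypothetical.

```python
def mean_image(images):
    """Pixel-wise average of equally sized images (flat lists of contrasts)."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


def contrast_difference_energy(images):
    """Average squared Euclidean distance of each image from the mean image."""
    avg = mean_image(images)
    squared_distances = [
        sum((p - a) ** 2 for p, a in zip(image, avg)) for image in images
    ]
    return sum(squared_distances) / len(images)


# Two made-up 2x2 "letters", flattened row by row.
letters = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
]
energy = contrast_difference_energy(letters)
```

    Here the mean image is 0.5 everywhere, each letter lies at squared distance 1.0 from it, and the resulting energy is 1.0.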

  11. Classifying bed inclination using pressure images.

    PubMed

    Baran Pouyan, M; Ostadabbas, S; Nourani, M; Pompeo, M

    2014-01-01

    Pressure ulcer is one of the most prevalent problems for bed-bound patients in hospitals and nursing homes. Pressure ulcers are painful for patients and costly for healthcare systems. Accurate in-bed posture analysis can significantly help in preventing pressure ulcers. Specifically, bed inclination (back angle) is a factor contributing to pressure ulcer development. In this paper, an efficient methodology is proposed to classify bed inclination. Our approach uses pressure values collected from a commercial pressure mat system. Then, by applying a number of image processing and machine learning techniques, the approximate degree of bed is estimated and classified. The proposed algorithm was tested on 15 subjects with various sizes and weights. The experimental results indicate that our method predicts bed inclination in three classes with 80.3% average accuracy.

  12. 32 CFR 1633.1 - Classifying authority.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... he is eligible except Classes 1-A-0, 1-0, 2-D, 3-A, and 4-D: Provided, That, the Director may not... a registrant into any class for which he is eligible. (d) A local board may in accord with part 1648 of this chapter classify a registrant into Class 1-A-0, 1-0, 2-D, 3-A, or 4-D for which he is...

  13. Dynamic Dimensionality Selection for Bayesian Classifier Ensembles

    DTIC Science & Technology

    2015-03-19

    data. It exploited the capacity of generative learning to efficiently extract useful summary statistics and used discriminative learning to meld them...into a highly accurate classifier. Two classes of learning algorithm were developed. The first uses discriminative learning to select a generative...attribute that is finally selected. The second combines generatively and discriminatively learned parameters (WANBIA, WANBIA-C, WANJE). It uses discriminative

  14. Bayes classifiers for imbalanced traffic accidents datasets.

    PubMed

    Mujalli, Randa Oqab; López, Griselda; Garach, Laura

    2016-03-01

    Traffic accident data sets are usually imbalanced: the number of instances classified under the killed or severe injuries class (minority) is much lower than the number classified under the slight injuries class (majority). This poses a challenging problem for classification algorithms and may yield a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011), three different data balancing techniques were used: under-sampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared for the different imbalanced and balanced data sets (Averaged One-Dependence Estimators, Weightily Average One-Dependence Estimators, and Bayesian networks) in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created using oversampling techniques, with Bayesian networks improved the classification of a traffic accident according to its severity and reduced the misclassification of killed and severe injuries instances. In addition, the following variables were found to contribute to the occurrence of a killed casualty or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work, to the knowledge of the authors, is the first that aims at analyzing historical data records for traffic accidents occurring in Jordan and the first to apply balancing techniques to analyze injury severity of traffic accidents.
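
    The two basic balancing techniques named above can be sketched as follows. This is an illustration only: the abstract does not say which sampling procedures were used, so the sketch substitutes the simplest variants, random under-sampling and random oversampling by duplicating existing minority instances (rather than synthesizing new ones). The function names, the record layout and the toy counts are all hypothetical.

```python
import random


def split_by_label(rows, label_of):
    """Group records by class label, preserving first-seen label order."""
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    return groups


def undersample(rows, label_of, rng):
    """Randomly drop records so every class matches the smallest class."""
    groups = split_by_label(rows, label_of)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced


def oversample(rows, label_of, rng):
    """Randomly duplicate records so every class matches the largest class."""
    groups = split_by_label(rows, label_of)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Toy data (made up): 8 "slight" records and 2 "severe" records.
rng = random.Random(0)
rows = [("slight", i) for i in range(8)] + [("severe", i) for i in range(2)]
under = undersample(rows, lambda row: row[0], rng)
over = oversample(rows, lambda row: row[0], rng)
```

    With this toy input, under-sampling yields 2 records per class and oversampling yields 8 per class. A mixed technique, as mentioned in the abstract, would pick a target size between the two class sizes.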

  15. Double Ramp Loss Based Reject Option Classifier

    DTIC Science & Technology

    2015-05-22

    choose 10% of these points uniformly at random and flip their labels. 2. Ionosphere Dataset [2]: This dataset describes the problem of discriminating...good versus bad radars based on whether they send some useful information about the Ionosphere. There are 34 variables and 351 observations. 3... Ionosphere dataset (nonlinear classifiers using RBF kernel for both the approaches) d LDR (C = 2, γ = 0.125) LDH (C = 16, γ = 0.125) Risk RR Acc(unrej

  16. Development of multi-size classifying cyclone

    SciTech Connect

    Zhan Hanhui; Wang Zuna

    1994-12-31

    The authors have developed a multi-size classifying cyclone, which is characterized by its distinctive structure and quasi forced vortex in a rotary flow region. The cyclone differs from a conventional cyclone in three-dimensional velocity distribution in a rotary flow region, but the former has the same pressure distribution law as the latter. Tests show that satisfactory multi-size classification can be achieved using the cyclone.

  17. 46 CFR 503.59 - Safeguarding classified information.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... Information Security Program § 503.59 Safeguarding classified information. (a) All classified information... the Commission Information Security Program, particularly those concerning the classification... security; (2) Takes appropriate steps to protect classified information from unauthorized disclosure or...

  18. 46 CFR 503.59 - Safeguarding classified information.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... Information Security Program § 503.59 Safeguarding classified information. (a) All classified information... the Commission Information Security Program, particularly those concerning the classification... security; (2) Takes appropriate steps to protect classified information from unauthorized disclosure or...

  19. 46 CFR 503.59 - Safeguarding classified information.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... Information Security Program § 503.59 Safeguarding classified information. (a) All classified information... the Commission Information Security Program, particularly those concerning the classification... security; (2) Takes appropriate steps to protect classified information from unauthorized disclosure or...

  20. Evolving edited k-nearest neighbor classifiers.

    PubMed

    Gil-Pita, Roberto; Yao, Xin

    2008-12-01

    The k-nearest neighbor method is a classifier based on the evaluation of the distances to each pattern in the training set. The edited version of this method consists of the application of this classifier with a subset of the complete training set in which some of the training patterns are excluded, in order to reduce the classification error rate. In recent works, genetic algorithms have been successfully applied to determine which patterns must be included in the edited subset. In this paper we propose a novel implementation of a genetic algorithm for designing edited k-nearest neighbor classifiers. It includes the definition of a novel mean square error based fitness function, a novel clustered crossover technique, and the proposal of a fast smart mutation scheme. In order to evaluate the performance of the proposed method, results using the breast cancer database, the diabetes database and the letter recognition database from the UCI machine learning benchmark repository have been included. Both error rate and computational cost have been considered in the analysis. Obtained results show the improvement achieved by the proposed editing method.
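
    The notion of an edited training set can be sketched as a boolean mask over the training patterns, which is the kind of object a genetic algorithm could evolve. The sketch below only shows how such a mask changes a k-nearest neighbor decision; the mask, the toy data and the function names are hypothetical, and no genetic search, fitness function or crossover from the paper is reproduced.

```python
from collections import Counter

def knn_predict(train, keep, query, k=3):
    """Majority vote among the k nearest training patterns whose mask bit is set."""
    kept = [p for p, use in zip(train, keep) if use]
    # Squared Euclidean distance is sufficient for ranking neighbours.
    kept.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)))
    votes = Counter(label for _, label in kept[:k])
    return votes.most_common(1)[0][0]

def error_rate(train, keep, test, k=3):
    """Fraction of test patterns misclassified with the given edited subset."""
    wrong = sum(knn_predict(train, keep, x, k) != y for x, y in test)
    return wrong / len(test)

# Toy training set (hypothetical); the pattern at index 3 is a mislabelled outlier.
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((0.15, 0.15), "b"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b")]
test = [((0.16, 0.16), "a"), ((0.95, 1.0), "b")]

full = [True] * len(train)                               # unedited classifier
edited = [True, True, True, False, True, True, True]     # outlier excluded

err_full = error_rate(train, full, test, k=1)
err_edited = error_rate(train, edited, test, k=1)
print(err_full, err_edited)
```

    Dropping a pattern also reduces the number of distances computed per query, which is why both error rate and computational cost are natural criteria for judging an edited subset.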

  1. cDNA cloning and sequence analysis of human pancreatic procarboxypeptidase A1.

    PubMed Central

    Catasús, L; Villegas, V; Pascual, R; Avilés, F X; Wicker-Planquart, C; Puigserver, A

    1992-01-01

    Using polyclonal antibodies raised against human pancreatic procarboxypeptidases, a full-length cDNA coding for an A-type proenzyme was isolated from a lambda gt11 human pancreatic library. This cDNA contains standard 3' and 5' flanking regions, a poly(A)+ tail and a central region of 1260 nucleotides coding for a protein of 419 amino acids. On the basis of sequence comparisons, the human protein was classified as a procarboxypeptidase A1 which is very similar to the previously described A1 forms from rat and bovine pancreatic glands. The presence of the amino acid sequences assumed to be of importance for the zymogen inhibition by its activation segment, primarily on the basis of the recently reported crystal structure of the B form, further supports the proposed classification. PMID:1417781

  2. Statistical properties of DNA sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
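
    As a rough illustration of detrended fluctuation analysis, the sketch below maps a sequence to a "DNA walk", splits the walk into boxes of length n, removes a least-squares line from each box, and returns the root-mean-square residual F(n). The purine/pyrimidine step rule, the non-overlapping boxes and the function names are assumptions for this example and are not taken from the record above.

```python
import math

def dna_walk(seq):
    """Cumulative walk: +1 for pyrimidines (C, T), -1 for purines (A, G)."""
    walk, total = [], 0
    for base in seq:
        total += 1 if base in "CT" else -1
        walk.append(total)
    return walk

def dfa_fluctuation(walk, n):
    """RMS residual of the walk around a line fitted separately in each box of length n (n >= 2)."""
    boxes = len(walk) // n
    xs = range(n)
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sq_sum = 0.0
    for b in range(boxes):
        seg = walk[b * n:(b + 1) * n]
        y_mean = sum(seg) / n
        # Least-squares slope of the local trend inside this box.
        slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, seg)) / sxx
        sq_sum += sum((y - (y_mean + slope * (x - x_mean))) ** 2
                      for x, y in zip(xs, seg))
    return math.sqrt(sq_sum / (boxes * n))

# A purely linear walk has no fluctuation left after detrending.
f_flat = dfa_fluctuation(dna_walk("A" * 64), 8)
# An alternating purine/pyrimidine sequence leaves a nonzero residual.
f_alt = dfa_fluctuation(dna_walk("AC" * 32), 8)
print(f_flat, f_alt)
```

    In practice F(n) is computed over a range of box sizes and the slope of log F(n) against log n is read off as the scaling exponent; an exponent near 0.5 corresponds to an uncorrelated walk.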

  4. Robust Framework to Combine Diverse Classifiers Assigning Distributed Confidence to Individual Classifiers at Class Level

    PubMed Central

    Arshad, Sannia; Rho, Seungmin

    2014-01-01

    We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates models of the various classes whilst identifying and filtering noisy training data. This noise-free data is further used to learn models for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied a genetic algorithm to search for an optimal weight vector with which the classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on a variety of real-life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of the proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalanced classes. PMID:25295302
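
    The idea of assigning confidence to each classifier at class level can be sketched as a weighted sum in which every (class, classifier) pair has its own weight. The weights and scores below are hypothetical fixed numbers chosen for illustration; in the framework described above they would come from a search procedure, which is not reproduced here.

```python
def combine(scores, weights):
    """Weighted sum of per-class scores; weights[c][k] is classifier k's weight for class c."""
    classes = scores[0].keys()
    total = {c: sum(weights[c][k] * s[c] for k, s in enumerate(scores))
             for c in classes}
    return max(total, key=total.get), total

# Hypothetical per-class scores from two base classifiers for one sample.
scores = [
    {"x": 0.6, "y": 0.4},   # classifier 0
    {"x": 0.3, "y": 0.7},   # classifier 1
]
# Hypothetical class-level weights: classifier 0 is trusted on "x", classifier 1 on "y".
weights = {"x": [0.9, 0.1], "y": [0.2, 0.8]}

label, total = combine(scores, weights)
print(label, total)
```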

  5. Reconfiguration-based implementation of SVM classifier on FPGA for Classifying Microarray data.

    PubMed

    Hussain, Hanaa M; Benkrid, Khaled; Seker, Huseyin

    2013-01-01

    Classifying Microarray data, which are of high dimensional nature, requires high computational power. Support Vector Machines-based classifier (SVM) is among the most common and successful classifiers used in the analysis of Microarray data but also requires high computational power due to its complex mathematical architecture. Implementing SVM on hardware exploits the parallelism available within the algorithm kernels to accelerate the classification of Microarray data. In this work, a flexible, dynamically and partially reconfigurable implementation of the SVM classifier on Field Programmable Gate Array (FPGA) is presented. The SVM architecture achieved up to 85× speed-up over equivalent general purpose processor (GPP) showing the capability of FPGAs in enhancing the performance of SVM-based analysis of Microarray data as well as future bioinformatics applications.

  6. Decision Tree Classifier for Classification of Plant and Animal Micro RNA's

    NASA Astrophysics Data System (ADS)

    Pant, Bhasker; Pant, Kumud; Pardasani, K. R.

    Gene expression is regulated by miRNAs, or micro RNAs, which can be 21-23 nucleotides in length. They are non-coding RNAs which control gene expression either by translation repression or mRNA degradation. Plants and animals both contain miRNAs, which have been classified by wet lab techniques. These techniques are highly expensive, labour intensive and time consuming. Hence faster and more economical computational approaches are needed. In view of the above, a machine learning model has been developed for classification of plant and animal miRNAs using a decision tree classifier. The model has been tested on available data and gives results with 91% accuracy.

  7. Model Children's Code.

    ERIC Educational Resources Information Center

    New Mexico Univ., Albuquerque. American Indian Law Center.

    The Model Children's Code was developed to provide a legally correct model code that American Indian tribes can use to enact children's codes that fulfill their legal, cultural and economic needs. Code sections cover the court system, jurisdiction, juvenile offender procedures, minor-in-need-of-care, and termination. Almost every Code section is…

  8. Coding of Neuroinfectious Diseases.

    PubMed

    Barkley, Gregory L

    2015-12-01

    Accurate coding is an important function of neurologic practice. This contribution to Continuum is part of an ongoing series that presents helpful coding information along with examples related to the issue topic. Tips for diagnosis coding, Evaluation and Management coding, procedure coding, or a combination are presented, depending on which is most applicable to the subject area of the issue.

  9. Diagnostic Coding for Epilepsy.

    PubMed

    Williams, Korwyn; Nuwer, Marc R; Buchhalter, Jeffrey R

    2016-02-01

    Accurate coding is an important function of neurologic practice. This contribution to Continuum is part of an ongoing series that presents helpful coding information along with examples related to the issue topic. Tips for diagnosis coding, Evaluation and Management coding, procedure coding, or a combination are presented, depending on which is most applicable to the subject area of the issue.

  10. To Code or Not To Code?

    ERIC Educational Resources Information Center

    Parkinson, Brian; Sandhu, Parveen; Lacorte, Manel; Gourlay, Lesley

    1998-01-01

    This article considers arguments for and against the use of coding systems in classroom-based language research and touches on some relevant considerations from ethnographic and conversational analysis approaches. The four authors each explain and elaborate on their practical decision to code or not to code events or utterances at a specific point…

  11. Classifying smoking urges via machine learning.

    PubMed

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-12-01

    Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions.
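
    The four evaluation measures mentioned in this abstract follow directly from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the counts are invented for illustration and are not results from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),             # share of true positives found
        "specificity": tn / (tn + fp),             # share of true negatives found
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),               # share of positive calls that are right
    }

# Hypothetical counts, with "high-urge state" taken as the positive class.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print(m)
```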

  12. Hailstone classifier based on Rough Set Theory

    NASA Astrophysics Data System (ADS)

    Wan, Huisong; Jiang, Shuming; Wei, Zhiqiang; Li, Jian; Li, Fengjiao

    2017-09-01

    Rough Set Theory was used to construct a hailstone classifier. First, a database of radar image features was constructed. This involved transforming the base data from the Doppler radar into a viewable bitmap format. Then, through image processing, the color, texture, shape and other dimensional features were extracted and saved as the feature database to provide data support for the follow-up work. Second, using Rough Set Theory, a machine for hailstone classification can be built to achieve automatic classification of hailstone samples.

  13. Classifying Bugs is a Tricky Business.

    DTIC Science & Technology

    1983-08-01

    ...WRITELN('BAD INPUT. TRY AGAIN'); READ(RAINFALL) END; IF RAINFALL <> 99999 THEN BEGIN TOTAL := TOTAL + RAINFALL; DAYS := DAYS + 1; READ(RAINFALL) END; END...this last question. READ(RAINFALL) WHILE RAINFALL <> 99999 DO BEGIN WHILE RAINFALL < 0 DO BEGIN WRITELN('BAD INPUT. TRY AGAIN'); READ(RAINFALL) END

  14. Intelligent neural network classifier for automatic testing

    NASA Astrophysics Data System (ADS)

    Bai, Baoxing; Yu, Heping

    1996-10-01

    This paper is concerned with an application of a multilayer feedforward neural network to the vision-based inspection of industrial images, and introduces a high-performance image processing and recognition system that can be used for real-time detection of blemishes, streaks, cracks, etc. on the inner walls of high-accuracy pipes. To take full advantage of the capabilities of artificial neural networks, such as distributed information storage, large-scale self-adaptive parallel processing and high fault tolerance, this system uses a multilayer perceptron as a regular detector to extract features of the images to be inspected and classify them.

  15. Bare Code Reader

    NASA Astrophysics Data System (ADS)

    Clair, Jean J.

    1980-05-01

    The Bare code system will be used in every market and supermarket. The code, which is standardised in the US and Europe (EAN code), gives information on price, storage and nature, and allows real-time management of the shop.

  16. Rotational Study of Ambiguous Taxonomic Classified Asteroids

    NASA Astrophysics Data System (ADS)

    Linder, Tyler R.; Sanchez, Rick; Wuerker, Wolfgang; Clayson, Timothy; Giles, Tucker

    2017-01-01

    The Sloan Digital Sky Survey (SDSS) moving object catalog (MOC4) provided the largest ever catalog of asteroid spectrophotometry observations. Carvano et al. (2010), while analyzing MOC4, discovered that individual observations of asteroids which were observed multiple times did not classify into the same photometric-based taxonomic class. A small subset of those asteroids were classified as having both the presence and absence of a 1 μm silicate absorption feature. If these variations are linked to differences in surface mineralogy, the prevailing assumption that an asteroid’s surface composition is predominantly homogenous would need to be reexamined. Furthermore, our understanding of the evolution of the asteroid belt, as well as the linkage between certain asteroids and meteorite types, may need to be modified. This research is an investigation to determine the rotational rates of these taxonomically ambiguous asteroids. Initial questions to be answered: Do these asteroids have unique or nonstandard rotational rates? Is there any evidence in their light curves to suggest an abnormality? Observations were taken using PROMPT6, a 0.41-m telescope that is part of the SKYNET network at Cerro Tololo Inter-American Observatory (CTIO). Observations were calibrated and analyzed using Canopus software. Initial results will be presented at AAS.

  17. Classifying multispectral data by neural networks

    NASA Technical Reports Server (NTRS)

    Telfer, Brian A.; Szu, Harold H.; Kiang, Richard K.

    1993-01-01

    Several energy functions for synthesizing neural networks are tested on 2-D synthetic data and on Landsat-4 Thematic Mapper data. These new energy functions, designed specifically for minimizing misclassification error, in some cases yield significant improvements in classification accuracy over the standard least mean squares energy function. In addition to operating on networks with one output unit per class, a new energy function is tested for binary encoded outputs, which result in smaller network sizes. The Thematic Mapper data (four bands were used) is classified on a single pixel basis, to provide a starting benchmark against which further improvements will be measured. Improvements are underway to make use of both subpixel and superpixel (i.e. contextual or neighborhood) information in the processing. For single pixel classification, the best neural network result is 78.7 percent, compared with 71.7 percent for a classical nearest neighbor classifier. The 78.7 percent result also improves on several earlier neural network results on this data.

  18. Adaptive classifier for steel strip surface defects

    NASA Astrophysics Data System (ADS)

    Jiang, Mingming; Li, Guangyao; Xie, Li; Xiao, Mang; Yi, Li

    2017-01-01

    Surface defect detection systems have been receiving increased attention for their precision, speed and low cost. One of the main challenges is reacting to the deterioration of accuracy over time caused by aging equipment and changed processes. These variables make a tiny change to the real-world model but have a big impact on the classification result. In this paper, we propose a new adaptive classifier with a Bayes kernel (BYEC), which updates the model with a small sample so that it adapts to accuracy deterioration. First, abundant features were introduced to cover a large amount of information about the defects. Second, we constructed a series of SVMs with random subspaces of the features. Then, a Bayes classifier was trained as an evolutionary kernel to fuse the results from the base SVMs. Finally, we proposed a method to update the Bayes evolutionary kernel. The proposed algorithm is experimentally compared with different algorithms; the experimental results demonstrate that the proposed method can be updated with a small sample and fits the changed model well. Robustness, a low requirement for samples and adaptivity are demonstrated in the experiment.

  19. Induction with cross-classified categories.

    PubMed

    Murphy, G L; Ross, B H

    1999-11-01

    One of the main functions of categories is to allow inferences about new objects. However, most objects are cross-classified, and it is not known whether and how people combine information from these different categories in making inferences. In six experiments, food categories, which are strongly cross-classified (e.g., a bagel is both a bread and a breakfast food), were studied. For each food, the subjects were told fictitious facts (e.g., 75% of breads are subject to spoilage from Aspergillus molds) about two of the categories to which it belonged and then were asked to make an inference about the food (e.g., how likely is a bagel to be subject to spoilage from Aspergillus molds?). We found no more use of multiple categories in these cases of cross-classification than in ambiguous classification, in which it is uncertain to which category an item belongs. However, some procedural manipulations did markedly increase the use of both categories in inferences, primarily those that focused the subjects' attention on the critical feature in both categories.

  20. Cross-Classified Occupational Exposure Data

    PubMed Central

    Jones, Rachael M.; Burstyn, Igor

    2017-01-01

    We demonstrate the regression analysis of exposure determinants using cross-classified random effects in the context of lead exposures resulting from blasting surfaces in advance of painting. We had three specific objectives for analysis of the lead data, and observed: 1) high within-worker variability in personal lead exposures, explaining 79% of variability, 2) that the lead concentration outside of half-mask respirators was 2.4-fold higher than inside supplied-air blasting helmets, suggesting that the exposure reduction by blasting helmets may be lower than expected by the Assigned Protection Factor, and 3) that lead concentrations at fixed area locations in containment were not associated with personal lead exposures. In addition, we found that, on average, lead exposures among workers performing blasting and other activities was 40% lower than among workers performing only blasting. In the process of obtaining these analyses objectives, we determined that the data were non-hierarchical: Repeated exposure measurements were collected for a worker while the worker was a member of several groups, or cross-classified among groups. Since the worker is a member of multiple groups, the exposure data do not adhere to the traditionally assumed hierarchical structure. Forcing a hierarchical structure on these data led to similar within-group and between-group variability, but decreased precision in the estimate of effect of work activity on lead exposure. We hope hygienists and exposure assessors will consider non-hierarchical models in the design and analysis of exposure assessments. PMID:27029937

  1. Cross-classified occupational exposure data.

    PubMed

    Jones, Rachael M; Burstyn, Igor

    2016-09-01

    We demonstrate the regression analysis of exposure determinants using cross-classified random effects in the context of lead exposures resulting from blasting surfaces in advance of painting. We had three specific objectives for analysis of the lead data, and observed: (1) high within-worker variability in personal lead exposures, explaining 79% of variability; (2) that the lead concentration outside of half-mask respirators was 2.4-fold higher than inside supplied-air blasting helmets, suggesting that the exposure reduction by blasting helmets may be lower than expected by the Assigned Protection Factor; and (3) that lead concentrations at fixed area locations in containment were not associated with personal lead exposures. In addition, we found that, on average, lead exposures among workers performing blasting and other activities was 40% lower than among workers performing only blasting. In the process of obtaining these analyses objectives, we determined that the data were non-hierarchical: repeated exposure measurements were collected for a worker while the worker was a member of several groups, or cross-classified among groups. Since the worker is a member of multiple groups, the exposure data do not adhere to the traditionally assumed hierarchical structure. Forcing a hierarchical structure on these data led to similar within-group and between-group variability, but decreased precision in the estimate of effect of work activity on lead exposure. We hope hygienists and exposure assessors will consider non-hierarchical models in the design and analysis of exposure assessments.

  2. A Systematic Comparison of Supervised Classifiers

    PubMed Central

    Amancio, Diego Raphael; Comin, Cesar Henrique; Casanova, Dalcimar; Travieso, Gonzalo; Bruno, Odemir Martinez; Rodrigues, Francisco Aparecido; da Fontoura Costa, Luciano

    2014-01-01

    Pattern recognition has been employed in a myriad of industrial, commercial and academic applications. Many techniques have been devised to tackle such a diversity of applications. Despite the long tradition of pattern recognition research, there is no technique that yields the best classification in all scenarios. Therefore, as many techniques as possible should be considered in high accuracy applications. Typical related works either focus on the performance of a given algorithm or compare various classification methods. On many occasions, however, researchers who are not experts in the field of machine learning have to deal with practical classification tasks without in-depth knowledge of the underlying parameters. In fact, the adequate choice of classifiers and parameters in such practical circumstances constitutes a long-standing problem and is one of the subjects of the current paper. We carried out a performance study of nine well-known classifiers implemented in the Weka framework and compared the influence of the parameter configurations on the accuracy. The default configuration of parameters in Weka was found to provide near optimal performance for most cases, not including methods such as the support vector machine (SVM). In addition, the k-nearest neighbor method frequently allowed the best accuracy. In certain conditions, it was possible to improve the quality of SVM by more than 20% with respect to their default parameter configuration. PMID:24763312

  3. Mercury⊕: An evidential reasoning image classifier

    NASA Astrophysics Data System (ADS)

    Peddle, Derek R.

    1995-12-01

    MERCURY⊕ is a multisource evidential reasoning classification software system based on the Dempster-Shafer theory of evidence. The design and implementation of this software package is described for improving the classification and analysis of multisource digital image data necessary for addressing advanced environmental and geoscience applications. In the remote-sensing context, the approach provides a more appropriate framework for classifying modern, multisource, and ancillary data sets which may contain a large number of disparate variables with different statistical properties, scales of measurement, and levels of error which cannot be handled using conventional Bayesian approaches. The software uses a nonparametric, supervised approach to classification, and provides a more objective and flexible interface to the evidential reasoning framework using a frequency-based method for computing support values from training data. The MERCURY⊕ software package has been implemented efficiently in the C programming language, with extensive use made of dynamic memory allocation procedures and compound linked list and hash-table data structures to optimize the storage and retrieval of evidence in a Knowledge Look-up Table. The software is complete with a full user interface and runs under Unix, Ultrix, VAX/VMS, MS-DOS, and Apple Macintosh operating system. An example of classifying alpine land cover and permafrost active layer depth in northern Canada is presented to illustrate the use and application of these ideas.
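
    The core operation of Dempster-Shafer evidential reasoning, Dempster's rule of combination, can be sketched in a few lines. This is a generic textbook version written in Python for illustration; it is not the MERCURY⊕ implementation (which the abstract says is written in C), and the two evidence sources, class names and mass values are hypothetical.

```python
def combine_dempster(m1, m2):
    """Combine two mass functions (frozenset of hypotheses -> mass) with Dempster's rule."""
    combined, conflict = {}, 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b   # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the remaining masses sum to one.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}

# Hypothetical frame of discernment with two land-cover classes.
frame = frozenset({"forest", "water"})
m_spectral = {frozenset({"forest"}): 0.6, frame: 0.4}   # source 1: some support for forest
m_terrain = {frozenset({"water"}): 0.5, frame: 0.5}     # source 2: some support for water

fused = combine_dempster(m_spectral, m_terrain)
print(fused)
```

    Mass placed on the whole frame represents ignorance rather than support for any one class, which is one way the approach differs from assigning a full probability distribution to every source.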

  4. Generalized concatenated quantum codes

    SciTech Connect

    Grassl, Markus; Shor, Peter; Smith, Graeme; Smolin, John; Zeng Bei

    2009-05-15

    We discuss the concept of generalized concatenated quantum codes. This generalized concatenation method provides a systematic way of constructing good quantum codes, both stabilizer codes and nonadditive codes. Using this method, we construct families of single-error-correcting nonadditive quantum codes, in both binary and nonbinary cases, which not only outperform any stabilizer codes for finite block length but also asymptotically meet the quantum Hamming bound for large block length.

  5. 49 CFR 1280.6 - Storage of classified documents.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... SECURITY INFORMATION AND CLASSIFIED MATERIAL § 1280.6 Storage of classified documents. All classified... 49 Transportation 9 2010-10-01 2010-10-01 false Storage of classified documents. 1280.6 Section 1280.6 Transportation Other Regulations Relating to Transportation (Continued) SURFACE...

  6. 36 CFR 1256.46 - National security-classified information.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 36 Parks, Forests, and Public Property 3 2011-07-01 2011-07-01 false National security-classified... Restrictions § 1256.46 National security-classified information. In accordance with 5 U.S.C. 552(b)(1), NARA... properly classified under the provisions of the pertinent Executive Order on Classified National...

  7. 36 CFR 1256.46 - National security-classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 36 Parks, Forests, and Public Property 3 2013-07-01 2012-07-01 true National security-classified... Restrictions § 1256.46 National security-classified information. In accordance with 5 U.S.C. 552(b)(1), NARA... properly classified under the provisions of the pertinent Executive Order on Classified National...

  8. 36 CFR 1256.46 - National security-classified information.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 36 Parks, Forests, and Public Property 3 2012-07-01 2012-07-01 false National security-classified... Restrictions § 1256.46 National security-classified information. In accordance with 5 U.S.C. 552(b)(1), NARA... properly classified under the provisions of the pertinent Executive Order on Classified National...

  9. 46 CFR 503.59 - Safeguarding classified information.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... Information Security Program § 503.59 Safeguarding classified information. (a) All classified information... security; (2) Takes appropriate steps to protect classified information from unauthorized disclosure or... security check; (2) To protect the classified information in accordance with the provisions of...

  10. 70. PRIMARY MILL AND CLASSIFIER No. 2 FROM NORTHWEST. MILL ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    70. PRIMARY MILL AND CLASSIFIER No. 2 FROM NORTHWEST. MILL DISCHARGED INTO LAUNDER WHICH PIERCED THE SIDE OF THE CLASSIFIER PAN. WOOD LAUNDER WITHIN CLASSIFIER VISIBLE (FILLED WITH DEBRIS). HORIZONTAL WOOD PLANKING BEHIND MILL IS FEED BOX. MILL SOLUTION PIPING RUNS ALONG BASE OF WEST SIDE OF CLASSIFIER. - Bald Mountain Gold Mill, Nevada Gulch at head of False Bottom Creek, Lead, Lawrence County, SD

  11. 5 CFR 1312.23 - Access to classified information.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 5 Administrative Personnel 3 2010-01-01 2010-01-01 false Access to classified information. 1312.23... Classified Information § 1312.23 Access to classified information. Classified information may be made... “need to know” and the access is essential to the accomplishment of official government duties....

  12. Intelligent query by humming system based on score level fusion of multiple classifiers

    NASA Astrophysics Data System (ADS)

    Pyo Nam, Gi; Thu Trang Luong, Thi; Ha Nam, Hyun; Ryoung Park, Kang; Park, Sung-Joo

    2011-12-01

    Recently, the necessity for content-based music retrieval that can return results even if a user does not know information such as the title or singer has increased. Query-by-humming (QBH) systems have been introduced to address this need, as they allow the user to simply hum snatches of the tune to find the right song. Even though there have been many studies on QBH, few have combined multiple classifiers based on various fusion methods. Here we propose a new QBH system based on the score level fusion of multiple classifiers. This research is novel in the following three respects: three local classifiers [quantized binary (QB) code-based linear scaling (LS), pitch-based dynamic time warping (DTW), and LS] are employed; local maximum and minimum point-based LS and pitch distribution feature-based LS are used as global classifiers; and the combination of local and global classifiers based on the score level fusion by the PRODUCT rule is used to achieve enhanced matching accuracy. Experimental results with the 2006 MIREX QBSH and 2009 MIR-QBSH corpus databases show that the performance of the proposed method is better than that of a single classifier and of other fusion methods.
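
    The score level fusion by the PRODUCT rule described above can be illustrated with a minimal Python sketch. It assumes each classifier returns one similarity score per candidate song (higher is better) and that scores are min-max normalized before being multiplied; the normalization step, the function names, and the sample numbers are assumptions for illustration, not details from the paper.

```python
from math import prod


def min_max_normalize(scores):
    """Rescale one classifier's scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All candidates tie for this classifier, so it carries no information.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def product_rule_fusion(score_table):
    """Fuse scores with the PRODUCT rule.

    score_table holds one list per classifier, each with one score per candidate.
    Returns one fused score per candidate.
    """
    normalized = [min_max_normalize(row) for row in score_table]
    n_candidates = len(normalized[0])
    return [prod(row[j] for row in normalized) for j in range(n_candidates)]


if __name__ == "__main__":
    # Two hypothetical classifiers scoring three candidate songs.
    fused = product_rule_fusion([[0.2, 0.9, 0.5], [10, 30, 20]])
    best = max(range(len(fused)), key=fused.__getitem__)
    print(fused, best)
```

    With the product rule, a candidate must score well under every classifier to rank highly, since a single near-zero score suppresses the fused result.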

  13. Accumulate repeat accumulate codes

    NASA Technical Reports Server (NTRS)

    Abbasfar, Aliazam; Divsalar, Dariush; Yao, Kung

    2004-01-01

    In this paper we propose an innovative channel coding scheme called 'Accumulate Repeat Accumulate codes' (ARA). This class of codes can be viewed as serial turbo-like codes, or as a subclass of Low Density Parity Check (LDPC) codes, thus belief propagation can be used for iterative decoding of ARA codes on a graph. The structure of the encoder for this class can be viewed as a precoded Repeat Accumulate (RA) code or as a precoded Irregular Repeat Accumulate (IRA) code, where simply an accumulator is chosen as a precoder. Thus ARA codes have a simple and very fast encoder structure when they represent LDPC codes. Based on density evolution for LDPC codes through some examples for ARA codes, we show that for maximum variable node degree 5 a minimum bit SNR as low as 0.08 dB from channel capacity for rate 1/2 can be achieved as the block size goes to infinity. Thus based on fixed low maximum variable node degree, its threshold outperforms not only the RA and IRA codes but also the best known LDPC codes with the same maximum node degree. Furthermore, by puncturing the accumulators any desired high rate codes close to code rate 1 can be obtained with thresholds that stay close to the channel capacity thresholds uniformly. Iterative decoding simulation results are provided. The ARA codes also have projected graph or protograph representation that allows for high speed decoder implementation.
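
    The accumulate, repeat, accumulate chain named in this abstract can be sketched as a toy encoder in Python. This is a simplified illustration under stated assumptions: every input bit is precoded, the interleaver is a seeded pseudo-random permutation, and no puncturing is applied, so the rate is 1/q rather than the rate 1/2 discussed in the abstract. The function names and parameters are hypothetical.

```python
import random
from itertools import accumulate
from operator import xor


def accumulator(bits):
    """Running XOR (1/(1+D) accumulator): y[i] = x[0] ^ ... ^ x[i]."""
    return list(accumulate(bits, xor))


def repeat(bits, q):
    """Repeat every bit q times."""
    return [b for b in bits for _ in range(q)]


def ara_encode(bits, q=3, seed=0):
    """Toy accumulate-repeat-accumulate encoder (non-systematic, unpunctured)."""
    precoded = accumulator(bits)          # accumulator used as precoder
    repeated = repeat(precoded, q)        # repetition code
    order = list(range(len(repeated)))
    random.Random(seed).shuffle(order)    # fixed interleaver (assumed)
    interleaved = [repeated[i] for i in order]
    return accumulator(interleaved)       # final accumulator


if __name__ == "__main__":
    print(ara_encode([1, 0, 1, 1]))
```

    Each stage is linear over GF(2), so the sketch as a whole is a linear code, which is what allows a parity-check (LDPC-style) description of such constructions.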

  14. Coset Codes Viewed as Terminated Convolutional Codes

    NASA Technical Reports Server (NTRS)

    Fossorier, Marc P. C.; Lin, Shu

    1996-01-01

    In this paper, coset codes are considered as terminated convolutional codes. Based on this approach, three new general results are presented. First, it is shown that the iterative squaring construction can equivalently be defined from a convolutional code whose trellis terminates. This convolutional code determines a simple encoder for the coset code considered, and the state and branch labelings of the associated trellis diagram become straightforward. Also, from the generator matrix of the code in its convolutional code form, much information about the trade-off between the state connectivity and complexity at each section, and the parallel structure of the trellis, is directly available. Based on this generator matrix, it is shown that the parallel branches in the trellis diagram of the convolutional code represent the same coset code C(sub 1), of smaller dimension and shorter length. Utilizing this fact, a two-stage optimum trellis decoding method is devised. The first stage decodes C(sub 1), while the second stage decodes the associated convolutional code, using the branch metrics delivered by stage 1. Finally, a bidirectional decoding of each received block starting at both ends is presented. If about the same number of computations is required, this approach remains very attractive from a practical point of view as it roughly doubles the decoding speed. This fact is particularly interesting whenever the second half of the trellis is the mirror image of the first half, since the same decoder can be implemented for both parts.

  15. Concatenated Coding Using Trellis-Coded Modulation

    NASA Technical Reports Server (NTRS)

    Thompson, Michael W.

    1997-01-01

    In the late seventies and early eighties a technique known as Trellis Coded Modulation (TCM) was developed for providing spectrally efficient error correction coding. Instead of adding redundant information in the form of parity bits, redundancy is added at the modulation stage, thereby increasing bandwidth efficiency. A digital communications system can be designed to use bandwidth-efficient multilevel/phase modulation such as Amplitude Shift Keying (ASK), Phase Shift Keying (PSK), Differential Phase Shift Keying (DPSK) or Quadrature Amplitude Modulation (QAM). Performance gain can be achieved by increasing the number of signals over the corresponding uncoded system to compensate for the redundancy introduced by the code. A considerable amount of research and development has been devoted toward developing good TCM codes for severely bandlimited applications. More recently, the use of TCM for satellite and deep space communications applications has received increased attention. This report describes the general approach of using a concatenated coding scheme that features TCM and RS coding. Results have indicated that substantial (6-10 dB) performance gains can be achieved with this approach with comparatively little bandwidth expansion. Since all of the bandwidth expansion is due to the RS code, we see that TCM-based concatenated coding results in roughly 10-50% bandwidth expansion compared to 70-150% expansion for similar concatenated schemes that use convolutional codes. We stress that combined coding and modulation optimization is important for achieving performance gains while maintaining spectral efficiency.
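
    The bandwidth expansion figures quoted above follow from simple rate arithmetic: a chain of codes with overall rate R expands bandwidth by 1/R - 1. The Python sketch below works one example. The RS(255,223) outer code and the rate-1/2 convolutional inner code are illustrative choices, not parameters given in the abstract, and the TCM stage is assumed to add no bandwidth expansion, as the abstract states.

```python
from fractions import Fraction


def bandwidth_expansion(*code_rates: Fraction) -> Fraction:
    """Fractional bandwidth expansion of a chain of codes with the given rates."""
    overall_rate = Fraction(1)
    for rate in code_rates:
        overall_rate *= rate
    return 1 / overall_rate - 1


if __name__ == "__main__":
    rs = Fraction(223, 255)      # assumed RS(255,223) outer code
    conv = Fraction(1, 2)        # assumed rate-1/2 convolutional inner code
    print(f"RS + TCM:           {float(bandwidth_expansion(rs)):.1%}")
    print(f"RS + convolutional: {float(bandwidth_expansion(rs, conv)):.1%}")
```

    Under these assumptions the RS code alone gives about 14% expansion, inside the 10-50% range, while adding the rate-1/2 convolutional code gives about 129%, inside the 70-150% range.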

  16. Just-in-time adaptive classifiers-part II: designing the classifier.

    PubMed

    Alippi, Cesare; Roveri, Manuel

    2008-12-01

    Aging effects, environmental changes, thermal drifts, and soft and hard faults affect physical systems by changing their nature and behavior over time. To cope with process evolution, adaptive solutions must be envisaged to track its dynamics; in this direction, adaptive classifiers are generally designed by assuming the stationary hypothesis for the process generating the data, with very few results addressing nonstationary environments. This paper proposes a methodology based on k-nearest neighbor (NN) classifiers for designing adaptive classification systems able to react to changing conditions just-in-time (JIT), i.e., exactly when it is needed. k-NN classifiers have been selected for their computational-free training phase, the possibility to easily estimate the model complexity k and keep under control the computational complexity of the classifier through suitable data reduction mechanisms. A JIT classifier requires a temporal detection of a (possible) process deviation (aspect tackled in a companion paper) followed by an adaptive management of the knowledge base (KB) of the classifier to cope with the process change. The novelty of the proposed approach resides in the general framework supporting the real-time update of the KB of the classification system in response to novel information coming from the process both in stationary conditions (accuracy improvement) and in nonstationary ones (process tracking) and in providing a suitable estimate of k. It is shown that the classification system grants consistency once the change targets the process generating the data in a new stationary state, as is the case in many real applications.
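
    The knowledge base management described above can be pictured with a minimal Python sketch of a k-NN classifier whose training set grows in stationary conditions and is replaced after a detected change. The class and method names are hypothetical, the change detector is assumed to live elsewhere, and the paper's estimate of k and its data reduction mechanisms are not reproduced.

```python
from collections import Counter
from math import dist


class KnnKnowledgeBase:
    """k-NN classifier with an updatable knowledge base (illustrative only)."""

    def __init__(self, k=3):
        self.k = k
        self.samples = []  # list of (feature tuple, label)

    def add(self, x, label):
        """Stationary case: enlarge the KB to improve accuracy."""
        self.samples.append((tuple(x), label))

    def reset(self, recent_samples):
        """Nonstationary case: keep only samples gathered after the change."""
        self.samples = [(tuple(x), label) for x, label in recent_samples]

    def classify(self, x):
        """Majority vote among the k nearest stored samples (Euclidean distance)."""
        nearest = sorted(self.samples, key=lambda s: dist(s[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    kb = KnnKnowledgeBase(k=3)
    for point in [(0, 0), (0, 1), (1, 0)]:
        kb.add(point, "a")
    for point in [(5, 5), (5, 6), (6, 5)]:
        kb.add(point, "b")
    print(kb.classify((0.2, 0.2)), kb.classify((5.5, 5.5)))
```

    Because k-NN has no training phase, both operations amount to editing the stored samples, which is the property the abstract cites as the reason for choosing this classifier.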

  17. Learning algorithms for stack filter classifiers

    SciTech Connect

    Porter, Reid B; Hush, Don; Zimmer, Beate G

    2009-01-01

    Stack Filters define a large class of increasing filters that are used widely in image and signal processing. The motivations for using an increasing filter instead of an unconstrained filter have been described as: (1) fast and efficient implementation, (2) the relationship to mathematical morphology and (3) more precise estimation with finite sample data. This last motivation is related to methods developed in machine learning and the relationship was explored in an earlier paper. In this paper we investigate this relationship by applying Stack Filters directly to classification problems. This provides a new perspective on how monotonicity constraints can help control estimation and approximation errors, and also suggests several new learning algorithms for Boolean function classifiers when they are applied to real-valued inputs.
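
    A stack filter applies a monotone (positive) Boolean function to every thresholded version of the input window and sums the binary outputs. The Python sketch below illustrates this standard threshold-decomposition definition for non-negative integer inputs; it is not the learning algorithm from the paper, and the function names are illustrative.

```python
def stack_filter(window, boolean_fn, max_level):
    """Apply a stack filter to one window of integers in 0..max_level.

    boolean_fn must be a monotone (positive) Boolean function that maps a
    list of 0/1 values to 0 or 1.
    """
    output = 0
    for level in range(1, max_level + 1):
        thresholded = [int(v >= level) for v in window]  # binary slice at this level
        output += boolean_fn(thresholded)                # stack the binary outputs
    return output


def majority(bits):
    """Monotone Boolean function whose stack filter is the median."""
    return int(2 * sum(bits) > len(bits))


if __name__ == "__main__":
    window = [4, 0, 2, 2, 3]
    print(stack_filter(window, majority, max_level=4))                   # median
    print(stack_filter(window, lambda b: int(any(b)), max_level=4))      # maximum
    print(stack_filter(window, lambda b: int(all(b)), max_level=4))      # minimum
```

    Choosing majority, OR, or AND as the Boolean function recovers the median, maximum, and minimum filters respectively, which shows how a Boolean function on binary inputs extends to real-valued (here integer-valued) inputs.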

  18. Classifying prion and prion-like phenomena.

    PubMed

    Harbi, Djamel; Harrison, Paul M

    2014-01-01

    The universe of prion and prion-like phenomena has expanded significantly in the past several years. Here, we overview the challenges in classifying this data informatically, given that terms such as "prion-like", "prion-related" or "prion-forming" do not have a stable meaning in the scientific literature. We examine the spectrum of proteins that have been described in the literature as forming prions, and discuss how "prion" can have a range of meaning, with a strict definition being for demonstration of infection with in vitro-derived recombinant prions. We suggest that although prion/prion-like phenomena can largely be apportioned into a small number of broad groups dependent on the type of transmissibility evidence for them, as new phenomena are discovered in the coming years, a detailed ontological approach might be necessary that allows for subtle definition of different "flavors" of prion / prion-like phenomena.

  19. [Ne V] Emission in Optically Classified Starbursts

    NASA Astrophysics Data System (ADS)

    Abel, N. P.; Satyapal, S.

    2008-05-01

    Detecting active galactic nuclei (AGNs) in galaxies dominated by powerful nuclear star formation and extinction effects poses a unique challenge. Due to the longer wavelength emission and the ionization potential of Ne4+, infrared [Ne V] emission lines are thought to be excellent AGN diagnostics. However, stellar evolution models predict that Wolf-Rayet stars in young stellar clusters emit significant numbers of photons capable of creating Ne4+. Recent observations of [Ne V] emission in optically classified starburst galaxies require us to investigate whether [Ne V] can arise from star formation activity and not an AGN. In this work, we calculate the optical and IR spectrum of gas exposed to a young starburst and AGN SED. We find: (1) a range of parameters where [Ne V] emission can be explained solely by star formation and (2) a range of relative AGN to starburst luminosities that reproduces the [Ne V] observations, yet leaves the optical spectrum looking like a starburst. We also find that infrared emission-line diagnostics are much more sensitive to the AGNs than optical diagnostics, particularly for weak AGNs. We apply our model to the optically classified, yet [Ne V] emitting, starburst galaxy NGC 3621. We find, when taking the infrared and optical spectrum into account, ~30%-50% of the galaxy's total luminosity is due to an AGN. Our calculations show that [Ne V] emission is almost always the result of AGN activity. The models presented in this work can be used to determine the AGN contribution to a galaxy's power output.

  20. Discussion on LDPC Codes and Uplink Coding

    NASA Technical Reports Server (NTRS)

    Andrews, Ken; Divsalar, Dariush; Dolinar, Sam; Moision, Bruce; Hamkins, Jon; Pollara, Fabrizio

    2007-01-01

    This slide presentation reviews the progress of the workgroup on Low-Density Parity-Check (LDPC) codes for space link coding. The workgroup is tasked with developing and recommending new error correcting codes for near-Earth, Lunar, and deep space applications. Included in the presentation is a summary of the technical progress of the workgroup. Charts that show the LDPC decoder sensitivity to symbol scaling errors are reviewed, as well as a chart showing the performance of several frame synchronizer algorithms compared to that of some good codes and LDPC decoder tests at ESTL. Also reviewed is a study on Coding, Modulation, and Link Protocol (CMLP), and the recommended codes. A design for the Pseudo-Randomizer with LDPC Decoder and CRC is also reviewed. A chart that summarizes the three proposed coding systems is also presented.