Science.gov

Sample records for classifying coding dna

  1. DNA as a Binary Code: How the Physical Structure of Nucleotide Bases Carries Information

    ERIC Educational Resources Information Center

    McCallister, Gary

    2005-01-01

    The DNA triplet code also functions as a binary code. Because double-ring compounds (purines) cannot pair with other double-ring compounds in the DNA code, the sequence of bases, classified simply as purines or pyrimidines, can encode smaller groups of possible amino acids. This is an intuitive approach to teaching the DNA code. (Contains 6 figures.)
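    The purine/pyrimidine reading described above can be illustrated with a minimal sketch, assuming the standard genetic code (NCBI translation table 1). Each codon collapses to a 3-symbol pattern (1 = purine A/G, 0 = pyrimidine C/T), which gives 2^3 = 8 classes, and the sketch lists which amino acids fall into each class. The names `binary_pattern` and `amino_acids_by_pattern` are illustrative and do not come from the article.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated with bases in TCAG order,
# first position varying slowest. '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PURINES = {"A", "G"}  # double-ring bases; C and T are single-ring pyrimidines


def binary_pattern(codon: str) -> str:
    """Collapse a codon to a 3-bit string: 1 for a purine, 0 for a pyrimidine."""
    return "".join("1" if base in PURINES else "0" for base in codon)


def amino_acids_by_pattern() -> dict:
    """Map each of the 8 binary patterns to the amino acids (and stops) it can encode."""
    groups = {}
    for codon, aa in CODON_TABLE.items():
        groups.setdefault(binary_pattern(codon), set()).add(aa)
    return groups


if __name__ == "__main__":
    for pattern, aas in sorted(amino_acids_by_pattern().items()):
        print(pattern, "".join(sorted(aas)))
```

    Knowing only the pattern narrows each codon to a smaller set of candidate amino acids, which is the point the article makes; the full table is included only to show the grouping.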

  2. DNA: Polymer and molecular code

    NASA Astrophysics Data System (ADS)

    Shivashankar, G. V.

    1999-10-01

    The thesis work focuses on two aspects of DNA: the polymer and the molecular code. Our approach was to bring single-molecule micromanipulation methods to the study of DNA. It included a home-built optical microscope combined with an atomic force microscope and an optical tweezer. This combined approach led to a novel method to graft a single DNA molecule onto a force cantilever using the optical tweezer and local heating. With this method, a force-versus-extension assay of double-stranded DNA was realized, with a resolution of about 10 pN. To improve on this force resolution, a simple light-backscattering technique was developed and used to probe the flexibility of the DNA polymer and its fluctuations. It combined the optical tweezer, to trap a DNA-tethered bead, with laser backscattering, to detect the bead's Brownian fluctuations. With this technique the resolution was about 0.1 pN with a millisecond access time, and the whole entropic part of the DNA force-extension curve was measured. With this experimental strategy, we measured the polymerization of the protein RecA on an isolated double-stranded DNA. We observed the progressive decoration of the λ DNA molecule by RecA, which results in the extension of λ DNA due to unwinding of the double helix. The dynamics of polymerization, the resulting change in the DNA entropic elasticity, and the role of ATP hydrolysis were the main parts of the study. A simple model for RecA assembly on DNA was proposed. This work presents a first step in the study of genetic recombination. Recently we have started a study of equilibrium binding that uses fluorescence polarization methods to probe the polymerization of RecA on single-stranded DNA. In addition to the study of the material properties of DNA and DNA-RecA, we have developed experiments for which the code of the DNA is central. We studied one aspect of DNA as a molecular code, using different techniques. In particular the programmatic use of template specificity makes
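    The abstract does not name a model for the entropic part of the force-extension curve. A standard choice for double-stranded DNA in this regime is the worm-like chain (WLC), commonly evaluated with the Marko-Siggia interpolation formula. The sketch below assumes a persistence length of about 50 nm and k_BT of about 4.1 pN·nm, typical room-temperature literature values rather than values reported in the thesis.

```python
def wlc_force(extension_nm: float, contour_length_nm: float,
              persistence_nm: float = 50.0, kT_pN_nm: float = 4.1) -> float:
    """Marko-Siggia worm-like-chain interpolation: force (pN) at a given extension.

    Valid only for 0 <= extension < contour length; the entropic force diverges
    as the chain approaches full extension. Default parameters are typical
    literature values for dsDNA, assumed here rather than taken from the thesis.
    """
    x = extension_nm / contour_length_nm  # relative extension
    if not 0.0 <= x < 1.0:
        raise ValueError("relative extension must be in [0, 1)")
    return (kT_pN_nm / persistence_nm) * (1.0 / (4.0 * (1.0 - x) ** 2) - 0.25 + x)


if __name__ == "__main__":
    # λ DNA is about 48.5 kbp, i.e. roughly 16.5 µm of contour length at 0.34 nm/bp.
    L = 48_502 * 0.34
    for frac in (0.25, 0.5, 0.75, 0.9):
        print(f"x/L = {frac:.2f}: F = {wlc_force(frac * L, L):.3f} pN")
```

    At half extension the predicted force is about 0.1 pN, the scale at which a sub-piconewton measurement resolution matters.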

  3. IN-MACA-MCC: Integrated Multiple Attractor Cellular Automata with Modified Clonal Classifier for Human Protein Coding and Promoter Prediction

    PubMed Central

    Pokkuluri, Kiran Sree; Inampudi, Ramesh Babu; Nedunuri, S. S. S. N. Usha Devi

    2014-01-01

    Protein coding and promoter region predictions are very important challenges of bioinformatics (Attwood and Teresa, 2000). The identification of these regions plays a crucial role in understanding genes. Many novel computational and mathematical methods have been introduced, and existing methods are being refined, for predicting each of the two regions separately; still, there is scope for improvement. We propose a classifier built with MACA (multiple attractor cellular automata) and MCC (modified clonal classifier) that predicts both regions with a single classifier. The proposed classifier is trained and tested with the Fickett and Tung (1992) datasets for protein coding region prediction for DNA sequences of lengths 54, 108, and 162, and with MMCRI datasets for protein coding region prediction for DNA sequences of lengths 252 and 354. For promoter prediction, it is trained and tested with promoter sequences from the DBTSS (Yamashita et al., 2006) dataset and nonpromoters from the EID (Saxonov et al., 2000) and UTRdb (Pesole et al., 2002) datasets. The proposed model predicts both regions with an average accuracy of 90.5% for promoter and 89.6% for protein coding region predictions. The specificity and sensitivity values of promoter and protein coding region predictions are 0.89 and 0.92, respectively. PMID:25132849
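    The abstract reports accuracy, specificity, and sensitivity. As a reminder of how these are conventionally computed from binary confusion-matrix counts, a minimal sketch follows; the counts in the usage line are hypothetical, not data from the paper. Note that some gene-prediction papers use "specificity" for TP/(TP+FP) instead, and the abstract does not say which convention it follows.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity and accuracy from binary confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true sites that are detected
        "specificity": tn / (tn + fp),  # fraction of non-sites correctly rejected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(classification_metrics(tp=90, fp=20, tn=80, fn=10))
```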

  5. Recognition of multiple imbalanced cancer types based on DNA microarray data using ensemble classifiers.

    PubMed

    Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin

    2013-01-01

    DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem, namely, that most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they focus only on binary-class problems. In this paper, we dealt with the multiclass imbalanced classification problem encountered in cancer DNA microarray data by using ensemble learning. We utilized a one-against-all coding strategy to transform the multiclass problem into multiple binary problems, each of which applies feature subspace, an evolved version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two different correction techniques, namely, decision threshold adjustment or random undersampling, into each training subset to alleviate the damage caused by class imbalance. Specifically, a support vector machine was used as the base classifier, and a novel voting rule called counter voting was presented for making the final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that, unlike many traditional classification approaches, our methods are insensitive to class imbalance. PMID:24078908
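    Two of the steps named above, one-against-all label coding and random undersampling of each binary training subset, can be sketched minimally, assuming plain Python lists as data. Feature subspace, decision threshold adjustment, the SVM itself, and counter voting are not reproduced; the function names and toy data are hypothetical.

```python
import random
from collections import Counter


def one_vs_all_labels(labels):
    """One-against-all coding: one binary label vector per class (1 = that class, 0 = rest)."""
    return {c: [1 if y == c else 0 for y in labels] for c in sorted(set(labels))}


def random_undersample(samples, binary_labels, seed=0):
    """Randomly discard majority-class samples until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(binary_labels) if t == 1]
    neg = [i for i, t in enumerate(binary_labels) if t == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [samples[i] for i in keep], [binary_labels[i] for i in keep]


if __name__ == "__main__":
    # Toy, hypothetical data: six samples from three cancer types.
    X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
    y = ["a", "a", "a", "b", "c", "c"]
    for cls, bin_y in one_vs_all_labels(y).items():
        Xs, ys = random_undersample(X, bin_y, seed=42)
        # A base classifier (an SVM in the paper) would be trained on (Xs, ys) here.
        print(cls, Counter(ys))
```

    Undersampling discards data, which is why the paper pairs it with an ensemble of diverse subsets rather than a single balanced set.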

  6. [Compulsive molecular hoarding enables the evolution of protein-coding DNA from non-coding DNA].

    PubMed

    Casane, Didier; Laurenti, Patrick

    2014-12-01

    It was thought until recently that a new gene could only evolve from a previously existing gene, from recombination of genes, or from horizontal gene transfer. Recently a series of genomic and transcriptomic studies have led to the identification of non-coding DNA as a significant source of protein coding genes. The mechanism, which is probably universal since it has been identified in a wide array of eukaryotes, implies that a gradient of proto-genes, probably established by a balance between selection and genetic drift, exists between coding DNA and non-coding DNA. Therefore genome dynamics could account for the progressive formation of genes "out of the blue" thanks to the interplay of mutation and natural selection.

  7. Classifying sets of attributed scattering centers using a hash coded database

    NASA Astrophysics Data System (ADS)

    Dungan, Kerry E.; Potter, Lee C.

    2010-04-01

    We present a fast, scalable method to simultaneously register and classify vehicles in circular synthetic aperture radar imagery. The method is robust to clutter, occlusions, and partial matches. Images are represented as a set of attributed scattering centers that are mapped to local sets, which are invariant to rigid transformations. Similarity between local sets is measured using a method called pyramid match hashing, which applies a pyramid match kernel to compare sets and a Hamming distance to compare hash codes generated from those sets. By preprocessing a database into a Hamming space, we are able to quickly find the nearest neighbor of a query among a large number of records. To demonstrate the algorithm, we simulated X-band scattering from ten civilian vehicles placed throughout a large scene, varying elevation angles in the 35 to 59 degree range. We achieved better than 98 percent classification performance. We also classified seven vehicles in a 2006 public release data collection with 100% success.
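    The nearest-neighbor step in Hamming space can be sketched as follows, assuming each record's hash code is stored as a Python integer. The pyramid match hashing that produces the codes is not reproduced, and the vehicle labels and 8-bit codes are hypothetical. A linear scan is shown for clarity; the abstract does not say what search structure the authors use.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


def nearest_neighbor(query: int, database: dict) -> tuple:
    """Linear scan for the database label whose hash code is closest to the query."""
    label = min(database, key=lambda name: hamming(query, database[name]))
    return label, hamming(query, database[label])


if __name__ == "__main__":
    # Hypothetical 8-bit codes; real codes would come from the hashing step.
    db = {"sedan": 0b10110010, "pickup": 0b01001101, "van": 0b11110000}
    print(nearest_neighbor(0b10110011, db))
```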

  8. Security authentication with a three-dimensional optical phase code using random forest classifier.

    PubMed

    Markman, Adam; Carnicer, Artur; Javidi, Bahram

    2016-06-01

    An object with a unique three-dimensional (3D) optical phase mask attached is analyzed for security and authentication. These 3D optical phase masks are more difficult to duplicate or to describe mathematically than 2D masks and thus offer improved security. A quick response (QR) code was modulated using a random 3D optical phase mask, generating a 3D optical phase code (OPC). Due to the scattering of light through the 3D OPC, a unique speckle pattern based on the materials and structure of the 3D optical phase mask is generated and recorded on a CCD device. Feature extraction is performed by calculating the mean, variance, skewness, kurtosis, and entropy of each recorded speckle pattern. A random forest classifier is used for authentication. Optical experiments demonstrate the feasibility of the authentication scheme. PMID:27409445
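    The five statistical features listed above can be computed with a short sketch, assuming the speckle image is given as a flat list of integer gray levels. The abstract does not specify estimator conventions, so population moments, non-excess kurtosis, and a per-gray-level histogram for entropy are assumptions here.

```python
import math
from collections import Counter


def speckle_features(pixels):
    """Mean, variance, skewness, kurtosis and Shannon entropy of a speckle image.

    `pixels` is a flat list of integer gray levels. Population moments are used,
    kurtosis is the non-excess form (3 for a Gaussian), and entropy is computed
    in bits over the gray-level histogram; these conventions are assumptions.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(var)
    skew = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3) if std > 0 else 0.0
    kurt = sum((p - mean) ** 4 for p in pixels) / (n * var ** 2) if var > 0 else 0.0
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())
    return [mean, var, skew, kurt, entropy]
```

    One such five-element vector per recorded pattern would then serve as the input to a random forest classifier (for example scikit-learn's `RandomForestClassifier`).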

  9. Classified-edge guided depth resampling for multi-view coding

    NASA Astrophysics Data System (ADS)

    Lu, Yu; Zhou, Yang; Chen, Hua-hua

    2016-01-01

    A new depth resampling method for multi-view coding is proposed in this paper. First, the depth video is downsampled by median filtering before encoding. After decoding, the classified edges, including credible edges and probable edges obtained from the aligned texture image and the depth image, are interpolated using the selected diagonal pair, whose intensity difference is the minimum among the four diagonal pairs around the edge pixel. Depending on the edge category, the intensity difference is measured by either the real depth or the percentage depth, without any parameter setting. Finally, the resampled depth video and the decoded full-resolution texture video are synthesized into virtual views for performance evaluation. Experiments on the multi-view high efficiency video coding (HEVC) platform demonstrate that the proposed method is superior to the compared methods in terms of visual quality and rate-distortion (RD) performance.

  10. Species independence of mutual information in coding and noncoding DNA

    NASA Astrophysics Data System (ADS)

    Grosse, Ivo; Herzel, Hanspeter; Buldyrev, Sergey V.; Stanley, H. Eugene

    2000-05-01

    We explore whether there exist universal statistical patterns that differ between coding and noncoding DNA and can be found in all living organisms, regardless of their phylogenetic origin. We find that (i) the mutual information function I has a significantly different functional form in coding and noncoding DNA. We further find that (ii) the probability distributions of the average mutual information Ī are significantly different in coding and noncoding DNA, while (iii) they are almost the same for organisms of all taxonomic classes. Surprisingly, we find that Ī is capable of predicting coding regions as accurately as organism-specific coding measures.
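    The mutual information function can be estimated directly from nucleotide-pair frequencies. A minimal sketch, assuming plug-in (frequency-count) probability estimates and no finite-size correction: I(k) = Σ_{a,b} p_ab(k) log2[p_ab(k) / (p_a p_b)], over pairs of bases k positions apart. The abstract does not give the range of k used for the average Ī, so `ks` is a free, hypothetical parameter.

```python
import math
from collections import Counter


def mutual_information(seq: str, k: int) -> float:
    """Mutual information I(k), in bits, between nucleotides k positions apart.

    I(k) = sum_{a,b} p_ab(k) * log2(p_ab(k) / (p_a * p_b)), with all probabilities
    estimated by counting the N - k pairs (seq[i], seq[i + k]).
    """
    pairs = list(zip(seq, seq[k:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum((c / n) * math.log2(c * n / (left[a] * right[b]))
               for (a, b), c in joint.items())


def average_mutual_information(seq: str, ks) -> float:
    """Average of I(k) over the distances in `ks` (a stand-in for the paper's averaged measure)."""
    ks = list(ks)
    return sum(mutual_information(seq, k) for k in ks) / len(ks)
```

    A constant sequence carries no mutual information, while a strictly periodic one such as "ACGT" repeated gives close to 2 bits at every distance; real coding and noncoding DNA fall in between.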

  11. BioCode: Two biologically compatible Algorithms for embedding data in non-coding and coding regions of DNA

    PubMed Central

    2013-01-01

    Background In recent times, the application of deoxyribonucleic acid (DNA) has diversified with the emergence of fields such as DNA computing and DNA data embedding. DNA data embedding, also known as DNA watermarking or DNA steganography, aims to develop robust algorithms for encoding non-genetic information in DNA. Inherently, DNA is a digital medium in which the nucleotide bases act as digital symbols, a fact that underpins all bioinformatics techniques and also makes trivial information encoding in DNA straightforward. However, the situation is more complex for methods that aim to embed information in the genomes of living organisms. DNA is susceptible to mutations, which act as a noisy channel from the point of view of information encoded in DNA. This means that the DNA data embedding field is closely related to digital communications. Moreover, it is a distinctive area of digital communications, because important biological constraints must be observed by all methods. Many DNA data embedding algorithms have been presented to date, all of which operate in one of two regions: non-coding DNA (ncDNA) or protein-coding DNA (pcDNA). Results This paper proposes two novel DNA data embedding algorithms, jointly called BioCode, which operate in ncDNA and pcDNA, respectively, and which comply fully with stricter biological restrictions. Existing methods comply with some elementary biological constraints, such as preserving protein translation in pcDNA. However, there are further biological restrictions that no DNA data embedding method to date accounts for. Observing these constraints is key to increasing the biocompatibility and, in turn, the robustness of information encoded in DNA. Conclusion The algorithms encode information in near-optimal ways from a coding point of view, as we demonstrate by means of theoretical and empirical (in silico) analyses. Also, they are shown to encode information in a robust way, such that mutations have isolated
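    Both settings in the abstract can be illustrated with a minimal sketch that is not BioCode itself. For the trivial case it maps two bits to each base. For pcDNA it uses a generic synonymous-codon approach: it overwrites the third base of fourfold-degenerate codons, which preserves protein translation only, i.e. the elementary constraint the abstract says existing methods already satisfy. BioCode's stricter restrictions and any protection against mutations are not modeled, and all function names are hypothetical.

```python
# Two-bit code: each nucleotide carries one of four symbols.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}

# Codon prefixes whose third position is fully degenerate in the standard code
# (Ala, Gly, Val, Thr, Pro, Leu, Ser, Arg): any third base keeps the amino acid.
FOURFOLD_PREFIXES = {"GC", "GG", "GT", "AC", "CC", "CT", "TC", "CG"}


def bytes_to_dna(data: bytes) -> str:
    """Trivial encoding (e.g. for synthetic DNA): 2 bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_dna."""
    bits = "".join(BITS_FOR_BASE[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def embed_in_coding(cds: str, bits: str) -> str:
    """Write 2 bits into the third base of each fourfold-degenerate codon, in reading order.

    The encoded protein is unchanged, but other constraints (codon usage, mRNA
    structure, regulatory motifs) are NOT considered in this sketch.
    """
    if len(cds) % 3 != 0 or len(bits) % 2 != 0:
        raise ValueError("cds must be whole codons and bits must have even length")
    out, pos = [], 0
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and pos < len(bits):
            codon = codon[:2] + BASE_FOR_BITS[bits[pos:pos + 2]]
            pos += 2
        out.append(codon)
    if pos < len(bits):
        raise ValueError("not enough fourfold-degenerate codons for the payload")
    return "".join(out)


def extract_from_coding(cds: str, n_bits: int) -> str:
    """Read back n_bits from the third bases of fourfold-degenerate codons."""
    chunks = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and len(chunks) * 2 < n_bits:
            chunks.append(BITS_FOR_BASE[codon[2]])
    return "".join(chunks)
```

    For example, embedding "100111" into ATG GCT GGA AAA TTC CCG changes only the third bases of the Ala, Gly, and Pro codons, so the translated peptide stays Met-Ala-Gly-Lys-Phe-Pro.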

  12. Nonlinear Aspects of Coding and Noncoding DNA Sequences

    NASA Astrophysics Data System (ADS)

    Stanley, H. Eugene

    2001-03-01

    One of the most remarkable features of human DNA is that 97 percent of it does not code for proteins. Studying this noncoding DNA is important both for practical reasons (to distinguish it from the coding DNA as the human genome is sequenced) and for scientific reasons (why is the noncoding DNA present at all, if it appears to have little if any purpose?). In this talk we discuss new methods of analyzing coding and noncoding DNA in parallel, with a view to uncovering different statistical properties of the two kinds of DNA. We also speculate on possible roles of noncoding DNA. The work reported here was carried out primarily by P. Bernaola-Galvan, S. V. Buldyrev, P. Carpena, N. Dokholyan, A. L. Goldberger, I. Grosse, S. Havlin, H. Herzel, J. L. Oliver, C.-K. Peng, M. Simons, H. E. Stanley, R. H. R. Stanley, and G. M. Viswanathan. [1] For a brief overview in language that physicists can understand, see H. E. Stanley, S. V. Buldyrev, A. L. Goldberger, S. Havlin, C.-K. Peng, and M. Simons, "Scaling Features of Noncoding DNA" [Proc. XII Max Born Symposium, Wroclaw], Physica A 273, 1-18 (1999). [2] I. Grosse, H. Herzel, S. V. Buldyrev, and H. E. Stanley, "Species Independence of Mutual Information in Coding and Noncoding DNA," Phys. Rev. E 61, 5624-5629 (2000). [3] P. Bernaola-Galvan, I. Grosse, P. Carpena, J. L. Oliver, and H. E. Stanley, "Identification of DNA Coding Regions Using an Entropic Segmentation Method," Phys. Rev. Lett. 84, 1342-1345 (2000). [4] N. Dokholyan, S. V. Buldyrev, S. Havlin, and H. E. Stanley, "Distributions of Dimeric Tandem Repeats in Non-coding and Coding DNA Sequences," J. Theor. Biol. 202, 273-282 (2000). [5] R. H. R. Stanley, N. V. Dokholyan, S. V. Buldyrev, S. Havlin, and H. E. Stanley, "Clumping of Identical Oligonucleotides in Coding and Noncoding DNA Sequences," J. Biomol. Structure and Dynamics 17, 79-87 (1999). [6] N. Dokholyan, S. V. Buldyrev, S. Havlin, and H. E. Stanley, "Distribution of Base Pair Repeats in Coding and Noncoding DNA

  13. On fuzzy semantic similarity measure for DNA coding.

    PubMed

    Ahmad, Muneer; Jung, Low Tang; Bhuiyan, Md Al-Amin

    2016-02-01

    A coding measure scheme numerically translates the DNA sequence into a time-domain signal for identifying protein coding regions. A number of coding measure schemes based on numerology, geometry, fixed mapping, statistical characteristics and chemical attributes of nucleotides have been proposed in recent decades. Such coding measure schemes lack the biologically meaningful aspects of nucleotide data and hence do not significantly discriminate coding regions from non-coding regions. This paper presents a novel fuzzy semantic similarity measure (FSSM) coding scheme centering on FSSM codons' clustering and the genetic code context of nucleotides. Certain natural characteristics of nucleotides, i.e., appearance as a unique combination of triplets, preserving special structure and occurrence, and the ability to own and share density distributions in codons, have been exploited in FSSM. The nucleotides' fuzzy behaviors, semantic similarities and defuzzification based on the center of gravity of nucleotides revealed a strong correlation between nucleotides in codons. The proposed FSSM coding scheme attains a significant enhancement in coding region identification, i.e., 36-133%, compared with other existing coding measure schemes tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms.
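    A classic example of the "fixed mapping" category mentioned above is the Voss binary indicator mapping combined with the period-3 spectral component that protein-coding regions tend to show. The sketch below implements that baseline, not the FSSM scheme; the 351-base window and 3-base step are assumed, commonly used values rather than parameters from this paper.

```python
import cmath


def period3_power(seq: str) -> float:
    """Sum over A, C, G, T of |X_b(1/3)|^2, where X_b is the DFT of the binary
    (Voss) indicator sequence for base b, evaluated at 1/3 cycle per base."""
    total = 0.0
    for base in "ACGT":
        x = sum(cmath.exp(-2j * cmath.pi * n / 3) for n, b in enumerate(seq) if b == base)
        total += abs(x) ** 2
    return total


def sliding_period3(seq: str, window: int = 351, step: int = 3):
    """(start, length-normalized power) for each window; peaks suggest protein-coding regions."""
    return [(i, period3_power(seq[i:i + window]) / window)
            for i in range(0, len(seq) - window + 1, step)]
```

    A strongly 3-periodic sequence such as "ATG" repeated gives a large value, while a homopolymer gives essentially zero; real genomic windows are thresholded somewhere between.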

  14. Random aggregation models for the formation and evolution of coding and non-coding DNA

    NASA Astrophysics Data System (ADS)

    Provata, A.

    A random aggregation model with influx is proposed for the formation of non-coding DNA regions via random co-aggregation and influx of biological macromolecules such as viruses, parasite DNA, and replication segments. The constant mixing (transpositions) and influx drive the system into an out-of-equilibrium steady state characterised by a power-law size distribution. The model predicts the long-range distributions found in noncoding eukaryotic DNA and explains the observed correlations. For the formation of coding DNA, a random closed aggregation model is proposed, which predicts short-range coding size distributions. The closed aggregation process drives the system into an almost “frozen” stable state that is robust to external perturbations and is characterised by well-defined space and time scales, as observed in coding sequences.

  15. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
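    The abstract does not spell out the algorithm. A standard way to measure the long-range correlations it refers to is detrended fluctuation analysis (DFA) of a purine-pyrimidine "DNA walk"; whether the published algorithm uses exactly this estimator and these box sizes is an assumption. The sketch returns the scaling exponent for one sequence; a coding-region scan in the spirit described would compute it in sliding windows and flag windows with exponents near 0.5 (short-range correlations) as candidate coding regions.

```python
import math
import random


def dna_walk(seq: str) -> list:
    """Purine-pyrimidine DNA walk: +1 for A/G, -1 for C/T, accumulated along the sequence."""
    y, walk = 0, []
    for base in seq:
        y += 1 if base in "AG" else -1
        walk.append(y)
    return walk


def _detrended_sq_residual(segment: list) -> float:
    """Sum of squared residuals after a least-squares line fit to one box."""
    n = len(segment)
    mx, my = (n - 1) / 2, sum(segment) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    sxy = sum((x - mx) * (y - my) for x, y in enumerate(segment))
    slope = sxy / sxx
    return sum((y - my - slope * (x - mx)) ** 2 for x, y in enumerate(segment))


def dfa_fluctuation(walk: list, box: int) -> float:
    """Root-mean-square fluctuation F(box) of the linearly detrended walk."""
    n_boxes = len(walk) // box
    total = sum(_detrended_sq_residual(walk[i * box:(i + 1) * box]) for i in range(n_boxes))
    return math.sqrt(total / (n_boxes * box))


def dfa_exponent(seq: str, boxes=(8, 16, 32, 64, 128)) -> float:
    """Least-squares slope of log F(n) versus log n.

    Fails with a math domain error if F is exactly zero (e.g. an all-purine sequence).
    """
    walk = dna_walk(seq)
    xs = [math.log(b) for b in boxes]
    ys = [math.log(dfa_fluctuation(walk, b)) for b in boxes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


if __name__ == "__main__":
    rng = random.Random(0)
    random_seq = "".join(rng.choice("ACGT") for _ in range(4096))
    print(round(dfa_exponent(random_seq), 2))
```

    For an uncorrelated random sequence the exponent should come out near 0.5; values clearly above 0.5 indicate the long-range correlations the abstract associates with noncoding DNA.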

  16. Using Abbreviated Injury Scale (AIS) codes to classify Computed Tomography (CT) features in the Marshall System

    PubMed Central

    2010-01-01

    Background The purpose of the Abbreviated Injury Scale (AIS) is to code various types of Traumatic Brain Injuries (TBI) based on their anatomical location and severity. The Marshall CT Classification is used to identify those subgroups of brain-injured patients at higher risk of deterioration or mortality. The purpose of this study is to determine whether and how AIS coding can be translated to the Marshall Classification. Methods Initially, a Marshall Class was allocated to each AIS code through cross-tabulation. This was agreed upon through several discussion meetings with experts from both fields (clinicians and AIS coders). Furthermore, to make this translation possible, some assumptions regarding the coding and classification of mass lesions and brain swelling were necessary; all of these were approved and made explicit. Results The proposed method involves two stages: first, determining all possible Marshall Classes that a given patient can attract based on the allocated AIS codes, via cross-tabulation; and second, assigning one Marshall Class to each patient through an algorithm. Conclusion This method can easily be programmed in computer software, and it would enable important future TBI research programs using trauma registry data. PMID:20691038

  17. Coding-complete sequencing classifies parrot bornavirus 5 into a novel virus species.

    PubMed

    Marton, Szilvia; Bányai, Krisztián; Gál, János; Ihász, Katalin; Kugler, Renáta; Lengyel, György; Jakab, Ferenc; Bakonyi, Tamás; Farkas, Szilvia L

    2015-11-01

    In this study, we determined the sequence of the coding region of an avian bornavirus detected in a blue-and-yellow macaw (Ara ararauna) with pathological/histopathological changes characteristic of proventricular dilatation disease. The genomic organization of the macaw bornavirus is similar to that of other bornaviruses, and its nucleotide sequence is nearly identical to the available partial parrot bornavirus 5 (PaBV-5) sequences. Phylogenetic analysis showed that these strains formed a monophyletic group distinct from other mammalian and avian bornaviruses. In calculations performed with matrix protein coding sequences, the PaBV-5 and PaBV-6 genotypes formed a common cluster, suggesting that, according to the recently accepted classification system for bornaviruses, these two genotypes may belong to a new species, provisionally named Psittaciform 2 bornavirus.

  18. Diversity and Recombination of Dispersed Ribosomal DNA and Protein Coding Genes in Microsporidia

    PubMed Central

    Ironside, Joseph Edward

    2013-01-01

    Microsporidian strains are usually classified on the basis of their ribosomal DNA (rDNA) sequences. Although rDNA occurs as multiple copies, in most non-microsporidian species copies within a genome occur as tandem arrays and are homogenised by concerted evolution. In contrast, microsporidian rDNA units are dispersed throughout the genome in some species, and on this basis are predicted to undergo reduced concerted evolution. Furthermore, many microsporidian species appear to be asexual and should therefore exhibit reduced genetic diversity due to a lack of recombination. Here, DNA sequences are compared between microsporidia with different life cycles in order to determine the effects of concerted evolution and sexual reproduction upon the diversity of rDNA and protein coding genes. Comparisons of cloned rDNA sequences between microsporidia of the genus Nosema with different life cycles provide evidence of intragenomic variability coupled with strong purifying selection. This suggests a birth-and-death process of evolution. However, some concerted evolution is suggested by clustering of rDNA sequences within species. Variability of protein-coding sequences indicates that considerable intergenomic variation also occurs between microsporidian cells within a single host. Patterns of variation in microsporidian DNA sequences indicate that additional diversity is generated by intragenomic and/or intergenomic recombination between sequence variants. The discovery of intragenomic variability coupled with strong purifying selection in microsporidian rRNA sequences supports the hypothesis that concerted evolution is reduced when copies of a gene are dispersed rather than repeated tandemly. The presence of intragenomic variability also renders the use of rDNA sequences for barcoding microsporidia questionable. Evidence of recombination in the single-copy genes of putatively asexual microsporidia suggests that these species may undergo cryptic sexual reproduction, a

  19. Structural Code for DNA Recognition Revealed in Crystal Structures of Papillomavirus E2-DNA Targets

    NASA Astrophysics Data System (ADS)

    Rozenberg, Haim; Rabinovich, Dov; Frolow, Felix; Hegde, Rashmi S.; Shakked, Zippora

    1998-12-01

    Transcriptional regulation in papillomaviruses depends on sequence-specific binding of the regulatory protein E2 to several sites in the viral genome. Crystal structures of bovine papillomavirus E2 DNA targets reveal a conformational variant of B-DNA characterized by a roll-induced writhe and helical repeat of 10.5 bp per turn. A comparison between the free and the protein-bound DNA demonstrates that the intrinsic structure of the DNA regions contacted directly by the protein and the deformability of the DNA region that is not contacted by the protein are critical for sequence-specific protein/DNA recognition and hence for gene-regulatory signals in the viral system. We show that the selection of dinucleotide or longer segments with appropriate conformational characteristics, when positioned at correct intervals along the DNA helix, can constitute a structural code for DNA recognition by regulatory proteins. This structural code facilitates the formation of a complementary protein-DNA interface that can be further specified by hydrogen bonds and nonpolar interactions between the protein amino acids and the DNA bases.

  20. Non-coding RNAs in DNA damage response

    PubMed Central

    Liu, Yunhua; Lu, Xiongbin

    2012-01-01

    Genome-wide studies have revealed that human and other mammalian genomes are pervasively transcribed and produce thousands of regulatory non-protein-coding RNAs (ncRNAs), including miRNAs, siRNAs, piRNAs and long non-coding RNAs (lncRNAs). Emerging evidence suggests that these ncRNAs also play a pivotal role in genome integrity and stability via the regulation of the DNA damage response (DDR). In this review, we discuss recent findings on the interplay of ncRNAs with the canonical DDR signaling pathway, with a particular emphasis on miRNAs and lncRNAs. While the expression of ncRNAs is regulated in the DDR, the DDR is also subject to regulation by those DNA damage-responsive ncRNAs. In addition, the roles of the Dicer- and Drosha-dependent small RNAs produced in the vicinity of double-strand break sites are also described. PMID:23226613

  21. Extra-coding RNAs regulate neuronal DNA methylation dynamics

    PubMed Central

    Savell, Katherine E.; Gallus, Nancy V. N.; Simon, Rhiana C.; Brown, Jordan A.; Revanna, Jasmin S.; Osborn, Mary Katherine; Song, Esther Y.; O'Malley, John J.; Stackhouse, Christian T.; Norvil, Allison; Gowher, Humaira; Sweatt, J. David; Day, Jeremy J.

    2016-01-01

    Epigenetic mechanisms such as DNA methylation are essential regulators of the function and information storage capacity of neurons. DNA methylation is highly dynamic in the developing and adult brain, and is actively regulated by neuronal activity and behavioural experiences. However, it is presently unclear how methylation status at individual genes is targeted for modification. Here, we report that extra-coding RNAs (ecRNAs) interact with DNA methyltransferases and regulate neuronal DNA methylation. Expression of ecRNA species is associated with gene promoter hypomethylation, is altered by neuronal activity, and is overrepresented at genes involved in neuronal function. Knockdown of the Fos ecRNA locus results in gene hypermethylation and mRNA silencing, and hippocampal expression of Fos ecRNA is required for long-term fear memory formation in rats. These results suggest that ecRNAs are fundamental regulators of DNA methylation patterns in neuronal systems, and reveal a promising avenue for therapeutic targeting in neuropsychiatric disease states. PMID:27384705

  22. Hiding message into DNA sequence through DNA coding and chaotic maps.

    PubMed

    Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

    2014-09-01

    The paper proposes an improved reversible substitution method to hide data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance robustness and enlarge hiding capacity: encoding the secret message with DNA coding, encrypting it with a pseudo-random sequence, generating the relative hiding locations with a piecewise linear chaotic map, and embedding the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than the competing methods with respect to robustness and capacity. PMID:25023893
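    Two of the four measures can be sketched under stated assumptions: the standard piecewise linear chaotic map (PWLCM) drives both a keystream for encrypting a DNA-coded message and a sequence of relative hiding offsets. The key derivation (two bits taken from int(4x)), the burn-in length, the offset rule, and the key values in the test are hypothetical choices, and the paper's reversible substitution with the complementary rule is not reproduced.

```python
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def pwlcm(x: float, p: float) -> float:
    """Piecewise linear chaotic map on [0, 1) with control parameter 0 < p < 0.5."""
    if x >= 0.5:
        x = 1.0 - x  # the map is symmetric: F(x) = F(1 - x)
    return x / p if x < p else (x - p) / (0.5 - p)


def chaotic_orbit(x0: float, p: float, n: int, burn_in: int = 200) -> list:
    """n iterates of the map after discarding `burn_in` transient iterates."""
    x = x0
    for _ in range(burn_in):
        x = pwlcm(x, p)
    orbit = []
    for _ in range(n):
        x = pwlcm(x, p)
        orbit.append(x)
    return orbit


def encrypt_bases(message_bases: str, x0: float, p: float) -> str:
    """XOR each base's 2-bit code with 2 key bits from the orbit (self-inverse)."""
    orbit = chaotic_orbit(x0, p, len(message_bases))
    out = []
    for base, x in zip(message_bases, orbit):
        key = int(x * 4) % 4  # hypothetical rule: 2 key bits from the current iterate
        out.append(BASE_FOR_BITS[f"{int(BITS_FOR_BASE[base], 2) ^ key:02b}"])
    return "".join(out)


def relative_offsets(x0: float, p: float, count: int, max_gap: int) -> list:
    """Gaps between successive hiding positions, each in 1..max_gap (hypothetical rule)."""
    return [1 + int(x * max_gap) % max_gap for x in chaotic_orbit(x0, p, count)]
```

    Because XOR is its own inverse, running `encrypt_bases` again with the same (x0, p) key recovers the message, so the receiver needs only the key and the host sequence.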

  23. DNA information: from digital code to analogue structure.

    PubMed

    Travers, A A; Muskhelishvili, G; Thompson, J M T

    2012-06-28

    The digital linear coding carried by the base pairs in the DNA double helix is now known to have an important component that acts by altering, along its length, the natural shape and stiffness of the molecule. In this way, one region of DNA is structurally distinguished from another, constituting an additional form of encoded information manifest in three-dimensional space. These shape and stiffness variations help in guiding and facilitating the DNA during its three-dimensional spatial interactions. Such interactions with itself allow communication between genes and enhanced wrapping and histone-octamer binding within the nucleosome core particle. Meanwhile, interactions with proteins can have a reduced entropic binding penalty owing to advantageous sequence-dependent bending anisotropy. Sequence periodicity within the DNA, giving a corresponding structural periodicity of shape and stiffness, also influences the supercoiling of the molecule, which, in turn, plays an important facilitating role. In effect, the super-helical density acts as an analogue regulatory mode in contrast to the more commonly acknowledged purely digital mode. Many of these ideas are still poorly understood, and represent a fundamental and outstanding biological question. This review gives an overview of very recent developments, and hopefully identifies promising future lines of enquiry. PMID:22615471

  24. Dual enzyme electrochemical coding for detecting DNA hybridization.

    PubMed

    Wang, Joseph; Kawde, Abdel-Nasser; Musameh, Mustafa; Rivas, Gustavo

    2002-10-01

    Enzyme-based hybridization assays for the simultaneous electrochemical measurement of two DNA targets are described. Two encoding enzymes, alkaline phosphatase and beta-galactosidase, are used to differentiate the signals of the two DNA targets through chronopotentiometric measurement of their electroactive phenol and alpha-naphthol products. These products yield well-defined, resolved peaks at +0.31 V (alpha-naphthol) and +0.63 V (phenol) at the graphite working electrode (vs. Ag/AgCl reference). The position and size of these peaks reflect the identity and level of the corresponding target. The dual-target detection capability is coupled with the amplification afforded by enzyme tags (yielding fmol detection limits) and with efficient magnetic removal of non-hybridized nucleic acids. Particular attention is given to the choice of substrates (to attain well-resolved peaks), the activity of the enzymes (to obtain similar sensitivities), and the selection of the enzymes (to minimize cross-interference). The new bioassay is illustrated by the simultaneous detection of two DNA sequences related to the BRCA1 breast-cancer gene in a single sample, using magnetic beads bearing the corresponding oligonucleotide probes. Prospects for electrochemical coding of multiple DNA targets are discussed.

  5. Coding DNA repeated throughout intergenic regions of the Arabidopsis thaliana genome: Evolutionary footprints of RNA silencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Pyknons are non-random sequence patterns significantly repeated throughout non-coding genomic DNA that also appear at least once among genes. They are interesting because they portend an unforeseen connection between coding and non-coding DNA. Pyknons have only been discovered in the human genome,...

  6. Multifractal detrended cross-correlation analysis of coding and non-coding DNA sequences through chaos-game representation

    NASA Astrophysics Data System (ADS)

    Pal, Mayukha; Satish, B.; Srinivas, K.; Rao, P. Madhusudana; Manimaran, P.

    2015-10-01

    We propose a new approach that combines the chaos game representation with two-dimensional multifractal detrended cross-correlation analysis to examine multifractal behavior in the power-law cross-correlation between any pair of nucleotide sequences, including sequences of unequal length. In this work, we analyzed the characteristic behavior of coding and non-coding DNA sequences from eight prokaryotes. The results show a strong multifractal nature between coding and non-coding sequences in all data sets. We found that this integrative approach allows complete DNA sequences to be characterized, and it may further be useful for high-precision classification, clustering, and identification of the class affiliation of nucleotide sequences.
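The chaos game representation (CGR) step is simple enough to show directly. Start at the centre of the unit square and, for each base, move halfway toward that base's corner. The sketch below assumes the common corner assignment A=(0,0), C=(0,1), G=(1,1), T=(1,0); other papers permute the corners. The multifractal detrended cross-correlation analysis applied afterwards is not reproduced here.

```python
from typing import List, Tuple

# Assumed corner assignment; any fixed assignment works the same way.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_representation(seq: str) -> List[Tuple[float, float]]:
    """Map a sequence to CGR points: start at the centre and, for each base,
    move halfway towards that base's corner."""
    x, y = 0.5, 0.5
    points: List[Tuple[float, float]] = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguity codes such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cgr_grid(points: List[Tuple[float, float]], k: int) -> List[List[int]]:
    """Bin CGR points into a 2^k x 2^k grid of counts.

    Once at least k bases have been read, the cell a point falls in is fixed
    by the last k bases, so the grid approximates a k-mer frequency table.
    """
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    for x, y in points:
        row = min(int(y * n), n - 1)
        col = min(int(x * n), n - 1)
        grid[row][col] += 1
    return grid


print(chaos_game_representation("AG"))  # [(0.25, 0.25), (0.625, 0.625)]
```

The 2-D grid is what turns a 1-D sequence into a surface that a two-dimensional fluctuation analysis can operate on. Because only point coordinates are compared, sequences of unequal length can be handled.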

  7. Generalized DNA Barcode Design Based on Hamming Codes

    PubMed Central

    Bystrykh, Leonid V.

    2012-01-01

    The diversity and scope of multiplex parallel sequencing applications are steadily increasing. Critically, multiplex parallel sequencing methods rely on barcoded primers for sample identification, and the quality of the barcodes directly affects the quality of the resulting sequence data. Inspection of recent publications reveals surprisingly variable barcode quality. Some barcodes are designed semi-empirically, without quantitative consideration of error correction or minimal-distance properties. After a systematic comparison of published barcode sets, including commercially distributed barcoded primers from Illumina and Epicentre, methods for improved, Hamming code-based sequences are suggested and illustrated. Hamming barcodes can be employed for DNA tag designs in many different ways while preserving minimal-distance and error-correcting properties. In addition, Hamming barcodes remain flexible with regard to essential biological parameters such as sequence redundancy and GC content. Wider adoption of improved Hamming barcodes is encouraged in multiplex parallel sequencing applications. PMID:22615825
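The minimal-distance property stressed in this abstract can be checked in a few lines. A barcode set whose minimum pairwise Hamming distance is d can detect up to d − 1 substitutions and correct up to ⌊(d − 1)/2⌋ of them. This is a generic checker, not Bystrykh's Hamming-code construction, and the example barcode set is made up for illustration.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(barcodes: Sequence[str]) -> int:
    """Smallest pairwise Hamming distance in a set of at least two barcodes."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def correctable_substitutions(barcodes: Sequence[str]) -> int:
    """Substitutions that nearest-barcode decoding can always correct."""
    return (min_distance(barcodes) - 1) // 2


def gc_content(barcode: str) -> float:
    """Fraction of G or C in a barcode."""
    return sum(base in "GC" for base in barcode.upper()) / len(barcode)


# Made-up example set, for illustration only.
example = ["ACGTAC", "CATGCA", "GTACGT", "TGCATG"]
print(min_distance(example), correctable_substitutions(example))  # 6 2
print([gc_content(b) for b in example])  # [0.5, 0.5, 0.5, 0.5]
```

A checker like this can audit a published or commercial barcode set, which is the kind of comparison the abstract describes. The GC-content helper reflects the point that distance properties must be balanced against biological constraints.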

  8. Non-extensive trends in the size distribution of coding and non-coding DNA sequences in the human genome

    NASA Astrophysics Data System (ADS)

    Oikonomou, Th.; Provata, A.

    2006-03-01

    We study the primary DNA structure of four of the most completely sequenced human chromosomes (including chromosome 19, which has the highest coding density) using non-extensive statistics. We show that the exponents governing the spatial decay of the coding size distributions lie in the range 5.2 ≤ r ≤ 5.7 at short scales and 1.45 ≤ q ≤ 1.50 at large scales. By contrast, the exponents governing the spatial decay of the non-coding size distributions in these four chromosomes take the values 2.4 ≤ r ≤ 3.2 at short scales and 1.50 ≤ q ≤ 1.72 at large scales. These results, in particular the values of the tail exponent q, indicate the existence of correlations in both the coding and non-coding size distributions, with a tendency toward stronger correlations in non-coding DNA.

  9. An Integrated Prognostic Classifier for Stage I Lung Adenocarcinoma based on mRNA, microRNA and DNA Methylation Biomarkers

    PubMed Central

    Robles, Ana I.; Arai, Eri; Mathé, Ewy A.; Okayama, Hirokazu; Schetter, Aaron J.; Brown, Derek; Petersen, David; Bowman, Elise D.; Noro, Rintaro; Welsh, Judith A.; Edelman, Daniel C.; Stevenson, Holly S.; Wang, Yonghong; Tsuchiya, Naoto; Kohno, Takashi; Skaug, Vidar; Mollerup, Steen; Haugen, Aage; Meltzer, Paul S.; Yokota, Jun; Kanai, Yae

    2015-01-01

    Introduction: Up to 30% of Stage I lung cancer patients suffer recurrence within 5 years of curative surgery. We sought to improve existing protein-coding gene and microRNA expression prognostic classifiers by incorporating epigenetic biomarkers. Methods: Genome-wide screening of DNA methylation and pyrosequencing analysis of HOXA9 promoter methylation were performed in two independently collected cohorts of Stage I lung adenocarcinoma. The prognostic value of HOXA9 promoter methylation, alone and in combination with mRNA and miRNA biomarkers, was assessed by Cox regression and Kaplan-Meier survival analysis in both cohorts. Results: Promoters of genes marked by Polycomb in embryonic stem cells were methylated de novo in tumors and identified patients with poor prognosis. The HOXA9 locus was methylated de novo in Stage I tumors (P < 0.0005). High HOXA9 promoter methylation was associated with worse cancer-specific survival (hazard ratio [HR], 2.6; P = 0.02) and recurrence-free survival (HR, 3.0; P = 0.01), and identified high-risk patients in stratified analysis of Stage IA and IB. A four-protein-coding-gene signature (XPO1, BRCA1, HIF1α, DLC1), miR-21 expression, and HOXA9 promoter methylation were each independently associated with outcome (HR, 2.8, P = 0.002; HR, 2.3, P = 0.01; and HR, 2.4, P = 0.005, respectively) and, when combined, identified high-risk, therapy-naïve Stage I patients (HR, 10.2; P = 3 × 10⁻⁵). All associations were confirmed in two independently collected cohorts. Conclusion: A prognostic classifier comprising three types of genomic and epigenomic data may help guide the postoperative management of Stage I lung cancer patients at high risk of recurrence. PMID:26134223

  10. In search of coding and non-coding regions of DNA sequences based on balanced estimation of diffusion entropy.

    PubMed

    Zhang, Jin; Zhang, Wenqing; Yang, Huijie

    2016-01-01

    Identification of coding regions in DNA sequences remains challenging. Various methods have been proposed, but they are limited by species dependence and the need for adequate training sets. The elements in DNA coding regions are known to be distributed quasi-randomly, whereas those in non-coding regions typically show similar structures. For short sequences, these statistical characteristics cannot be extracted correctly, or even detected. This paper introduces a new way to solve the problem: balanced estimation of diffusion entropy (BEDE).

  11. A DNA methylation classifier of cervical precancer based on human papillomavirus and human genes.

    PubMed

    Brentnall, Adam R; Vasiljević, Nataša; Scibior-Bentkowska, Dorota; Cadman, Louise; Austin, Janet; Szarewski, Anne; Cuzick, Jack; Lorincz, Attila T

    2014-09-15

    Testing for high-risk (hr) types of human papillomavirus (HPV) is a highly sensitive screening test for high-grade cervical intraepithelial neoplasia (CIN2/3), the precursor of cervical cancer. However, it has relatively low specificity. Our objective was to develop a prediction rule with higher specificity, using combinations of human and HPV DNA methylation. Exfoliated cervical specimens from colposcopy-referral cohorts in London were analyzed for DNA methylation levels by pyrosequencing in the L1 and L2 regions of HPV16, HPV18 and HPV31 and in the human genes EPB41L3, DPYS and MAL. Samples from 1,493 hrHPV-positive women were assessed; of these, 556 were found to have CIN2/3 at biopsy. In total, 556 tested positive for HPV16 (323 CIN2/3), 201 for HPV18 (73 CIN2/3) and 202 for HPV31 (98 CIN2/3). The prediction rule included EPB41L3 and HPV and had an area under the curve (AUC) of 0.80 (95% CI 0.78-0.82). At 90% sensitivity, specificity was 36% (33-40) and positive predictive value (PPV) was 46% (43-48). By HPV type, 90% sensitivity corresponded to the following specificities and PPVs, respectively: HPV16, 38% (32-45) and 67% (63-71); HPV18, 53% (45-62) and 52% (45-59); HPV31, 39% (31-49) and 58% (51-65); HPV16, 18 or 31, 44% (40-49) and 62% (59-65); and other hrHPV, 17% (14-21) and 21% (18-24). We conclude that a methylation assay in hrHPV-positive women might improve PPV with minimal loss of sensitivity.
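The overall PPV can be roughly reproduced from the counts reported in this abstract. This is a quick sanity check on how sensitivity, specificity and prevalence combine. The helper is a generic textbook calculation, not the authors' analysis. Because specificity is quoted rounded to a whole percent, an exact match to the reported 46% is not expected.

```python
def ppv_from_rates(sensitivity: float, specificity: float,
                   n_diseased: int, n_total: int) -> float:
    """Positive predictive value = TP / (TP + FP) for a cohort of n_total
    people of whom n_diseased have the condition."""
    n_healthy = n_total - n_diseased
    true_pos = sensitivity * n_diseased           # cases correctly flagged
    false_pos = (1.0 - specificity) * n_healthy   # non-cases wrongly flagged
    return true_pos / (true_pos + false_pos)


# Overall figures from the abstract: 556 CIN2/3 among 1,493 hrHPV-positive
# women, 90% sensitivity and 36% specificity.
print(round(ppv_from_rates(0.90, 0.36, 556, 1493), 3))  # 0.455
```

The result of about 45% sits within the reported 43-48% interval. It also shows why PPV moves so much with specificity when roughly 63% of the screened group do not have CIN2/3.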

  12. Periodicity in DNA primary structure is defined by secondary structure of the coded protein.

    PubMed Central

    Zhurkin, V B

    1981-01-01

    A 10.5-base periodicity found earlier is inherent in both eukaryotic and prokaryotic coding nucleotide sequences. No periodicity is found in noncoding eukaryotic sequences, so the 10.5-base oscillation does not appear to correlate with the nucleosomal organization of DNA. It is shown that DNA fragments coding for alpha-helical protein segments show a pronounced 10.5-base periodicity, whereas DNA regions coding for beta-structure have a 6-base oscillation. This repeating pattern of nucleotide sequences can be used to compare DNA segments with a low degree of homology. PMID:7243595

  13. Yeast phenotype classifies mammalian protein kinase C cDNA mutants.

    PubMed Central

    Riedel, H; Su, L; Hansen, H

    1993-01-01

    The phorbol ester receptor protein kinase C (PKC) gene family encodes essential mediators of eukaryotic cellular signals. Molecular dissection of their mechanisms of action has been limited in part by the lack of random mutagenesis approaches and by the complexity of signaling pathways in mammalian cells, which involve multiple PKC isoforms. Here we present a rapid screen that permits phenotypic quantification of mammalian PKC activity in the yeast Saccharomyces cerevisiae. Bovine PKC alpha cDNA is functionally expressed in S. cerevisiae. This results in a phorbol ester response: a fourfold increase in cell doubling time and a substantial decrease in yeast colony size on agar plates. We have expressed pools of bovine PKC alpha cDNAs mutagenized by Bal 31 deletion of internal, amino-terminal, or carboxyl-terminal sequences and have identified three classes of mutants on the basis of their distinct yeast phenotypes. Representatives of each class were analyzed. An internal deletion of amino acids (aa) 172 to 225 displayed ligand-dependent but reduced catalytic activity, an amino-terminal truncation of aa 1 to 153 displayed elevated and ligand-independent activity, and a carboxyl-terminal 26-aa truncation (aa 647 to 672) lacked activity under any conditions. Additional mutations confirmed the distinct functional characteristics of these classes. Our data show that deletion of the V1 and C1 regions results in elevated basal catalytic activity that is still Ca2+ responsive. Internal deletions in the V2 and C2 regions do not abolish phorbol ester or Ca2+ regulation of PKC activity, suggesting that most of the C2 domain is not essential for phorbol ester stimulation and most of the regulatory domain is dispensable for Ca2+ regulation of PKC activity. These distinct activities of the PKC mutants correlate with a specific and proportional yeast phenotype and are quantified on agar plates by yeast colony size. This provides a phenotypic screen which is suitable

  14. Coding and non-coding DNA thermal stability differences in eukaryotes studied by melting simulation, base shuffling and DNA nearest neighbor frequency analysis.

    PubMed

    Long, Dang D; Grosse, Ivo; Marx, Kenneth A

    2004-07-01

    The melting of the coding and non-coding classes of natural DNA sequences was investigated using a program, MELTSIM, which simulates DNA melting based upon an empirically parameterized nearest neighbor thermodynamic model. We calculated T(m) results of 8144 natural sequences from 28 eukaryotic organisms of varying F(GC) (mole fraction of G and C) and of 3775 coding and 3297 non-coding sequences derived from those natural sequences. These data demonstrated that the T(m) vs. F(GC) relationships in coding and non-coding DNAs are both linear but have a statistically significant difference (6.6%) in their slopes. These relationships are significantly different from the T(m) vs. F(GC) relationship embodied in the classical Marmur-Schildkraut-Doty (MSD) equation for the intact long natural sequences. By analyzing the simulation results from various base shufflings of the original DNAs and the average nearest neighbor frequencies of those natural sequences across the F(GC) range, we showed that these differences in the T(m) vs. F(GC) relationships are largely a direct result of systematic F(GC)-dependent biases in nearest neighbor frequencies for those two different DNA classes. Those differences in the T(m) vs. F(GC) relationships and biases in nearest neighbor frequencies also appear between the sequences from multicellular and unicellular organisms in the same coding or non-coding classes, albeit of smaller but significant magnitudes. PMID:15223141
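The two sequence statistics this study contrasts, F(GC) and nearest-neighbor (dinucleotide) frequencies, are easy to compute. A base shuffle, which preserves F(GC) but scrambles nearest neighbors, separates their effects. This is a minimal sketch for computing those inputs only. MELTSIM and its thermodynamic parameters are not reproduced, and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict


def gc_fraction(seq: str) -> float:
    """F(GC): mole fraction of G and C among unambiguous bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases)


def nearest_neighbor_frequencies(seq: str) -> Dict[str, float]:
    """Relative frequencies of overlapping dinucleotides (nearest-neighbor pairs)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(p for p in pairs if set(p) <= set("ACGT"))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def shuffle_bases(seq: str, seed: int = 0) -> str:
    """Base shuffle: keeps composition (hence F(GC)) but scrambles neighbors."""
    bases = list(seq)
    random.Random(seed).shuffle(bases)
    return "".join(bases)


seq = "GGCGCAATGCC"
print(gc_fraction(seq), gc_fraction(shuffle_bases(seq)))  # identical by construction
print(nearest_neighbor_frequencies(seq))
print(nearest_neighbor_frequencies(shuffle_bases(seq)))
```

Any change in a nearest-neighbor melting estimate between a sequence and its shuffle must come from neighbor order, not composition. That is the logic the abstract uses to attribute the slope differences to nearest-neighbor biases.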

  16. Virus-coded DNA endonuclease from avian retrovirus.

    PubMed Central

    Golomb, M; Grandgenett, D P; Mason, W

    1981-01-01

    Reverse transcriptase from avian retrovirus has a physically associated DNA endonuclease with novel substrate and cofactor requirements. A similar endonuclease activity copurifies with pp32, a protein from viral cores that has been identified with the non-alpha region of the beta subunit of reverse transcriptase. Several temperature-sensitive mutants of avian retrovirus with thermolabile DNA polymerase were tested for thermal sensitivity of their DNA endonuclease activity. Two pol mutants of Rous sarcoma virus, ts335 and ts337, had thermolabile DNA endonuclease; a temperature-resistant revertant of ts335 had a heat-stable DNA endonuclease. DNA endonuclease is therefore a product of the pol gene and an integral part of the reverse transcriptase. A second class of pol mutants, typified by ts568 and ts553, had thermolabile DNA polymerase, but heat-stable DNA endonuclease. PMID:6165835

  17. The mammalian transcriptome and the function of non-coding DNA sequences

    PubMed Central

    Shabalina, Svetlana A; Spiridonov, Nikolay A

    2004-01-01

    For decades, researchers have focused most of their attention on protein-coding genes and proteins. With the completion of the human and mouse genomes and the accumulation of data on the mammalian transcriptome, the focus now shifts to non-coding DNA sequences, RNA-coding genes and their transcripts. Many non-coding transcribed sequences are proving to have important regulatory roles, but the functions of the majority remain mysterious. PMID:15059247

  18. What Information is Stored in DNA: Does it Contain Digital Error Correcting Codes?

    NASA Astrophysics Data System (ADS)

    Liebovitch, Larry

    1998-03-01

    The longest-term correlations in living systems are the information stored in DNA, which reflects the evolutionary history of an organism. The 4 bases (A, T, G, C) encode sequences of amino acids as well as the locations of binding sites for proteins that regulate DNA. The fidelity of this important information is maintained by ANALOG error-checking mechanisms. When a single strand of DNA is replicated, the complementary base is inserted in the new strand. Sometimes the wrong base is inserted; it sticks out, disrupting the phosphate backbone. The new base is not yet methylated, so repair enzymes that slide along the DNA can tear out the wrong base and replace it with the right one. The bases in DNA form a sequence of 4 different symbols, so the information is encoded in DIGITAL form. All the digital codes in our society (ISBN book numbers, UPC product codes, bank account numbers, airline ticket numbers) use error-checking codes, in which some digits are functions of other digits, to maintain the fidelity of transmitted information. Does DNA also utilize a DIGITAL error-checking code to maintain the fidelity of its information and increase the accuracy of replication? That is, are some bases in DNA functions of other bases upstream or downstream? This raises an interesting mathematical problem: how does one determine whether some symbols in a sequence of symbols are a function of other symbols? It also bears on the issue of determining algorithmic complexity: what is the function that generates the shortest algorithm for reproducing the symbol sequence? The error-checking codes most used in our technology are linear block codes. We developed an efficient method to test for the presence of such codes in DNA. We coded the 4 bases as (0,1,2,3) and used Gaussian elimination, modified for modulus 4, to test whether some bases are linear combinations of other bases. We used this method to analyze the base sequence in the genes from the lac operon and cytochrome C. We did not find

  19. Stochastic model of homogeneous coding and latent periodicity in DNA sequences.

    PubMed

    Chaley, Maria; Kutyrkin, Vladimir

    2016-02-01

    The concept of latent triplet periodicity in coding DNA sequences, which has previously been discussed extensively, is confirmed by the analysis of a number of eukaryotic genomes, in whose CDSs a new type of latent periodicity, called profile periodicity, is recognized. An original model of Stochastic Homogeneous Organization of Coding (SHOC model) in a textual string is proposed. This model explains the existence of latent profile periodicity and regularity in DNA sequences. PMID:26656186
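For readers new to triplet periodicity, the classic way to expose it is to measure the Fourier power of each base's indicator sequence at period 3 and to tabulate base composition at the three codon positions. This is a minimal sketch of that standard measure. It is not the SHOC model or the authors' profile-periodicity method, and the function names are hypothetical.

```python
import cmath
from collections import Counter
from typing import List


def period3_power(seq: str) -> float:
    """Sum over A, C, G, T of |DFT of the base-indicator sequence at period 3|^2."""
    seq = seq.upper()
    omega = cmath.exp(-2j * cmath.pi / 3)  # primitive cube root of unity
    total = 0.0
    for base in "ACGT":
        # n % 3 keeps the exponent small and avoids accumulating rounding error
        s = sum(omega ** (n % 3) for n, b in enumerate(seq) if b == base)
        total += abs(s) ** 2
    return total


def codon_position_profile(seq: str) -> List[Counter]:
    """Base counts at codon positions 1, 2 and 3 (frame starts at index 0)."""
    profile: List[Counter] = [Counter(), Counter(), Counter()]
    for n, base in enumerate(seq.upper()):
        profile[n % 3][base] += 1
    return profile


print(period3_power("ATG" * 10))  # large: strong period-3 signal
print(period3_power("A" * 30))    # essentially zero: no period-3 signal
print(codon_position_profile("ATGATG"))
```

A strongly periodic toy sequence gives a large value and a homopolymer gives essentially none. Real coding sequences fall in between, which is why the periodicity is described as latent.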

  20. Non-coding RNAs: an emerging player in DNA damage response.

    PubMed

    Zhang, Chunzhi; Peng, Guang

    2015-01-01

    Non-coding RNAs play a crucial role in maintaining genomic stability, which is essential for cell survival and for preventing tumorigenesis. Through extensive crosstalk between non-coding RNAs and the canonical DNA damage response (DDR) signaling pathway, DDR-induced expression of non-coding RNAs can provide a regulatory mechanism to accurately control the expression of DNA damage-responsive genes in a spatio-temporal manner. Mechanistically, DNA damage alters the expression of a variety of non-coding RNAs at multiple levels, including transcriptional regulation, post-transcriptional regulation, and RNA degradation. In parallel, non-coding RNAs can directly regulate cellular processes involved in DDR by altering the expression of their target genes; this review places particular emphasis on miRNAs and lncRNAs. MiRNAs are required for almost every aspect of the cellular response to DNA damage, including sensing DNA damage, transducing damage signals, repairing damaged DNA, activating cell cycle checkpoints, and inducing apoptosis. lncRNAs control the transcription of DDR-relevant genes through four different regulatory models: signal, decoy, guide, and scaffold. In addition, we highlight potential clinical applications of non-coding RNAs as biomarkers and therapeutic targets for anti-cancer treatments using DNA-damaging agents, including radiation and chemotherapy. Although tremendous advances have been made in elucidating the role of non-coding RNAs in genome maintenance, many key questions remain, including how, mechanistically, the non-coding RNA and DNA damage response pathways are coordinated in response to genotoxic stress.

  1. Analysis of similarity/dissimilarity of DNA sequences based on convolutional code model.

    PubMed

    Liu, Xiao; Tian, Feng Chun; Wang, Shi Yuan

    2010-02-01

    Based on the convolutional code model of error-correction coding theory, we propose an approach to characterize and compare DNA sequences that takes the effect of codon context into account. We construct an 8-component vector whose components are the normalized leading eigenvalues of the L/L and M/M matrices associated with the original DNA sequences and the transformed sequences. The utility of our approach is illustrated by examining the similarities/dissimilarities among the coding sequences of the first exon of the beta-globin gene of 11 species, demonstrating the effectiveness of error-correction coding theory for analyzing the similarity/dissimilarity of DNA sequences.

  2. Is there an error correcting code in the base sequence in DNA?

    PubMed Central

    Liebovitch, L S; Tao, Y; Todorov, A T; Levine, L

    1996-01-01

    Modern methods of encoding information into digital form include error check digits that are functions of the other information digits. When digital information is transmitted, the values of the error check digits can be computed from the information digits to determine whether the information has been received accurately. These error correcting codes make it possible to detect and correct common errors in transmission. The sequence of bases in DNA is also a digital code consisting of four symbols: A, C, G, and T. Does DNA also contain an error correcting code? Such a code would allow repair enzymes to protect the fidelity of nonreplicating DNA and increase the accuracy of replication. If a linear block error correcting code is present in DNA then some bases would be a linear function of the other bases in each set of bases. We developed an efficient procedure to determine whether such an error correcting code is present in the base sequence. We illustrate the use of this procedure by using it to analyze the lac operon and the gene for cytochrome c. These genes do not appear to contain such a simple error correcting code. PMID:8874027
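The test described in this abstract asks whether, in each block of k bases, one base is a linear function (mod 4) of the others. Because arithmetic mod 4 is not a field, the sketch below swaps the authors' modified Gaussian elimination for a plain exhaustive search over all 4^(k-1) coefficient vectors, which is practical for small k. The A, C, G, T → 0, 1, 2, 3 mapping is an assumption; the abstracts say only that the bases were coded as 0-3.

```python
from itertools import product
from typing import List, Optional, Tuple

# Assumed numeric coding of the bases.
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_blocks(seq: str, k: int) -> List[List[int]]:
    """Split a sequence into consecutive, non-overlapping blocks of k bases."""
    values = [BASE_TO_INT[b] for b in seq.upper() if b in BASE_TO_INT]
    return [values[i:i + k] for i in range(0, len(values) - k + 1, k)]


def find_check_base(seq: str, k: int) -> Optional[Tuple[int, Tuple[int, ...]]]:
    """Look for a position whose base is, in every block, a linear combination
    (mod 4) of the other k - 1 bases. Returns (position, coefficients) or None."""
    blocks = to_blocks(seq, k)
    if not blocks:
        raise ValueError("sequence shorter than one block")
    for pos in range(k):
        others = [i for i in range(k) if i != pos]
        # Exhaustive search over all 4**(k-1) coefficient vectors.
        for coeffs in product(range(4), repeat=k - 1):
            if all(block[pos] == sum(c * block[i] for c, i in zip(coeffs, others)) % 4
                   for block in blocks):
                return pos, coeffs
    return None


# Synthetic data: third base of each block = (first + second) mod 4.
synthetic = "ACC" "CGT" "GGA" "TCA" "CTA" "GTC"
print(find_check_base(synthetic, 3))  # (0, (3, 1)): first = 3*second + third (mod 4)
print(find_check_base("AAAACCCCGGGGTTTT", 3))  # None
```

A linear relation can be rearranged, so the search may report a different position from the one used to build the data, as in the synthetic example. With only a handful of blocks, spurious relations are easy to satisfy, so a meaningful negative result needs many more blocks than coefficients.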

  3. Mutation patterns of mtDNA: Empirical inferences for the coding region

    PubMed Central

    2008-01-01

    Background: Human mitochondrial DNA (mtDNA) has been used extensively in population and evolutionary genetics studies. Thus, a valid estimate of the human mtDNA evolutionary rate is important in many research fields. The few estimates made for the coding region of the molecule showed important differences between phylogenetic and empirical approaches. We analyzed a portion of the coding region of mtDNA (tRNALeu, ND1 and tRNAIle genes) in individuals belonging to extended families from the Azores Islands (Portugal), with the main aim of providing empirical estimates of the mutation rate of the mtDNA coding region under different assumptions, and hence of better understanding the mtDNA evolutionary process. Results: Heteroplasmy was detected in 6.5% (3/46) of the families analyzed. In all of these families, the mtDNA heteroplasmy resulted from three new point mutations; no insertions or deletions were identified. Major differences were found in the proportion and type of heteroplasmy in the genes studied compared with those obtained in a previous report for the D-loop. Our empirical estimate of the mtDNA coding-region mutation rate, calculated taking into account the sex of individuals carrying new mutations, the probability of intra-individual fixation of mutations present in heteroplasmy and, to the extent possible, the effect of selection, is similar to that obtained using phylogenetic approaches. Conclusion: Based on our results, the previously reported discrepancy between human mtDNA coding-region mutation rates observed over evolutionary timescales and estimates obtained from family pedigrees can be resolved when correcting for the factors cited above. PMID:18518963

  4. Towards a probabilistic recognition code for protein-DNA interactions

    SciTech Connect

    Benos, P.; et al.

    2000-09-01

    We are investigating the rules that govern protein-DNA interactions, using a statistical-mechanics-based formalism related to the Boltzmann Machine of the neural-net literature. Our approach is data-driven: probabilistic algorithms are used to model protein-DNA interactions, given SELEX and phage data as input. Under the "one-to-one" model for interactions (i.e., one amino acid contacts one base), we can successfully identify the wild-type binding sites of the EGR and MIG protein families. The predictions of our method are as good as or better than those of existing methods in the literature; moreover, our methodology offers the potential to capitalize in quantitative detail on more data as it becomes available.

  5. Peculiar symmetry of DNA sequences and evidence suggesting its evolutionary origin in a primeval genetic code

    NASA Astrophysics Data System (ADS)

    Jolivet, R.; Rothen, F.

    2001-08-01

    Statistical analysis of the distribution of codons in DNA coding sequences of bacteria or archaea suggests that, at some stage of the prebiotic world, the most successful RNA replicating sequences afforded some tendency toward a weak form of palindromic symmetry, namely complementary symmetry. As a consequence, as soon as the machinery allowing translation into proteins was beginning to settle, we assume that primeval versions of the genetic code essentially consisted of pairs of sense-antisense codons. Present-day DNA sequences display footprints of this early symmetry, provided that statistics are made over coding sequences issued from groups of organisms and not only from the genome of an individual species. These fossil traces are proven to be significant from the statistical point of view. They shed some light onto the possible evolution of the genetic code and set some constraints on the way it had to follow.

  6. Statistical analysis of nucleotide runs in coding and noncoding DNA sequences.

    PubMed

    Sprizhitsky, Yu A; Nechipurenko, Yu D; Alexandrov, A A; Volkenstein, M V

    1988-10-01

    A statistical analysis of the occurrence of particular nucleotide runs in DNA sequences of different species has been carried out. There are considerable differences in run distributions among DNA sequences of prokaryotes, invertebrates and vertebrates. Short runs (1-2 nucleotides long) are abundant in coding sequences and deficient in noncoding regions. However, some interesting exceptions to this rule exist for the run distribution of adenine in prokaryotes and for the arrangement of purine-pyrimidine runs in eukaryotes. The similarity in the distributions of such runs in coding and noncoding regions may be due to some structural features of the DNA molecule as a whole. Runs of guanine (or cytosine) three to six nucleotides long occur predominantly in noncoding DNA regions in eukaryotes, especially in vertebrates.
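The basic quantity behind this kind of study is the distribution of maximal run lengths for each symbol. It can be computed for bases directly or after recoding to purine (R) and pyrimidine (Y) classes. This is a minimal sketch of that tabulation with hypothetical function names. The species comparisons and significance testing in the paper are not reproduced.

```python
from collections import Counter
from itertools import groupby
from typing import Dict


def run_length_distribution(seq: str) -> Dict[str, Counter]:
    """For each symbol, count how many maximal runs of each length occur."""
    dist: Dict[str, Counter] = {}
    for symbol, run in groupby(seq.upper()):
        dist.setdefault(symbol, Counter())[sum(1 for _ in run)] += 1
    return dist


def purine_pyrimidine(seq: str) -> str:
    """Recode A/G as R (purine) and C/T as Y (pyrimidine); anything else as N."""
    return "".join("R" if b in "AG" else "Y" if b in "CT" else "N" for b in seq.upper())


print(run_length_distribution("AAGTTTAA"))
print(run_length_distribution(purine_pyrimidine("AAGTTTAA")))
```

Comparing these per-symbol histograms between coding and noncoding segments shows, for example, whether short runs dominate one class and long G or C runs the other.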

  7. Position-dependent correlations between DNA methylation and the evolutionary rates of mammalian coding exons

    PubMed Central

    Chuang, Trees-Juen; Chen, Feng-Chi; Chen, Yen-Zho

    2012-01-01

    DNA cytosine methylation is a central epigenetic marker that is usually mutagenic and may increase the level of sequence divergence. However, methylated genes have been reported to evolve more slowly than unmethylated genes. Hence, there is a controversy on whether DNA methylation is correlated with increased or decreased protein evolutionary rates. We hypothesize that this controversy has resulted from the differential correlations between DNA methylation and the evolutionary rates of coding exons in different genic positions. To test this hypothesis, we compare human–mouse and human–macaque exonic evolutionary rates against experimentally determined single-base resolution DNA methylation data derived from multiple human cell types. We show that DNA methylation is significantly related to within-gene variations in evolutionary rates. First, DNA methylation level is more strongly correlated with C-to-T mutations at CpG dinucleotides in the first coding exons than in the internal and last exons, although it is positively correlated with the synonymous substitution rate in all exon positions. Second, for the first exons, DNA methylation level is negatively correlated with exonic expression level, but positively correlated with both nonsynonymous substitution rate and the sample specificity of DNA methylation level. For the internal and last exons, however, we observe the opposite correlations. Our results imply that DNA methylation level is differentially correlated with the biological (and evolutionary) features of coding exons in different genic positions. The first exons appear more prone to the mutagenic effects, whereas the other exons are more influenced by the regulatory effects of DNA methylation. PMID:23019368

  8. Molecular cloning of cDNA coding for rat proliferating cell nuclear antigen (PCNA)/cyclin.

    PubMed Central

    Matsumoto, K; Moriuchi, T; Koji, T; Nakane, P K

    1987-01-01

    The 'proliferating cell nuclear antigen' (PCNA), also known as cyclin, appears at the G1/S boundary in the cell cycle. Because of its possible relationship with cell proliferation, PCNA/cyclin has been receiving attention. PCNA/cyclin is a non-histone acidic nuclear protein with an apparent mol. wt of 33000-36000. The amino acid composition and the sequence of the first 25 amino acids of rabbit PCNA/cyclin are known. Using an oligonucleotide probe corresponding to the sequence of the first five amino acids, a cDNA clone for PCNA/cyclin was isolated from rat thymocyte cDNA library. The cDNA (1195 bases) contains an open reading frame of 813 nucleotides coding for 261 amino acids. The 3'-non-coding region is 312 nucleotides long and contains three putative polyadenylation signals. The mol. wt of rat PCNA/cyclin was calculated to be 28 748. The deduced amino acid sequence and composition of rat PCNA/cyclin are in excellent agreement with the published data. Using the cDNA probe, two species of mRNA (1.1 and 0.98 kb) were detected in rat thymocyte RNA. Southern blot analysis of total human genomic DNA suggests that there is a single gene coding for PCNA/cyclin. The deduced amino acid sequence of rat PCNA/cyclin has a similarity with that of herpes simplex virus type-1 DNA binding protein. PMID:2884104

  9. Differential DNA methylation profiles of coding and non-coding genes define hippocampal sclerosis in human temporal lobe epilepsy

    PubMed Central

    Miller-Delaney, Suzanne F.C.; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C.; Bray, Isabella M.; Reynolds, James P.; Gwinn, Ryder; Stallings, Raymond L.

    2015-01-01

    Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated C-phosphate-G islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation-sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) when compared to control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among those differentially methylated. Finally, a panel of 13 methylation-sensitive microRNAs was identified in temporal lobe epilepsy, including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and differential methylation of long non-coding RNA was documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain. PMID

  10. Non-Coding RNA: Sequence-Specific Guide for Chromatin Modification and DNA Damage Signaling

    PubMed Central

    Francia, Sofia

    2015-01-01

    Chromatin conformation shapes the environment in which our genome is transcribed into RNA. Transcription is a source of DNA damage; thus it often occurs concomitantly with DNA damage signaling. A growing body of evidence suggests that different types of RNAs can, independently of their protein-coding properties, directly affect chromatin conformation, transcription and splicing, as well as promote the activation of the DNA damage response (DDR) and DNA repair. Therefore, transcription paradoxically functions both to threaten and to safeguard genome integrity. On the other hand, DNA damage signaling is known to modulate chromatin to suppress transcription of the surrounding genetic unit. It is thus intriguing to understand how transcription can modulate DDR signaling while, in turn, DDR signaling represses transcription of chromatin around the DNA lesion. An unexpected player in this field is the RNA interference (RNAi) machinery, which plays roles in transcription, splicing and chromatin modulation in several organisms. Non-coding RNAs (ncRNAs) and several protein factors involved in the RNAi pathway are well-known master regulators of chromatin, while only recent reports show their involvement in DDR. Here, we discuss the experimental evidence supporting the idea that ncRNAs act at the genomic loci from which they are transcribed to modulate chromatin, DDR signaling and DNA repair. PMID:26617633

  11. DNA methylation patterns of protein-coding genes and long non-coding RNAs in males with schizophrenia.

    PubMed

    Liao, Qi; Wang, Yunliang; Cheng, Jia; Dai, Dongjun; Zhou, Xingyu; Zhang, Yuzheng; Li, Jinfeng; Yin, Honglei; Gao, Shugui; Duan, Shiwei

    2015-11-01

    Schizophrenia (SCZ) is one of the most complex mental illnesses affecting ~1% of the population worldwide. SCZ pathogenesis is considered to be a result of genetic as well as epigenetic alterations. Previous studies have aimed to identify the causative genes of SCZ. However, DNA methylation of long non-coding RNAs (lncRNAs) involved in SCZ has not been fully elucidated. In the present study, a comprehensive genome-wide analysis of DNA methylation was conducted using samples from two male patients with paranoid and undifferentiated SCZ, respectively. Methyl-CpG binding domain protein-enriched genome sequencing was used. In the two patients with paranoid and undifferentiated SCZ, 1,397 and 1,437 peaks were identified, respectively. Bioinformatic analysis demonstrated that peaks were enriched in protein-coding genes, which exhibited nervous system and brain functions. A number of these peaks in gene promoter regions may affect gene expression and, therefore, influence SCZ-associated pathways. Furthermore, 7 and 20 lncRNAs, respectively, in the Refseq database were hypermethylated. According to the lncRNA dataset in the NONCODE database, ~30% of intergenic peaks overlapped with novel lncRNA loci. The results of the present study demonstrated that aberrant hypermethylation of lncRNA genes may be an important epigenetic factor associated with SCZ. However, further studies using larger sample sizes are required.

  12. Triplex DNA:RNA, 3'-to-5' inverted RNA and protein coding in mitochondrial genomes.

    PubMed

    Seligmann, Hervé

    2013-09-01

    Triple-stranded DNA:RNA helices of unknown function in vertebrate mitochondria associate with replication and transcription. Antiparallel Hoogsteen pairings form triplexes under physiological conditions. Intermolecular antiparallel triplexes require inverted 3'-to-5' RNA polymerization, which has never been observed. Three rare, long, natural 3'-to-5' inverted GenBank RNAs from mouse mitochondria suggest occasional inverted transcription, putatively coding for proteins. BLAST aligns 18 GenBank-stored proteins with hypothetical proteins translated from the 3'-to-5' inverted Mus musculus mitochondrial genome. Three are DNA-binding, and five are membrane proteins. 25% of main-frame codons contribute to their 3'-to-5' overlap coding. Properties of these codons match those of overlap-coding protein genes, as compared to codons not expected to be involved in inverted coding: a) nucleotide contents at synonymous codon positions in mitochondrial genomes fit replicational deamination gradients (A->G and C->T), but digress from these gradients when functioning as nonsynonymous positions in putative 3'-to-5' overlapping genes; b) a bias against 'circular code' codons (codon groups creating unambiguity between frames) and in favour of homogeneous codons (AAA, CCC, GGG, TTT) characterizes overlapping genes, including putative 3'-to-5' overlapping genes, as compared to nonoverlapping coding sequences from the same main-frame gene. This signature correlates with digression from deamination gradients. Deamination and circular code tests independently confirm alignment-based predictions of overlapping 3'-to-5' protein coding genes. Results indicate varying expression for different 3'-to-5' overlapping genes. Inverted 3'-to-5' RNA is produced, perhaps by an unknown RNA polymerase (invertase) putatively coded by 3'-to-5' inverted RNA. PMID:23841652
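
    The analysis hinges on reading the mitochondrial genome 3'-to-5' and on how often homogeneous codons (AAA, CCC, GGG, TTT) occur. The following is a minimal sketch under two assumptions: that "inverted" here means reversing the strand without complementing it (unlike a reverse complement), and that codons are counted in a single frame. Function names and the toy input are illustrative only.

```python
HOMOGENEOUS = {"AAA", "CCC", "GGG", "TTT"}


def invert(seq: str) -> str:
    """Read a strand 3'-to-5': reverse it WITHOUT complementing (assumed meaning)."""
    return seq.upper()[::-1]


def codons(seq: str, frame: int = 0) -> list[str]:
    """Complete, non-overlapping codons starting at offset `frame` (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def homogeneous_fraction(seq: str, frame: int = 0) -> float:
    """Fraction of codons in the given frame that are homogeneous (AAA, CCC, GGG, TTT)."""
    cs = codons(seq, frame)
    return sum(c in HOMOGENEOUS for c in cs) / len(cs) if cs else 0.0


inverted = invert("ATGCCC")  # hypothetical toy sequence
print(inverted, codons(inverted), homogeneous_fraction(inverted))  # CCCGTA ['CCC', 'GTA'] 0.5
```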

  13. Characterization of a cDNA clone coding for the beta chain of bovine fibrinogen.

    PubMed Central

    Chung, D W; Rixon, M W; MacGillivray, R T; Davie, E W

    1981-01-01

    Recombinant plasmids containing bovine cDNA have been screened with a radiolabeled cDNA enriched for bovine fibrinogen. A number of plasmids containing cDNAs for fibrinogen were identified by this assay. One plasmid, designated pBI beta 1, was found to contain a cDNA insert of 1372 base pairs. The sequence of the cDNA insert for this plasmid was then determined. It was shown to code for 424 amino acids of the beta chain of fibrinogen, starting with residue 44. This and other data made it possible to construct the complete amino acid sequence of the beta chain of the protein. Comparison of the amino acid sequence of the beta chain of bovine fibrinogen with the corresponding chain of the human molecule indicated that the two chains are greater than 80% homologous. PMID:6262803

  14. Correcting sequencing errors in DNA coding regions using a dynamic programming approach.

    PubMed

    Xu, Y; Mural, R J; Uberbacher, E C

    1995-04-01

    This paper presents an algorithm for detecting and 'correcting' sequencing errors that occur in DNA coding regions. The types of sequencing errors addressed are insertions and deletions (indels) of DNA bases. The goal is to provide a capability which makes single-pass or low-redundancy sequence data more informative, reducing the need for high-redundancy sequencing for gene identification and characterization purposes. This would permit improved sequencing efficiency and reduce genome sequencing costs. The algorithm detects sequencing errors by discovering changes in the statistically preferred reading frame within a putative coding region and then inserts a number of 'neutral' bases at a perceived reading frame transition point to make the putative exon candidate frame consistent. We have implemented the algorithm as a front-end subsystem of the GRAIL DNA sequence analysis system to construct a version which is very error tolerant and also intend to use this as a testbed for further development of sequencing error-correction technology. Preliminary test results have shown the usefulness of this algorithm and also exhibited some of its weakness, providing possible directions for further improvement. On a test set consisting of 68 human DNA sequences with 1% randomly generated indels in coding regions, the algorithm detected and corrected 76% of the indels. The average distance between the position of an indel and the predicted one was 9.4 bases. With this subsystem in place, GRAIL correctly predicted 89% of the coding messages with 10% false message on the 'corrected' sequences, compared to 69% correctly predicted coding messages and 11% falsely predicted messages on the 'corrupted' sequences using standard GRAIL II method (version 1.2).(ABSTRACT TRUNCATED AT 250 WORDS)

  15. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Technical Reports Server (NTRS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied, the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior, whereas the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced, and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
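
    The n-gram entropy test can be sketched as follows. The normalisation R_n = 1 - H_n/(2n), where 2n bits is the maximum entropy of n-grams over a four-letter alphabet, is an assumption for illustration, since the abstract does not give the exact definition; counting overlapping n-grams is likewise assumed.

```python
import math
from collections import Counter


def ngram_entropy(seq: str, n: int) -> float:
    """Shannon entropy, in bits, of the overlapping n-gram distribution of seq."""
    seq = seq.upper()
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngram_redundancy(seq: str, n: int) -> float:
    """1 - H_n / (2n); 2n bits is the maximum entropy of n-grams over {A, C, G, T}."""
    return 1.0 - ngram_entropy(seq, n) / (2 * n)


print(ngram_entropy("ACGT", 1))       # 2.0: all four bases equally frequent
print(ngram_redundancy("AAAAAA", 1))  # 1.0: a single repeated symbol is fully redundant
```

    Lower entropy, and hence higher redundancy, in noncoding regions is the pattern the abstract reports for the three chromosomes.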

  16. Correcting sequencing errors in DNA coding regions using a dynamic programming approach

    SciTech Connect

    Xu, Y.; Mural, R.J.; Uberbacher, E.C.

    1994-12-01

    This paper presents an algorithm for detecting and 'correcting' sequencing errors that occur in DNA coding regions. The types of sequencing error addressed include insertions and deletions (indels) of DNA bases. The goal is to provide a capability which makes single-pass or low-redundancy sequence data more informative, reducing the need for high-redundancy sequencing for gene identification and characterization purposes. The algorithm detects sequencing errors by discovering changes in the statistically preferred reading frame within a putative coding region and then inserts a number of 'neutral' bases at a perceived reading frame transition point to make the putative exon candidate frame consistent. The authors have implemented the algorithm as a front-end subsystem of the GRAIL DNA sequence analysis system to construct a version which is very error tolerant and also intend to use this as a testbed for further development of sequencing error-correction technology. On a test set consisting of 68 human DNA sequences with 1% randomly generated indels in coding regions, the algorithm detected and corrected 76% of the indels. The average distance between the position of an indel and the predicted one was 9.4 bases. With this subsystem in place, GRAIL correctly predicted 89% of the coding messages with 10% false message on the 'corrected' sequences, compared to 69% correctly predicted coding messages and 11% falsely predicted messages on the 'corrupted' sequences using the standard GRAIL II method. The method uses a dynamic programming algorithm and runs in time and space linear in the size of the input sequence.
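
    The abstract names dynamic programming over reading frames with linear time and space, but gives no recurrence. One plausible formulation is offered here only as a sketch and not as GRAIL's algorithm. Given a per-position coding score for each of the three frames (the `scores` input is hypothetical, since GRAIL's own statistics are not described here), choose the frame path that maximises total score minus a fixed `penalty` per frame switch. Each switch is then a candidate indel position. The recursion keeps one back-pointer per frame per position, so time and space are linear in sequence length.

```python
def best_frame_path(scores: list[tuple[float, float, float]], penalty: float) -> list[int]:
    """Viterbi-style DP: most likely reading frame (0, 1, 2) at every position.

    scores[i][f] is a (hypothetical) coding score for position i read in frame f;
    every change of frame between neighbouring positions costs `penalty`.
    """
    if not scores:
        return []
    best = list(scores[0])  # best total score of a path ending in each frame
    back = [[0, 1, 2]]      # back-pointers; entry 0 is a placeholder
    for i in range(1, len(scores)):
        new_best, ptr = [], []
        for f in range(3):
            # Either stay in frame f, or switch from frame g and pay the penalty.
            cands = [best[g] - (0.0 if g == f else penalty) for g in range(3)]
            g = max(range(3), key=lambda k: cands[k])
            new_best.append(cands[g] + scores[i][f])
            ptr.append(g)
        best = new_best
        back.append(ptr)
    f = max(range(3), key=lambda k: best[k])
    path = [f]
    for i in range(len(scores) - 1, 0, -1):  # trace the back-pointers
        f = back[i][f]
        path.append(f)
    return path[::-1]


def frame_switches(path: list[int]) -> list[int]:
    """Positions where the frame changes: candidate indel sites."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


toy = [(1.0, 0.0, 0.0)] * 5 + [(0.0, 1.0, 0.0)] * 5
path = best_frame_path(toy, penalty=0.5)
print(path, frame_switches(path))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1] [5]
```

    In this toy input the scores favour frame 0 for five positions and frame 1 afterwards, so a single switch is reported at position 5. A larger `penalty` suppresses spurious switches at the cost of missing real ones.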

  17. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Astrophysics Data System (ADS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C.-K.; Simons, M.; Stanley, H. E.

    1995-09-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied, the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of the coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior, whereas the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates, such as primates and rodents, and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced, and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of zero- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
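
    For the n-tuple Zipf analysis, one simple way to summarise the rank-frequency curve is its least-squares slope on log-log axes. This sketch assumes overlapping n-tuples and a single straight-line fit over all ranks. A power law appears as a straight line, whereas the logarithmic behaviour reported for coding regions shows up as curvature that a single slope hides, so the fitted points should also be inspected.

```python
import math
from collections import Counter


def zipf_slope(seq: str, n: int) -> float:
    """Least-squares slope of log10(frequency) against log10(rank) for overlapping n-tuples.

    A roughly straight rank-frequency plot on log-log axes indicates power-law
    (Zipf-like) behaviour; the slope is then approximately -zeta.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-tuples")
    xs = [math.log10(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log10(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


print(zipf_slope("ACGT", 1))  # 0.0: every tuple equally frequent, flat rank-frequency curve
```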

  18. A Conserved Structural Signature of the Homeobox Coding DNA in HOX genes

    PubMed Central

    Fongang, Bernard; Kong, Fanping; Negi, Surendra; Braun, Werner; Kudlicki, Andrzej

    2016-01-01

    The homeobox encodes a DNA-binding domain found in transcription factors regulating key developmental processes. The most notable examples of homeobox-containing genes are the Hox genes, arranged on chromosomes in the same order as their expression domains along the body axis. The mechanisms responsible for the synchronous regulation of Hox genes and the molecular function of their colinearity remain unknown. Here we report the discovery of a conserved structural signature of the 180-base pair DNA fragment comprising the homeobox. We demonstrate that the homeobox DNA has a characteristic 3-base-pair periodicity in the hydroxyl radical cleavage pattern. This periodic pattern is significant in most of the 39 mammalian Hox genes and in other homeobox-containing transcription factors. The signature is present in segmented bilaterian animals as evolutionarily distant as humans and flies. It remains conserved despite the fact that it would be disrupted by synonymous mutations, which raises the possibility of evolutionary selective pressure acting on the structure of the coding DNA. The homeobox coding DNA may therefore have a secondary function, possibly as a regulatory element. The existence of such an element may have important consequences for understanding how these genes are regulated. PMID:27739488
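
    A 3-base-pair periodicity in a per-position signal, such as cleavage intensity along the 180-bp homeobox, can be probed with a discrete Fourier transform. This sketch is an illustration, not the authors' statistic. It compares the spectral power in the bin nearest a period of 3 with the mean power across bins 1 to n//2. Significance would still have to be judged against a null model, such as shuffled profiles.

```python
import cmath


def period3_ratio(profile: list[float]) -> float:
    """DFT power at the bin nearest period 3, divided by the mean power over bins 1..n//2."""
    n = len(profile)
    if n < 6:
        raise ValueError("profile too short for a period-3 test")
    mean = sum(profile) / n
    x = [v - mean for v in profile]  # remove the constant (DC) component

    def power(k: int) -> float:
        s = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        return abs(s) ** 2

    powers = [power(k) for k in range(1, n // 2 + 1)]
    average = sum(powers) / len(powers)
    if average == 0.0:
        return 0.0  # flat profile: no periodic signal at all
    k3 = round(n / 3)  # bin corresponding to a period of 3 positions
    return powers[k3 - 1] / average


# Toy profiles of length 180 (the homeobox length); the values are made up.
print(round(period3_ratio([1.0, 0.0, 0.0] * 60)))  # 90: all power sits at period 3
```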

  19. Non-coding RNAs mediate the rearrangements of genomic DNA in ciliates.

    PubMed

    Feng, Xuezhu; Guang, Shouhong

    2013-10-01

    Most eukaryotes employ a variety of mechanisms to defend the integrity of their genome by recognizing and silencing parasitic mobile nucleic acids. However, recent studies have shown that genomic DNA undergoes extensive rearrangements, including DNA elimination, fragmentation, and unscrambling, during the sexual reproduction of ciliated protozoa. Non-coding RNAs have been shown to program and regulate genome rearrangement events. In Paramecium and Tetrahymena, scan RNAs (scnRNAs) are produced from micronuclei and transported to vegetative macronuclei, where they elicit the elimination of cognate genomic DNA. In contrast, Piwi-interacting RNAs (piRNAs) in Oxytricha enable the retention of genomic DNA that exhibits sequence complementarity in macronuclei. An RNA interference (RNAi)-like mechanism has been found to direct these genomic rearrangements. Furthermore, in Oxytricha, maternal RNA templates can guide the unscrambling process of genomic DNA. The non-coding RNA-directed genome rearrangements may have profound evolutionary implications, for example, eliciting the multigenerational inheritance of acquired adaptive traits. PMID:24008384

  20. Robust chemical preservation of digital information on DNA in silica with error-correcting codes.

    PubMed

    Grass, Robert N; Heckel, Reinhard; Puddu, Michela; Paunescu, Daniela; Stark, Wendelin J

    2015-02-16

    Information, such as text printed on paper or images projected onto microfilm, can survive for over 500 years. However, the storage of digital information for time frames exceeding 50 years is challenging. Here we show that digital information can be stored on DNA and recovered without errors for considerably longer time frames. To allow for the perfect recovery of the information, we encapsulate the DNA in an inorganic matrix, and employ error-correcting codes to correct storage-related errors. Specifically, we translated 83 kB of information to 4991 DNA segments, each 158 nucleotides long, which were encapsulated in silica. Accelerated aging experiments were performed to measure DNA decay kinetics, which show that data can be archived on DNA for millennia under a wide range of conditions. The original information could be recovered error free, even after treating the DNA in silica at 70 °C for one week. This is thermally equivalent to storing information on DNA in central Europe for 2000 years.
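
    The figures in the abstract allow a quick density check. Assuming that 83 kB means 83,000 bytes, the payload averages about 0.84 bits per nucleotide, roughly 42% of the 2 bits that a four-letter alphabet could carry in principle. The remaining capacity goes to the error-correcting code and any other structure each segment carries, which the abstract does not break down.

```python
def storage_density(payload_bytes: int, segments: int, segment_nt: int) -> tuple[int, float, float]:
    """Return (total nucleotides, payload bits per nucleotide, fraction of the 2-bit/nt ceiling)."""
    total_nt = segments * segment_nt
    bits_per_nt = payload_bytes * 8 / total_nt
    return total_nt, bits_per_nt, bits_per_nt / 2.0  # log2(4) = 2 bits per base at most


# Figures from the abstract; 83 kB is taken as 83,000 bytes (an assumption).
total, density, fraction = storage_density(83_000, 4991, 158)
print(total, round(density, 3), round(fraction, 3))  # 788578 0.842 0.421
```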

  1. Junk DNA and the long non-coding RNA twist in cancer genetics

    PubMed Central

    Ling, Hui; Vincent, Kimberly; Pichler, Martin; Fodde, Riccardo; Berindan-Neagoe, Ioana; Slack, Frank J.; Calin, George A

    2015-01-01

    The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions, and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNAs in human diseases in general, and cancer in particular, remain largely unknown. Data from the literature suggest that lncRNAs, often via interaction with proteins, function at specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single nucleotide polymorphisms (SNPs) located within lncRNAs may be functionally associated with an individual’s susceptibility to cancer. PMID:25619839

  2. Classifying Microorganisms.

    ERIC Educational Resources Information Center

    Baker, William P.; Leyva, Kathryn J.; Lang, Michael; Goodmanis, Ben

    2002-01-01

    Focuses on an activity in which students sample air at school and generate ideas about how to classify the microorganisms they observe. The results are used to compare air quality among schools via the Internet. Supports the development of scientific inquiry and technology skills.

  3. HyDEn: A Hybrid Steganocryptographic Approach for Data Encryption Using Randomized Error-Correcting DNA Codes

    PubMed Central

    Regoui, Chaouki; Durand, Guillaume; Belliveau, Luc; Léger, Serge

    2013-01-01

    This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and the cyclic permutations applied on the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach. PMID:23984392

  4. HyDEn: a hybrid steganocryptographic approach for data encryption using randomized error-correcting DNA codes.

    PubMed

    Tulpan, Dan; Regoui, Chaouki; Durand, Guillaume; Belliveau, Luc; Léger, Serge

    2013-01-01

    This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and the cyclic permutations applied on the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach.
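
    HyDEn's custom quaternary codes and its key schedule are not given in the abstract. The sketch below only illustrates the three ingredients the abstract names, using stand-ins that are assumptions. First, a greedily built quaternary code with minimum Hamming distance 3, so one substitution per codeword can be corrected. Second, a key-seeded shuffle that assigns codewords to characters. Third, a keyed cyclic rotation of the encoded message. The 27-symbol alphabet, `key` and `shift` are hypothetical. A full extended-ASCII table would need at least 256 codewords and therefore longer words.

```python
import itertools
import random

BASES = "ACGT"


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def greedy_code(length: int, min_dist: int) -> list[str]:
    """Scan all 4**length words in order, keeping those >= min_dist from every kept word."""
    code: list[str] = []
    for letters in itertools.product(BASES, repeat=length):
        word = "".join(letters)
        if all(hamming(word, c) >= min_dist for c in code):
            code.append(word)
    return code


def make_codebook(alphabet: str, key: int, length: int = 6, min_dist: int = 3) -> dict[str, str]:
    words = greedy_code(length, min_dist)
    if len(words) < len(alphabet):
        raise ValueError("not enough codewords; increase length")
    random.Random(key).shuffle(words)  # the private key drives the randomized assignment
    return dict(zip(alphabet, words))


def encode(message: str, book: dict[str, str], shift: int) -> str:
    dna = "".join(book[ch] for ch in message)
    if not dna:
        return dna
    s = shift % len(dna)
    return dna[s:] + dna[:s]  # keyed cyclic permutation (left rotation)


def decode(dna: str, book: dict[str, str], shift: int) -> str:
    if not dna:
        return ""
    s = shift % len(dna)
    dna = dna[-s:] + dna[:-s] if s else dna  # undo the rotation
    n = len(next(iter(book.values())))
    out = []
    for i in range(0, len(dna), n):
        block = dna[i:i + n]
        # Nearest codeword corrects up to (min_dist - 1) // 2 substitutions per block.
        out.append(min(book, key=lambda ch: hamming(book[ch], block)))
    return "".join(out)


ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # hypothetical 27-symbol alphabet
book = make_codebook(ALPHABET, key=2013)
cipher = encode("HELLO WORLD", book, shift=7)
corrupted = ("A" if cipher[0] != "A" else "C") + cipher[1:]  # one substitution
print(decode(corrupted, book, shift=7))  # HELLO WORLD
```

    For length 6 and distance 3, the greedy code is maximal, so radius-2 balls of 1 + 18 + 135 = 154 words around its codewords must cover all 4^6 = 4096 words. The scan therefore keeps at least 27 codewords, just enough for this alphabet.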

  5. General Strategy for the Design of DNA Coding Sequences Applied to Nanoparticle Assembly.

    PubMed

    Calais, Théo; Baijot, Vincent; Djafari Rouhani, Mehdi; Gauchard, David; Chabal, Yves J; Rossi, Carole; Estève, Alain

    2016-09-20

    The DNA-directed assembly of nano-objects has been the subject of many recent studies as a means to construct advanced nanomaterial architectures. Although much experimental and in silico work has been presented and discussed, there has been no in-depth consideration of the proper design of single-strand sticky terminations of DNA sequences, denoted ssST, which is important for avoiding self-folding within one DNA strand, unwanted strand-to-strand interaction, and mismatching. In this work, a new comprehensive and computationally efficient optimization algorithm is presented for constructing all possible DNA sequences that specifically prevent these issues. This optimization procedure is also effective when a spacer section, typically a repeated sequence of thymine or adenine placed between the ssST and the nano-object, is used, as in the most conventional experimental protocols. We systematically discuss the fundamental statistics of DNA sequences, considering complementarities limited to two (or three) adjacent pairs to avoid self-folding and hybridization of identical strands due to unwanted complements and mismatching. The optimized DNA sequences can reach maximum lengths of 9 to 34 bases depending on the level of applied constraints. The thermodynamic properties of the allowed sequences are used to develop a ranking for each design. For instance, we show that the maximum melting temperature saturates at 14 bases under typical solvation and concentration conditions. Thus, DNA ssST with optimized sequences are developed for segments ranging from 4 to 40 bases, providing a very useful guide for all technological protocols. An experimental test using the aggregation of Al and CuO nanoparticles is presented and discussed, and is shown to validate and illustrate the importance of the proposed DNA coding sequence optimization. PMID:27578445
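
    The constraint that complementarity be limited to two adjacent pairs can be expressed as a k-mer rule with k = 3. The following minimal sketch assumes two rules. No 3-mer may occur together with its reverse complement, which would allow a hairpin or a dimer with an identical strand. No 3-mer may repeat, which would allow misaligned pairing. A depth-first search then finds one conforming sequence. Loop-length limits, spacers and thermodynamics are ignored, and the function names are illustrative.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def violates_rules(seq: str, k: int = 3) -> bool:
    """True if seq repeats a k-mer or contains both a k-mer and its reverse complement."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    seen = set(kmers)
    if len(seen) != len(kmers):  # repeated k-mer: risk of misaligned pairing
        return True
    # k adjacent complementary pairs could form (hairpin, or dimer with an identical strand)
    return any(revcomp(m) in seen for m in seen)


def design(length: int, k: int = 3, prefix: str = "") -> Optional[str]:
    """Depth-first search for one sequence of `length` bases that obeys the rules."""
    if violates_rules(prefix, k):
        return None
    if len(prefix) == length:
        return prefix
    for base in "ACGT":
        found = design(length, k, prefix + base)
        if found is not None:
            return found
    return None


print(violates_rules("ACGT"))  # True: ACG and CGT are reverse complements
seq = design(12)
print(seq is not None and not violates_rules(seq))  # True
```

    Because an odd-length k-mer can never equal its own reverse complement, the 64 trinucleotides fall into 32 complementary pairs. At most one member of each pair can be used, and only once, so these two rules alone cap the length at 32 + 2 = 34 bases.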

  6. A novel DNA sequence similarity calculation based on simplified pulse-coupled neural network and Huffman coding

    NASA Astrophysics Data System (ADS)

    Jin, Xin; Nie, Rencan; Zhou, Dongming; Yao, Shaowen; Chen, Yanyan; Yu, Jiefu; Wang, Quan

    2016-11-01

    A novel method for calculating DNA sequence similarity is proposed, based on a simplified pulse-coupled neural network (S-PCNN) and Huffman coding. In this study, we propose a Huffman-based coding method in which the triplet code is used as the code unit to transform a DNA sequence into a numerical sequence. The method uses the firing characteristics of S-PCNN neurons on the DNA sequence to extract features, and it can handle DNA sequences of different lengths. First, guided by the characteristics of S-PCNN and of the DNA primary sequence, the sequence is encoded with the Huffman coding method; the S-PCNN is then used to extract the oscillation time sequence (OTS) of the encoded DNA sequence. Relevant features are obtained at the same time, and finally the similarity or dissimilarity of the DNA sequences is determined by Euclidean distance. To verify the accuracy of this method, different data sets were used for testing. The experimental results show that the proposed method is effective.
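
    The Huffman stage can be illustrated on its own. In this sketch, the sequence is cut into non-overlapping triplets starting from the first base, and the code is built from the triplet frequencies of that same sequence. Both choices are assumptions. The S-PCNN feature extraction and the Euclidean comparison are not shown.

```python
import heapq
from collections import Counter
from itertools import count


def triplets(seq: str) -> list[str]:
    """Non-overlapping triplets from position 0; a trailing partial triplet is dropped."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def huffman_code(symbols: list[str]) -> dict[str, str]:
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:  # degenerate case: one symbol still needs a 1-bit code
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in left.items()}
        merged.update({s: "1" + b for s, b in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(seq: str) -> tuple[str, dict[str, str]]:
    ts = triplets(seq.upper())
    code = huffman_code(ts)
    return "".join(code[t] for t in ts), code


bits, code = encode("AAA" * 4 + "CCC" * 2 + "GGG" + "TTT")
print(len(bits))  # 14: AAA gets 1 bit, CCC 2 bits, GGG and TTT 3 bits each
```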

  7. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

    NASA Technical Reports Server (NTRS)

    Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences, each of length larger than 512 base pairs (bp), in the present release of GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent β = 0.00 ± 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent β is positive (0.16 ± 0.05), which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and find that it is less than 10⁻¹⁰. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
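
    The detrended fluctuation analysis (DFA) step can be sketched in a few lines. The abstract does not specify the details, so several choices here are assumptions. Bases are mapped to a walk of +1 for purines (A, G) and -1 for pyrimidines (C, T). Boxes are non-overlapping, a straight line is removed in each box, and α is the least-squares slope of log F(n) against log n over a few box sizes. An uncorrelated sequence gives α near 0.5, and long-range correlation gives α above 0.5. Under the usual relation β = 2α - 1, the coding-sequence value β ≈ 0 quoted above corresponds to α ≈ 0.5.

```python
import math
import random

WALK_STEP = {"A": 1, "G": 1, "C": -1, "T": -1}  # assumed purine/pyrimidine mapping


def dna_walk(seq: str) -> list[int]:
    """Cumulative sum of +1 (purine) / -1 (pyrimidine) steps."""
    y, total = [], 0
    for base in seq.upper():
        total += WALK_STEP[base]
        y.append(total)
    return y


def fluctuation(y: list[int], n: int) -> float:
    """DFA F(n): RMS of y around a least-squares line fitted in each box of n points (n >= 2)."""
    boxes = len(y) // n
    mx = (n - 1) / 2
    sxx = sum((x - mx) ** 2 for x in range(n))
    sq = 0.0
    for b in range(boxes):
        seg = y[b * n:(b + 1) * n]
        my = sum(seg) / n
        slope = sum((x - mx) * (v - my) for x, v in enumerate(seg)) / sxx
        sq += sum((v - my - slope * (x - mx)) ** 2 for x, v in enumerate(seg))
    return math.sqrt(sq / (boxes * n))


def dfa_alpha(seq: str, sizes=(8, 16, 32, 64)) -> float:
    """Least-squares slope of log F(n) against log n."""
    y = dna_walk(seq)
    xs = [math.log(n) for n in sizes]
    ys = [math.log(fluctuation(y, n)) for n in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)


rng = random.Random(0)
uncorrelated = "".join(rng.choice("ACGT") for _ in range(20000))
print(round(dfa_alpha(uncorrelated), 2))  # expected to be close to 0.5
```

    The fit uses only small boxes to keep the example quick; a real analysis would scan box sizes up to a sizeable fraction of the sequence length.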

  8. DNA methylation patterns of protein coding genes and long noncoding RNAs in female schizophrenic patients.

    PubMed

    Liao, Qi; Wang, Yunliang; Cheng, Jia; Dai, Dongjun; Zhou, Xingyu; Zhang, Yuzheng; Gao, Shugui; Duan, Shiwei

    2015-02-01

    Schizophrenia (SCZ) is a complex mental disorder to which both genetic and epigenetic factors contribute. Long noncoding RNAs (lncRNAs) were recently found to play an important regulatory role in mental disorders. However, little was known about the DNA methylation of lncRNAs, although numerous SCZ studies have been performed on genetic polymorphisms or epigenetic marks in protein coding genes. We present a comprehensive genome-wide DNA methylation study of both protein coding genes and lncRNAs in female patients with paranoid and undifferentiated SCZ. Using methyl-CpG binding domain (MBD) protein-enriched genome sequencing (MBD-seq), 8,163 and 764 peaks were identified in paranoid and undifferentiated SCZ, respectively (p < 1 × 10⁻⁵). Gene ontology analysis showed that the hypermethylated regions were enriched in genes related to the nervous system and brain for both paranoid and undifferentiated SCZ (p < 0.05). Among these peaks, 121 were located in gene promoter regions, where they might affect gene expression and influence SCZ-related pathways. Interestingly, DNA methylation of 136 and 23 known lncRNAs in the RefSeq database was identified in paranoid and undifferentiated SCZ, respectively. In addition, ∼20% of intergenic peaks annotated based on RefSeq genes overlapped with lncRNAs in the UCSC and GENCODE databases. To make the results accessible to biological researchers, we created an online database to display and visualize the DNA methylation peaks in both types of SCZ (http://www.bioinfo.org/scz/scz.htm). Our results showed that aberrant DNA methylation of lncRNAs might be another important epigenetic factor in SCZ.

  9. DNA strand breaks induced by electrons simulated with Nanodosimetry Monte Carlo Simulation Code: NASIC.

    PubMed

    Li, Junli; Li, Chunyan; Qiu, Rui; Yan, Congchong; Xie, Wenzhang; Wu, Zhen; Zeng, Zhi; Tung, Chuanjong

    2015-09-01

    The Monte Carlo method is a powerful tool to investigate the details of radiation-induced biological damage at the molecular level. In this paper, a Monte Carlo code called NASIC (Nanodosimetry Monte Carlo Simulation Code) was developed. It includes a physical module, a pre-chemical module, a chemical module, a geometric module and a DNA damage module. The physical module can simulate the physical tracks of low-energy electrons in liquid water event by event. Several sets of inelastic cross sections were calculated by applying the dielectric function method of Emfietzoglou's optical-data treatments, with different optical data sets and dispersion models. In the pre-chemical module, the ionised and excited water molecules undergo dissociation processes. In the chemical module, the produced radiolytic chemical species diffuse and react. In the geometric module, an atomic model of 46 chromatin fibres in a spherical nucleus of a human lymphocyte was established. In the DNA damage module, the direct damage induced by the energy depositions of the electrons and the indirect damage induced by the radiolytic chemical species were calculated. The parameters must be adjusted so that the simulation results agree with experimental results. In this paper, the influence of the inelastic cross sections and of the vibrational excitation reaction on the parameters and on the DNA strand break yields was studied. Further work on NASIC is underway.
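
    The abstract does not state how NASIC turns individual breaks into strand-break yields. A common convention in track-structure codes, assumed here rather than taken from the paper, counts two breaks on opposite strands within about 10 bp as one double-strand break (DSB) and the rest as single-strand breaks (SSBs). The following is a minimal post-processing sketch with a hypothetical input format. Its greedy pairing is not guaranteed to maximise the DSB count when breaks cluster densely.

```python
def classify_breaks(breaks: list[tuple[int, int]], max_sep: int = 10) -> tuple[int, int]:
    """Count (SSB, DSB) from strand breaks given as (strand, base-pair position).

    strand is 0 or 1. Two breaks on opposite strands at most `max_sep` bp apart
    are paired into one DSB (greedy pairing, each break used at most once);
    every unpaired break counts as an SSB.
    """
    strand0 = sorted(p for s, p in breaks if s == 0)
    strand1 = sorted(p for s, p in breaks if s == 1)
    used = [False] * len(strand1)
    dsb = 0
    for p in strand0:
        for j, q in enumerate(strand1):
            if not used[j] and abs(p - q) <= max_sep:
                used[j] = True
                dsb += 1
                break
    ssb = len(strand0) + len(strand1) - 2 * dsb
    return ssb, dsb


# Hypothetical break list: one close opposite-strand pair plus one isolated break.
print(classify_breaks([(0, 100), (1, 105), (0, 300)]))  # (1, 1)
```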

  10. DANIO-CODE: Toward an Encyclopedia of DNA Elements in Zebrafish

    PubMed Central

    2016-01-01

    The zebrafish has emerged as a model organism for genomics studies. The symposium “Toward an encyclopedia of DNA elements in zebrafish”, held in London in December 2014, was co-organized by Ferenc Müller and Fiona Wardle. This meeting followed a similar workshop held 2 years earlier and represents a push toward formalizing a community effort to annotate functional elements in the zebrafish genome. The meeting brought together zebrafish researchers, bioinformaticians and members of established consortia to exchange scientific findings and experience and to discuss the initial steps toward forming a DANIO-CODE consortium. In this study, we provide the latest updates on the consortium's progress and extend a broad invitation to researchers to join in and contribute to DANIO-CODE. PMID:26671609

  11. DANIO-CODE: Toward an Encyclopedia of DNA Elements in Zebrafish.

    PubMed

    Tan, Haihan; Onichtchouk, Daria; Winata, Cecilia

    2016-02-01

    The zebrafish has emerged as a model organism for genomics studies. The symposium "Toward an encyclopedia of DNA elements in zebrafish", held in London in December 2014, was co-organized by Ferenc Müller and Fiona Wardle. This meeting followed a similar workshop held 2 years earlier and represents a push toward formalizing a community effort to annotate functional elements in the zebrafish genome. The meeting brought together zebrafish researchers, bioinformaticians and members of established consortia to exchange scientific findings and experience and to discuss the initial steps toward forming a DANIO-CODE consortium. In this study, we provide the latest updates on the consortium's progress and extend a broad invitation to researchers to join in and contribute to DANIO-CODE.

  12. A cDNA clone containing the entire coding sequence of a mouse H-2Kd histocompatibility antigen

    PubMed Central

    Lalanne, Jean-Louis; Delarbre, Christiane; Gachelin, Gabriel; Kourilsky, Philippe

    1983-01-01

    We have isolated a cDNA clone carrying a 1560-bp insert that contains the entire coding and 3′ untranslated regions of an H-2Kd mouse histocompatibility antigen. Its sequence and overall features are described. They point to the existence of unique properties of DNA sequences associated with the H-2Kd antigen. PMID:6298749

  13. Isolation and characterization of a cDNA coding for human factor IX.

    PubMed

    Kurachi, K; Davie, E W

    1982-11-01

    A cDNA library prepared from human liver has been screened for factor IX (Christmas factor), a clotting factor that participates in the middle phase of blood coagulation. The library was screened with a single-stranded DNA prepared from enriched mRNA for baboon factor IX and a synthetic oligonucleotide mixture. A plasmid was identified that contained a cDNA insert of 1,466 base pairs coding for human factor IX. The insert is flanked by G-C tails of 11 and 18 base pairs at the 5' and 3' ends, respectively. It also includes 138 base pairs that code for an amino-terminal leader sequence, 1,248 base pairs that code for the mature protein, a stop codon, and 48 base pairs of noncoding sequence at the 3' end. The leader sequence contains 46 amino acid residues, and it is proposed that this sequence includes both a signal sequence and a pro sequence for the mature protein that circulates in plasma. The 1,248 base pairs code for a polypeptide chain composed of 416 amino acids. The amino-terminal region of this protein contains 12 glutamic acid residues that are converted to gamma-carboxyglutamic acid in the mature protein. These glutamic acid residues are coded for by both GAA and GAG. The arginyl peptide bonds that are cleaved in the conversion of human factor IX to factor IXa by factor XIa were identified as Arg145-Ala146 and Arg180-Val181. Cleavage of these two internal peptide bonds results in the formation of an activation peptide (35 amino acids) and factor IXa, a serine protease composed of a light chain (145 amino acids) and a heavy chain (236 amino acids) held together by a disulfide bond(s). The active site residues histidine, aspartate, and serine are located in the heavy chain at positions 221, 270, and 366, respectively. These amino acids are homologous with His57, Asp102, and Ser195 in the active site of chymotrypsin. Two potential carbohydrate binding sites (Asn-X-Thr) were identified in the activation peptide, and
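    The lengths reported in this factor IX abstract are internally consistent, and the codon arithmetic can be checked directly. The sketch below uses only numbers stated in the abstract, plus the fact that every codon, including the stop codon, spans 3 bases. It derives the factor IXa chain lengths from the two cleavage sites (after Arg145 and after Arg180).

```python
CODON = 3  # bases per codon

# Segment lengths (bp) of the factor IX cDNA insert, as listed in the abstract.
segments_bp = {
    "5' G-C tail": 11,
    "leader (signal + pro sequence)": 138,
    "mature protein": 1248,
    "stop codon": CODON,
    "3' noncoding": 48,
    "3' G-C tail": 18,
}

total_bp = sum(segments_bp.values())
leader_aa = segments_bp["leader (signal + pro sequence)"] // CODON
mature_aa = segments_bp["mature protein"] // CODON

# Factor XIa cleaves after Arg145 and after Arg180 of the mature chain.
first_cut, second_cut = 145, 180
light_chain = first_cut                      # residues 1-145
activation_peptide = second_cut - first_cut  # residues 146-180
heavy_chain = mature_aa - second_cut         # residues 181-416

print(total_bp, leader_aa, mature_aa, light_chain, activation_peptide, heavy_chain)
# → 1466 46 416 145 35 236
```

    The same boundaries place the active-site residues (positions 221, 270 and 366) inside the heavy chain, as the abstract states.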

  14. Computerized classified document accountability

    SciTech Connect

    Norris, C.B.; Lewin, R.

    1988-08-01

    This step-by-step procedure was established as a guideline to be used with the Savvy PC Database Program for the accountability of classified documents. Its purpose is to eventually phase out the use of logbooks for classified document tracking. The program runs on an IBM PC or compatible computer using a Bernoulli Box, a Hewlett Packard 71B Bar Code Reader, an IOMEGA Host Adapter Board for creating mirror images of data for backup purposes, and the Disk Operating System (DOS). The DOS batch files "IN" and "OUT" invoke the Savvy Databases for either entering incoming or outgoing documents. The main files are DESTRUCTION, INLOG, OUTLOG, and NAME-NUMBER. The fields in the files are Adding/Changing, Routing, Destroying, Search-Print by document identification, Search/Print Audit by bar code number, Print Holdings of a person, and Print Inventory of an office.

  15. Multimedia Classifier

    NASA Astrophysics Data System (ADS)

    Costache, G. N.; Gavat, I.

    2004-09-01

    With the rapid growth in the amount of available digital data (text, audio samples, digital photos and digital movies, all joined in the multimedia domain), the need for classification, recognition and retrieval of such data has become very important. This paper presents a system structure for handling multimedia data from a recognition perspective. The main processing steps applied to the multimedia objects of interest are: first, parameterization by analysis, to obtain a feature-based description that forms the parameter vector; second, classification, generally with a hierarchical structure, to make the necessary decisions. For audio signals, both speech and music, the derived perceptual features are the mel-frequency cepstral coefficients (MFCC) and the perceptual linear predictive (PLP) coefficients. For images, the derived features are the geometric parameters of the speaker's mouth. The hierarchical classifier generally consists of a clustering stage based on Kohonen Self-Organizing Maps (SOM) and a final stage based on a powerful classification algorithm, Support Vector Machines (SVM). In specific variants, the system has been applied with good results to two tasks: bimodal speech recognition, which fuses features obtained from the speech signal with features obtained from the speaker's image, and music retrieval from a large music database.

  16. A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

    PubMed

    Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong

    2012-01-01

    Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also outperformed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although in that case the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.

  17. Coding region SNP analysis to enhance dog mtDNA discrimination power in forensic casework.

    PubMed

    Verscheure, Sophie; Backeljau, Thierry; Desmyter, Stijn

    2015-01-01

    The high population frequencies of three control region haplotypes contribute to the low discrimination power of the dog mtDNA control region. This also diminishes the evidential power of a match with one of these haplotypes in forensic casework. A mitochondrial genome study of 214 Belgian dogs suggested 26 polymorphic coding region sites that successfully resolved dogs with the three most frequent control region haplotypes. In this study, three SNP assays were developed to determine the identity of the 26 informative sites. The control region of 132 newly sampled dogs was sequenced and added to the study of 214 dogs. The assays were applied to 58 dogs of the haplotypes of interest, which confirmed their suitability for enhancing dog mtDNA discrimination power. In the Belgian population study of 346 dogs, the set of 26 sites divided the dogs into 25 clusters of mtGenome sequences with substantially lower population frequency estimates than their control region sequences. In case of a match with one of the three control region haplotypes, using these three SNP assays in conjunction with control region sequencing would increase the exclusion probability of dog mtDNA analysis from 92.9% to 97.0%. PMID:25299153

  19. Toward a Code for the Interactions of Zinc Fingers with DNA: Selection of Randomized Fingers Displayed on Phage

    NASA Astrophysics Data System (ADS)

    Choo, Yen; Klug, Aaron

    1994-11-01

    We have used two selection techniques to study sequence-specific DNA recognition by the zinc finger, a small, modular DNA-binding minidomain. We have chosen zinc fingers because they bind as independent modules and so can be linked together in a peptide designed to bind a predetermined DNA site. In this paper, we describe how a library of zinc fingers displayed on the surface of bacteriophage enables selection of fingers capable of binding to given DNA triplets. The amino acid sequences of selected fingers which bind the same triplet are compared to examine how sequence-specific DNA recognition occurs. Our results can be rationalized in terms of coded interactions between zinc fingers and DNA, involving base contacts from a few α-helical positions. In the paper following this one, we describe a complementary technique which confirms the identity of amino acids capable of DNA sequence discrimination from these positions.

  20. Temporal and spatial trends in prey composition of wahoo Acanthocybium solandri: a diet analysis from the central North Pacific Ocean using visual and DNA bar-coding techniques.

    PubMed

    Oyafuso, Z S; Toonen, R J; Franklin, E C

    2016-04-01

    A diet analysis was conducted on 444 wahoo Acanthocybium solandri caught in the central North Pacific Ocean longline fishery and a nearshore troll fishery surrounding the Hawaiian Islands from June to December 2014. In addition to traditional visual examination of stomach contents, a DNA bar-coding approach was integrated into the analysis by sequencing the cytochrome c oxidase subunit 1 (COI) region of the mitochondrial genome to taxonomically identify individual prey items that could not be classified visually to species. For nearshore-caught A. solandri, juvenile pre-settlement reef fish species from various families dominated the prey composition during the summer months, followed primarily by Carangidae in the autumn months. Gempylidae, Echeneidae and Scombridae were the dominant prey taxa from the offshore fishery. Molidae was a common prey family in stomachs collected north-east of the Hawaiian Archipelago, while tetraodontiform reef fishes, known to have extended pelagic stages, were prominent prey items south-west of the Hawaiian Islands. The diet composition of A. solandri was indicative of an adaptive feeder and thus revealed dominant geographic and seasonal abundances of certain taxa from various ecosystems in the marine environment. Adding molecular bar-coding to the traditional visual method of prey identification allowed a more comprehensive range of the prey field of A. solandri to be identified, and it should be used as a standard component in future diet studies.

  1. Basal jawed vertebrate phylogeny inferred from multiple nuclear DNA-coded genes

    PubMed Central

    Kikugawa, Kanae; Katoh, Kazutaka; Kuraku, Shigehiro; Sakurai, Hiroshi; Ishida, Osamu; Iwabe, Naoyuki; Miyata, Takashi

    2004-01-01

    Background: Phylogenetic analyses of jawed vertebrates based on mitochondrial sequences often result in confusing inferences which are obviously inconsistent with generally accepted trees. In particular, in a hypothesis by Rasmussen and Arnason based on mitochondrial trees, cartilaginous fishes have a terminal position in a paraphyletic cluster of bony fishes. No previous analysis based on nuclear DNA-coded genes could significantly reject the mitochondrial trees of jawed vertebrates. Results: We have cloned and sequenced seven nuclear DNA-coded genes from 13 vertebrate species. These sequences, together with sequences available from databases including 13 jawed vertebrates from eight major groups (cartilaginous fishes, bichir, chondrosteans, gar, bowfin, teleost fishes, lungfishes and tetrapods) and an outgroup (a cyclostome and a lancelet), have been subjected to phylogenetic analyses based on the maximum likelihood method. Conclusion: Cartilaginous fishes have been inferred to be basal to other jawed vertebrates, which is consistent with the generally accepted view. The minimum log-likelihood difference between the maximum likelihood tree and trees not supporting the basal position of cartilaginous fishes is 18.3 ± 13.1. The hypothesis by Rasmussen and Arnason has been significantly rejected with the minimum log-likelihood difference of 123 ± 23.3. Our tree has also shown that living holosteans, comprising bowfin and gar, form a monophyletic group which is the sister group to teleost fishes. This is consistent with a formerly prevalent view of vertebrate classification, although inconsistent with both of the current morphology-based and mitochondrial sequence-based trees. Furthermore, the bichir has been shown to be the basal ray-finned fish. Tetrapods and lungfish have formed a monophyletic cluster in the tree inferred from the concatenated alignment, being consistent with the currently prevalent view. It also remains possible that tetrapods are more closely

  2. Probability of coding of a DNA sequence: an algorithm to predict translated reading frames from their thermodynamic characteristics.

    PubMed Central

    Tramontano, A; Macchiato, M F

    1986-01-01

    An algorithm to determine the probability that a reading frame codes for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of translated reading frames. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or of translated reading frames on the complementary DNA strand. PMID:3753761

  3. DNA bar coding and pyrosequencing to analyze adverse events in therapeutic gene transfer.

    PubMed

    Wang, Gary P; Garrigue, Alexandrine; Ciuffi, Angela; Ronen, Keshet; Leipzig, Jeremy; Berry, Charles; Lagresle-Peyrou, Chantal; Benjelloun, Fatine; Hacein-Bey-Abina, Salima; Fischer, Alain; Cavazzana-Calvo, Marina; Bushman, Frederic D

    2008-05-01

    Gene transfer has been used to correct inherited immunodeficiencies, but in several patients integration of therapeutic retroviral vectors activated proto-oncogenes and caused leukemia. Here, we describe improved methods for characterizing integration site populations from gene transfer studies using DNA bar coding and pyrosequencing. We characterized 160,232 integration site sequences in 28 tissue samples from eight mice, where Rag1 or Artemis deficiencies were corrected by introducing the missing gene with gamma-retroviral or lentiviral vectors. The integration sites were characterized for their genomic distributions, including proximity to proto-oncogenes. Several mice harbored abnormal lymphoproliferations following therapy; in these cases, comparison of the location and frequency of isolation of integration sites across multiple tissues helped clarify the contribution of specific proviruses to the adverse events. We also took advantage of the large number of pyrosequencing reads to show that recovery of integration sites can be highly biased by the use of restriction enzyme cleavage of genomic DNA, which is a limitation in all widely used methods, but describe improved approaches that take advantage of the power of pyrosequencing to overcome this problem. The methods described here should allow integration site populations from human gene therapy to be deeply characterized with spatial and temporal resolution.
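    DNA bar coding lets reads from many tissue samples be pooled in one sequencing run and sorted afterwards. The sketch below shows the basic demultiplexing idea, assuming that each read begins with a fixed-length, sample-specific barcode that must match exactly. The barcodes and sample names are hypothetical, and real pipelines usually tolerate sequencing errors in the barcode.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

UNASSIGNED = "unassigned"

def demultiplex(reads: Iterable[str], barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Assign each read to a sample by an exact barcode prefix and trim that barcode."""
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all barcodes have the same length")
    k = lengths.pop()
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:k], UNASSIGNED)
        # Keep unassigned reads untrimmed so they can be inspected later.
        bins[sample].append(read if sample == UNASSIGNED else read[k:])
    return dict(bins)

barcodes = {"ACGT": "mouse1_thymus", "TGCA": "mouse2_spleen"}  # hypothetical
reads = ["ACGTGGATCCA", "TGCATTTAAAC", "GGGGCCCTTTA"]
print(demultiplex(reads, barcodes))
# → {'mouse1_thymus': ['GGATCCA'], 'mouse2_spleen': ['TTTAAAC'], 'unassigned': ['GGGGCCCTTTA']}
```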

  4. Detection of coding microsatellite frameshift mutations in DNA mismatch repair-deficient mouse intestinal tumors.

    PubMed

    Woerner, Stefan M; Tosti, Elena; Yuan, Yan P; Kloor, Matthias; Bork, Peer; Edelmann, Winfried; Gebert, Johannes

    2015-11-01

    Different DNA mismatch repair (MMR)-deficient mouse strains have been developed as models for the inherited cancer-predisposing Lynch syndrome. It is completely unresolved whether coding mononucleotide repeat (cMNR) gene mutations in these mice can contribute to intestinal tumorigenesis and whether MMR-deficient mice are a suitable molecular model of human microsatellite instability (MSI)-associated intestinal tumorigenesis. A proof-of-principle study was performed to identify mouse cMNR-harboring genes affected by insertion/deletion mutations in MSI murine intestinal tumors. Bioinformatic algorithms were developed to establish a database of mouse cMNR-harboring genes. A panel of five mouse noncoding mononucleotide markers was used for MSI classification of matched normal/tumor intestinal tissues from MMR-deficient (Mlh1(-/-), Msh2(-/-), Msh2(LoxP/LoxP)) mice. cMNR frameshift mutations of candidate genes were determined by DNA fragment analysis. Murine MSI intestinal tumors, but not normal tissues, from MMR-deficient mice showed cMNR frameshift mutations in six candidate genes (Elavl3, Tmem107, Glis2, Sdccag1, Senp6, Rfc3). The cMNRs of mouse Rfc3 and Elavl3 are conserved in type and length in their human orthologs, which are known to be mutated in human MSI colorectal, endometrial and gastric cancer. We provide evidence for the utility of a mononucleotide marker panel for detecting MSI in murine tumors, the existence of cMNR instability in MSI murine tumors, the utility of mouse subspecies DNA for identifying polymorphic repeats, and repeat conservation among some orthologous human/mouse genes, two of which show instability in both human and mouse MSI intestinal tumors. MMR-deficient mice are hence a useful molecular model system for analyzing MSI intestinal carcinogenesis.
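    A database of cMNR-harboring genes starts from locating single-base runs in coding sequences. The abstract does not describe the authors' algorithms, so the sketch below is only a generic illustration. It uses a regular expression with a backreference and a hypothetical minimum run length of 6. It also shows why small insertions or deletions in such runs are frameshifts: any indel whose length is not a multiple of 3 shifts the reading frame.

```python
import re
from typing import List, Tuple

def find_mononucleotide_repeats(seq: str, min_len: int = 6) -> List[Tuple[int, str, int]]:
    """Return (0-based start, base, run length) for each single-base run of at least min_len."""
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # ([ACGT]) captures one base; \1{n,} requires at least n further copies of that base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(seq.upper())]

def causes_frameshift(indel_len: int) -> bool:
    """An insertion/deletion shifts the reading frame unless its length is a multiple of 3."""
    return indel_len % 3 != 0

cds = "ATG" + "A" * 7 + "GC" + "T" * 8 + "CCGGATAA"  # toy coding sequence
print(find_mononucleotide_repeats(cds))             # → [(3, 'A', 7), (12, 'T', 8)]
print(causes_frameshift(1), causes_frameshift(3))   # → True False
```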

  5. Bovine dopamine beta-hydroxylase cDNA. Complete coding sequence and expression in mammalian cells with vaccinia virus vector.

    PubMed

    Lewis, E J; Allison, S; Fader, D; Claflin, V; Baizer, L

    1990-01-15

    We have isolated cDNA clones for bovine dopamine beta-hydroxylase from an adrenal medulla cDNA library and have determined the complete coding sequence. The largest cDNA clone isolated from the library is 2.4 kilobase pairs (kb) and contains an open reading frame of 1788 bases, coding for a protein of 597 amino acids and Mr = 66,803. The predicted amino acid sequence of the bovine cDNA shares 85% identity with human dopamine beta-hydroxylase (Lamouroux, A., Vingny, A., Faucon Biquet, N., Darmon, M. C., Franck, R., Henry, J.P., and Mallet, J. (1987) EMBO J. 6, 3931-3937; Kobayashi, K., Kurosawa, Y., Fujita, K., and Nagatsu, T. (1989) Nucleic Acids Res. 17, 1089-1102). Northern blot analysis reveals that the cDNA hybridizes to an mRNA of 2.4 kb present in bovine adrenal medulla, but not in kidney, heart, or liver. In addition, the cDNA hybridizes to a second RNA species of 5.5 kb, which is 4-fold less abundant than the 2.4-kb RNA. In vitro translation of a synthetic RNA transcribed from the 2.4-kb cDNA produces a 68-kDa protein, which is specifically immunoprecipitated by antiserum to bovine dopamine beta-hydroxylase. The 2.4-kb cDNA was cloned into a vaccinia virus vector, and the recombinant virus was used to infect the rat pheochromocytoma PC12 and monkey BSC-40 fibroblast cell lines. In both cell lines, infection with recombinant virus produces a protein of Mr = 75,000, which reacts with antiserum to bovine dopamine beta-hydroxylase. These results indicate that the 2.4-kb cDNA contains the genetic information necessary to code for the bovine dopamine beta-hydroxylase subunit.

  6. MALDI-TOF MS analysis of ribosomal proteins coded in S10 and spc operons rapidly classified the Sphingomonadaceae as alkylphenol polyethoxylate-degrading bacteria from the environment.

    PubMed

    Hotta, Yudai; Sato, Hiroaki; Hosoda, Akifumi; Tamura, Hiroto

    2012-05-01

    Matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS), using ribosomal subunit proteins coded in the S10-spc-alpha operon as biomarkers, was applied to the classification of the Sphingomonadaceae from the environment. To construct a ribosomal protein database, the S10-spc-alpha operons of type strains of the Sphingomonadaceae and of related alkylphenol polyethoxylate (APEOn)-degrading bacteria were sequenced using specific primers designed from the nucleotide sequences of genome-sequenced strains. The observed MALDI mass spectra of intact cells were compared with the theoretical masses in the constructed ribosomal protein database. The nine selected biomarkers coded in the S10-spc-alpha operon (L18, L22, L24, L29, L30, S08, S14, S17 and S19) successfully distinguished Sphingopyxis terrae NBRC 15098(T) from the APEOn-degrading strain BSN20, despite a difference of only one base in their 16S rRNA gene sequences. This method, named the S10-GERMS (S10-spc-alpha operon gene-encoded ribosomal protein mass spectrum) method, is a useful tool for discriminating Sphingomonadaceae at the strain level and can detect and monitor the main APEOn-degrading bacteria in the environment.
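    Comparing observed spectra with a theoretical mass database comes down to matching peaks within a mass tolerance. The sketch below illustrates tolerance-based matching under stated assumptions: the tolerance is expressed in ppm, the 500 ppm default is hypothetical, and observed and theoretical values are taken to be on the same ion basis. Protonated [M+H]+ peaks versus neutral masses would have to be reconciled first. The biomarker masses are made-up placeholders, not values from the S10-GERMS database.

```python
from typing import Dict, List

def match_biomarkers(observed: List[float], theoretical: Dict[str, float],
                     tol_ppm: float = 500.0) -> Dict[str, float]:
    """Map each biomarker to the closest observed m/z within tol_ppm of its theoretical mass."""
    matches: Dict[str, float] = {}
    for name, mass in theoretical.items():
        window = mass * tol_ppm / 1e6  # absolute tolerance for this mass
        candidates = [m for m in observed if abs(m - mass) <= window]
        if candidates:
            matches[name] = min(candidates, key=lambda m: abs(m - mass))
    return matches

theoretical = {"L29": 7300.0, "S19": 10200.0, "L30": 7000.0}  # hypothetical masses
observed = [7301.5, 10230.0, 6999.0, 8500.0]                  # hypothetical peak list
print(match_biomarkers(observed, theoretical))  # → {'L29': 7301.5, 'L30': 6999.0}
```

    A strain could then be assigned by comparing which biomarker masses are matched against the expected profile of each strain. This step is omitted here.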

  7. Isolation and characterization of full-length cDNA clones coding for cholinesterase from fetal human tissues

    SciTech Connect

    Prody, C.A.; Zevin-Sonkin, D.; Gnatt, A.; Goldberg, O.; Soreq, H.

    1987-06-01

    To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase and Torpedo electric organ true acetylcholinesterase. Using these probes, the authors isolated several cDNA clones from λgt10 libraries of fetal brain and liver origin. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. In RNA blots of poly(A)+ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments with a total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These findings demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species.

  8. DNA-guided establishment of nucleosome patterns within coding regions of a eukaryotic genome

    PubMed Central

    Beh, Leslie Y.; Müller, Manuel M.; Muir, Tom W.; Kaplan, Noam; Landweber, Laura F.

    2015-01-01

    A conserved hallmark of eukaryotic chromatin architecture is the distinctive array of well-positioned nucleosomes downstream from transcription start sites (TSS). Recent studies indicate that trans-acting factors establish this stereotypical array. Here, we present the first genome-wide in vitro and in vivo nucleosome maps for the ciliate Tetrahymena thermophila. In contrast with previous studies in yeast, we find that the stereotypical nucleosome array is preserved in the in vitro reconstituted map, which is governed only by the DNA sequence preferences of nucleosomes. Remarkably, this average in vitro pattern arises from the presence of subsets of nucleosomes, rather than the whole array, in individual Tetrahymena genes. Variation in GC content contributes to the positioning of these sequence-directed nucleosomes and affects codon usage and amino acid composition in genes. Given that the AT-rich Tetrahymena genome is intrinsically unfavorable for nucleosome formation, we propose that these “seed” nucleosomes—together with trans-acting factors—may facilitate the establishment of nucleosome arrays within genes in vivo, while minimizing changes to the underlying coding sequences. PMID:26330564
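    Variation in GC content along a sequence is usually examined as a sliding-window profile. The sketch below computes one such profile. The default window of 147 bp is chosen only because that is roughly the length of DNA wrapped around a nucleosome core, and the default step is arbitrary. It is a descriptive measure, not the authors' nucleosome-positioning model.

```python
from typing import List, Tuple

def gc_profile(seq: str, window: int = 147, step: int = 10) -> List[Tuple[int, float]]:
    """GC fraction of every full-length window, reported as (0-based start, fraction)."""
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        profile.append((start, gc / window))
    return profile

print(gc_profile("ATATGCGCATAT", window=4, step=4))  # → [(0, 0.0), (4, 1.0), (8, 0.0)]
```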

  9. DNA-guided establishment of nucleosome patterns within coding regions of a eukaryotic genome.

    PubMed

    Beh, Leslie Y; Müller, Manuel M; Muir, Tom W; Kaplan, Noam; Landweber, Laura F

    2015-11-01

    A conserved hallmark of eukaryotic chromatin architecture is the distinctive array of well-positioned nucleosomes downstream from transcription start sites (TSS). Recent studies indicate that trans-acting factors establish this stereotypical array. Here, we present the first genome-wide in vitro and in vivo nucleosome maps for the ciliate Tetrahymena thermophila. In contrast with previous studies in yeast, we find that the stereotypical nucleosome array is preserved in the in vitro reconstituted map, which is governed only by the DNA sequence preferences of nucleosomes. Remarkably, this average in vitro pattern arises from the presence of subsets of nucleosomes, rather than the whole array, in individual Tetrahymena genes. Variation in GC content contributes to the positioning of these sequence-directed nucleosomes and affects codon usage and amino acid composition in genes. Given that the AT-rich Tetrahymena genome is intrinsically unfavorable for nucleosome formation, we propose that these "seed" nucleosomes, together with trans-acting factors, may facilitate the establishment of nucleosome arrays within genes in vivo, while minimizing changes to the underlying coding sequences.

  10. GeneFizz: A web tool to compare genetic (coding/non-coding) and physical (helix/coil) segmentations of DNA sequences. Gene discovery and evolutionary perspectives.

    PubMed

    Yeramian, Edouard; Jones, Louis

    2003-07-01

    The GeneFizz (http://pbga.pasteur.fr/GeneFizz) web tool permits direct comparison of two types of segmentation for DNA sequences (possibly annotated): the coding/non-coding segmentation associated with genomic annotations (simple genes or exons in split genes) and the physics-based structural segmentation into helix and coil domains (as provided by the classical helix-coil model). The degree of coincidence between the two types of segmentation appears to vary across genomes, from almost perfect to non-relevant. Reflecting these two extremes, GeneFizz can be used for two purposes: ab initio physics-based identification of new genes (as recently shown for Plasmodium falciparum) or the exploration of possible evolutionary signals revealed by the discrepancies observed between the two types of information.
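    One simple way to quantify the "degree of coincidence" between two segmentations is per-base agreement. The sketch below assumes that coding segments are paired with helix domains and non-coding segments with coil domains, and it uses half-open [start, end) intervals on hypothetical toy segments. It is not GeneFizz's own scoring, and it does not reproduce the helix-coil melting computation that produces the structural segmentation.

```python
from typing import List, Tuple

Segments = List[Tuple[int, int]]  # half-open [start, end) intervals

def mask(length: int, intervals: Segments) -> List[bool]:
    """Per-base boolean mask that is True inside any of the given intervals."""
    m = [False] * length
    for start, end in intervals:
        for i in range(max(0, start), min(length, end)):
            m[i] = True
    return m

def coincidence(length: int, coding: Segments, helix: Segments) -> float:
    """Fraction of bases where coding<->helix and non-coding<->coil labels agree."""
    c, h = mask(length, coding), mask(length, helix)
    return sum(a == b for a, b in zip(c, h)) / length

genes = [(100, 400), (600, 900)]   # hypothetical coding segments
helix = [(120, 400), (650, 950)]   # hypothetical helix domains
print(coincidence(1000, genes, helix))  # → 0.88
```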

  11. Titanic's unknown child: the critical role of the mitochondrial DNA coding region in a re-identification effort.

    PubMed

    Just, Rebecca S; Loreille, Odile M; Molto, J Eldon; Merriwether, D Andrew; Woodward, Scott R; Matheson, Carney; Creed, Jennifer; McGrath, Stacey E; Sturk-Andreaggi, Kimberly; Coble, Michael D; Irwin, Jodi A; Ruffman, Alan; Parr, Ryan L

    2011-06-01

    This report describes a re-examination of the remains of a young male child recovered in the Northwest Atlantic following the loss of the Royal Mail Ship Titanic in 1912 and buried as an unknown in Halifax, Nova Scotia shortly thereafter. Following exhumation of the grave in 2001, mitochondrial DNA (mtDNA) hypervariable region 1 sequencing and odontological examination of the extremely limited skeletal remains resulted in the identification of the child as Eino Viljami Panula, a 13-month-old Finnish boy. This paper details recent and more extensive mitochondrial genome analyses that indicate the remains are instead most likely those of an English child, Sidney Leslie Goodwin. The case demonstrates the benefit of targeted mtDNA coding region typing in difficult forensic cases, and highlights the need for databases of entire mtDNA sequences appropriate for forensic use.

  12. URF6, Last Unidentified Reading Frame of Human mtDNA, Codes for an NADH Dehydrogenase Subunit

    NASA Astrophysics Data System (ADS)

    Chomyn, Anne; Cleeter, Michael W. J.; Ragan, C. Ian; Riley, Marcia; Doolittle, Russell F.; Attardi, Giuseppe

    1986-10-01

    The polypeptide encoded in URF6, the last unassigned reading frame of human mitochondrial DNA, has been identified with antibodies to peptides predicted from the DNA sequence. Antibodies prepared against highly purified respiratory chain NADH dehydrogenase from beef heart or against the cytoplasmically synthesized 49-kilodalton iron-sulfur subunit isolated from this enzyme complex, when added to a deoxycholate or a Triton X-100 mitochondrial lysate of HeLa cells, specifically precipitated the URF6 product together with the six other URF products previously identified as subunits of NADH dehydrogenase. These results strongly point to the URF6 product as being another subunit of this enzyme complex. Thus, almost 60% of the protein coding capacity of mammalian mitochondrial DNA is utilized for the assembly of the first enzyme complex of the respiratory chain. The absence of such information in yeast mitochondrial DNA dramatizes the variability in gene content of different mitochondrial genomes.

  13. 5' coding region of the follicular epithelium yolk polypeptide 2 cDNA in the moth, Plodia interpunctella, contains an extended coding region.

    PubMed

    Shirk, P D; Perera, O P

    1998-01-01

    The 5' region of YP2 cDNA, a follicular epithelium yolk protein subunit in the moth, Plodia interpunctella, shows that the polypeptide contains an extended internal coding region. Partial cDNA clones for YP2 were isolated from a pharate adult female ovarian cDNA expression library in Lambda Zap II by screening with antigen-selected YP2 antiserum. The 5' sequence of the YP2 transcript was determined by 5' RACE PCR of ovarian mRNA using YP2 sequence-specific nested primers. The combined cDNA and 5' RACE sequencing showed the YP2 transcript to be 1971 bp in length up to the poly(A) tail with a single open reading frame for a predicted polypeptide of 616 amino acids. Northern analysis showed a single YP2 transcript to be present in ovarian RNA that was approximately 2 kb in length. The predicted amino acid sequence for YP2 from P. interpunctella is most closely related to egg specific protein (ESP) from Bombyx mori and the partial YP2 sequence from Galleria mellonella. YP2 from P. interpunctella also is similar to vertebrate lipases and contains a conserved lipid binding region. However, the 5' coding region of YP2 from P. interpunctella contains an in-frame insert of approximately 438 bp that had replaced an approximately 270-bp region as compared with ESP from B. mori and YP2 of G. mellonella. This suggests that the insert occurred by a recombinational event internal to the YP2 structural gene of P. interpunctella.

  14. Natural selection on coding and noncoding DNA sequences is associated with virulence genes in a plant pathogenic fungus.

    PubMed

    Rech, Gabriel E; Sanz-Martín, José M; Anisimova, Maria; Sukno, Serenella A; Thon, Michael R

    2014-09-04

    Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an Arms Race model of evolution. The 5' untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms, likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on the findings of this work, we propose that even though adaptive substitutions on coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen.

  15. Natural Selection on Coding and Noncoding DNA Sequences Is Associated with Virulence Genes in a Plant Pathogenic Fungus

    PubMed Central

    Rech, Gabriel E.; Sanz-Martín, José M.; Anisimova, Maria; Sukno, Serenella A.; Thon, Michael R.

    2014-01-01

    Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an Arms Race model of evolution. The 5′ untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms, likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on the findings of this work, we propose that even though adaptive substitutions on coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen. PMID:25193312

  16. Signalign: An Ontology of DNA as Signal for Comparative Gene Structure Prediction Using Information-Coding-and-Processing Techniques.

    PubMed

    Yu, Ning; Guo, Xuan; Gu, Feng; Pan, Yi

    2016-03-01

    Conventional character-analysis-based techniques in genome analysis manifest three main shortcomings: inefficiency, inflexibility, and incompatibility. In our previous research, a general framework called DNA As X was proposed for character-analysis-free techniques to overcome these shortcomings, where X is an intermediate representation such as a digit, code, signal, vector, tree, or graph network. In this paper, we further implement an ontology of DNA As Signal by designing a tool named Signalign for comparative gene structure analysis, in which DNA sequences are converted into signal series, processed by a modified method of dynamic time warping, and measured by signal-to-noise ratio (SNR). The ontology of DNA As Signal integrates the principles and concepts of other disciplines, including information coding theory and signal processing, into sequence analysis and processing. Compared with conventional character-analysis-based methods, Signalign not only achieves equivalent or superior performance but also enriches the tools and knowledge library of computational biology by extending the domain from characters/strings to diverse areas. The evaluation results validate the success of the character-analysis-free technique for improved performance in comparative gene structure prediction. PMID:27046906
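To make the "DNA as signal" idea concrete, the following sketch converts sequences to numeric series and compares them with classic dynamic time warping (DTW). Both choices are assumptions. The purine/pyrimidine cumulative walk is one well-known DNA-to-signal mapping, but it is not necessarily the one Signalign uses. Plain DTW also stands in for Signalign's modified variant, and the SNR scoring step is not reproduced. The function names are hypothetical.

```python
def dna_walk(seq):
    """Cumulative purine/pyrimidine walk: A/G step +1, C/T step -1, others 0."""
    signal, level = [], 0
    for base in seq.upper():
        if base in "AG":
            level += 1
        elif base in "CT":
            level -= 1
        signal.append(level)
    return signal


def dtw_distance(x, y):
    """Classic DTW with absolute-difference cost; O(len(x) * len(y)) time and memory."""
    n, m = len(x), len(y)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three predecessor alignments.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# An ambiguous base (N) leaves the walk flat; warping absorbs the extra step.
print(dna_walk("ACGT"), dna_walk("ACNGT"))                # [1, 0, 1, 0] [1, 0, 0, 1, 0]
print(dtw_distance(dna_walk("ACGT"), dna_walk("ACNGT")))  # 0.0
```

DTW is what lets signals of different lengths be aligned. This matters for comparative gene structure, where homologous genes may differ by short insertions or deletions.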

  17. Run-length encoding graphic rules, biochemically editable designs and steganographical numeric data embedment for DNA-based cryptographical coding system.

    PubMed

    Kawano, Tomonori

    2013-03-01

    There has been a wide variety of approaches for handling pieces of DNA as "unplugged" tools for digital information storage and processing, including a series of studies applied to security-related areas, such as DNA-based digital barcodes, watermarks and cryptography. In the present article, novel designs of artificial genes as media for storing digitally compressed image data are proposed for bio-computing purposes, whereas natural genes principally encode proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering given "passwords" and/or secret numbers using DNA sequences. The "passwords" of interest were decomposed into single letters and translated into font images coded on separate DNA chains. Each chain contains coding regions, in which the images are encoded based on the novel run-length encoding rule, and non-coding regions designed for biochemical editing and for the remodeling processes that reveal the hidden orientation of the letters composing the original "passwords." The latter processes require molecular biological tools for digestion and ligation of the fragmented DNA molecules, targeting the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographically overwriting numeric data of interest onto the image-coding DNA are also discussed.
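The paper's specific run-length rule is not given in the abstract. The sketch below therefore shows only the generic idea: one row of a black-and-white glyph is run-length encoded, and each run length is written as nucleotides. The two-digit base-4 mapping (`A`=0, `C`=1, `G`=2, `T`=3) and the 0..15 run-length limit are hypothetical choices made for illustration.

```python
BASES = "ACGT"  # hypothetical digit-to-nucleotide mapping: 0->A, 1->C, 2->G, 3->T


def run_lengths(row):
    """Run lengths of a binary pixel row, alternating 0-runs and 1-runs.

    The first run always counts 0s, so a row that starts with 1 gets a
    leading run of length 0; this keeps decoding unambiguous.
    """
    runs, current, count = [], 0, 0
    for bit in row:
        if bit == current:
            count += 1
        else:
            runs.append(count)
            current, count = bit, 1
    runs.append(count)
    return runs


def encode_run(n):
    """Write one run length (0..15) as two base-4 digits, i.e. two nucleotides."""
    if not 0 <= n <= 15:
        raise ValueError("run length must be 0..15 in this two-base scheme")
    return BASES[n // 4] + BASES[n % 4]


def row_to_dna(row):
    return "".join(encode_run(n) for n in run_lengths(row))


def dna_to_row(dna):
    """Invert row_to_dna: read nucleotide pairs back into alternating runs."""
    row, bit = [], 0
    for i in range(0, len(dna), 2):
        n = BASES.index(dna[i]) * 4 + BASES.index(dna[i + 1])
        row.extend([bit] * n)
        bit ^= 1
    return row


row = [0, 0, 1, 1, 1, 0]
dna = row_to_dna(row)
print(dna)  # AGATAC
assert dna_to_row(dna) == row
```

A practical design would also need to split runs longer than 15 and avoid problematic motifs such as long homopolymers or unintended restriction sites. This sketch ignores both.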

  18. Long non-coding RNAs as novel expression signatures modulate DNA damage and repair in cadmium toxicology

    NASA Astrophysics Data System (ADS)

    Zhou, Zhiheng; Liu, Haibai; Wang, Caixia; Lu, Qian; Huang, Qinhai; Zheng, Chanjiao; Lei, Yixiong

    2015-10-01

    Increasing evidence suggests that long non-coding RNAs (lncRNAs) are involved in a variety of physiological and pathophysiological processes. Our study investigated whether lncRNAs, as novel expression signatures, are able to modulate DNA damage and repair in cadmium (Cd) toxicity. There were aberrant expression profiles of lncRNAs in 35th Cd-induced cells as compared to untreated 16HBE cells. siRNA-mediated knockdown of ENST00000414355 inhibited the growth of DNA-damaged cells and decreased the expression of DNA-damage related genes (ATM, ATR and ATRIP), while increasing the expression of DNA-repair related genes (DDB1, DDB2, OGG1, ERCC1, MSH2, RAD50, XRCC1 and BARD1). Cadmium increased ENST00000414355 expression in the lung of Cd-exposed rats in a dose-dependent manner. A significant positive correlation was observed between blood ENST00000414355 expression and urinary/blood Cd concentrations, and there were significant correlations of lncRNA-ENST00000414355 expression with the expression of target genes in the lung of Cd-exposed rats and the blood of Cd-exposed workers. These results indicate that some lncRNAs are aberrantly expressed in Cd-treated 16HBE cells. lncRNA-ENST00000414355 may serve as a signature for DNA damage and repair related to the epigenetic mechanisms underlying cadmium toxicity and may become a novel biomarker of cadmium toxicity.

  19. Long non-coding RNAs as novel expression signatures modulate DNA damage and repair in cadmium toxicology

    PubMed Central

    Zhou, Zhiheng; Liu, Haibai; Wang, Caixia; Lu, Qian; Huang, Qinhai; Zheng, Chanjiao; Lei, Yixiong

    2015-01-01

    Increasing evidence suggests that long non-coding RNAs (lncRNAs) are involved in a variety of physiological and pathophysiological processes. Our study investigated whether lncRNAs, as novel expression signatures, are able to modulate DNA damage and repair in cadmium (Cd) toxicity. There were aberrant expression profiles of lncRNAs in 35th Cd-induced cells as compared to untreated 16HBE cells. siRNA-mediated knockdown of ENST00000414355 inhibited the growth of DNA-damaged cells and decreased the expression of DNA-damage related genes (ATM, ATR and ATRIP), while increasing the expression of DNA-repair related genes (DDB1, DDB2, OGG1, ERCC1, MSH2, RAD50, XRCC1 and BARD1). Cadmium increased ENST00000414355 expression in the lung of Cd-exposed rats in a dose-dependent manner. A significant positive correlation was observed between blood ENST00000414355 expression and urinary/blood Cd concentrations, and there were significant correlations of lncRNA-ENST00000414355 expression with the expression of target genes in the lung of Cd-exposed rats and the blood of Cd-exposed workers. These results indicate that some lncRNAs are aberrantly expressed in Cd-treated 16HBE cells. lncRNA-ENST00000414355 may serve as a signature for DNA damage and repair related to the epigenetic mechanisms underlying cadmium toxicity and may become a novel biomarker of cadmium toxicity. PMID:26472689

  20. Long non-coding RNAs as novel expression signatures modulate DNA damage and repair in cadmium toxicology.

    PubMed

    Zhou, Zhiheng; Liu, Haibai; Wang, Caixia; Lu, Qian; Huang, Qinhai; Zheng, Chanjiao; Lei, Yixiong

    2015-10-16

    Increasing evidence suggests that long non-coding RNAs (lncRNAs) are involved in a variety of physiological and pathophysiological processes. Our study investigated whether lncRNAs, as novel expression signatures, are able to modulate DNA damage and repair in cadmium (Cd) toxicity. There were aberrant expression profiles of lncRNAs in 35th Cd-induced cells as compared to untreated 16HBE cells. siRNA-mediated knockdown of ENST00000414355 inhibited the growth of DNA-damaged cells and decreased the expression of DNA-damage related genes (ATM, ATR and ATRIP), while increasing the expression of DNA-repair related genes (DDB1, DDB2, OGG1, ERCC1, MSH2, RAD50, XRCC1 and BARD1). Cadmium increased ENST00000414355 expression in the lung of Cd-exposed rats in a dose-dependent manner. A significant positive correlation was observed between blood ENST00000414355 expression and urinary/blood Cd concentrations, and there were significant correlations of lncRNA-ENST00000414355 expression with the expression of target genes in the lung of Cd-exposed rats and the blood of Cd-exposed workers. These results indicate that some lncRNAs are aberrantly expressed in Cd-treated 16HBE cells. lncRNA-ENST00000414355 may serve as a signature for DNA damage and repair related to the epigenetic mechanisms underlying cadmium toxicity and may become a novel biomarker of cadmium toxicity.

  1. Functional validation of mouse tyrosinase non-coding regulatory DNA elements by CRISPR–Cas9-mediated mutagenesis

    PubMed Central

    Seruggia, Davide; Fernández, Almudena; Cantero, Marta; Pelczar, Pawel; Montoliu, Lluis

    2015-01-01

    Newly developed genome-editing tools, such as the clustered regularly interspaced short palindromic repeat (CRISPR)–Cas9 system, allow simple and rapid genetic modification in most model organisms and human cell lines. Here, we report the production and analysis of mice carrying the inactivation via deletion of a genomic insulator, a key non-coding regulatory DNA element found 5′ upstream of the mouse tyrosinase (Tyr) gene. Targeting sequences flanking this boundary in mouse fertilized eggs resulted in the efficient deletion or inversion of large intervening DNA fragments delineated by the RNA guides. The resulting genome-edited mice showed a dramatic decrease in Tyr gene expression as inferred from the evident decrease of coat pigmentation, thus supporting the functionality of this boundary sequence in vivo, at the endogenous locus. Several potential off-targets bearing sequence similarity with each of the two RNA guides used were analyzed and found to be largely intact. This study reports how non-coding DNA elements, even if located in repeat-rich genomic sequences, can be efficiently and functionally evaluated in vivo and, furthermore, it illustrates how the regulatory elements described by the ENCODE and EPIGENOME projects, in the mouse and human genomes, can be systematically validated. PMID:25897126

  2. The vicilin gene family of pea (Pisum sativum L.): a complete cDNA coding sequence for preprovicilin.

    PubMed Central

    Lycett, G W; Delauney, A J; Gatehouse, J A; Gilroy, J; Croy, R R; Boulter, D

    1983-01-01

    A cDNA plasmid bank has been constructed using mRNA from developing pea seeds and three cDNAs coding for vicilin polypeptides have been selected. These cDNAs have been sequenced and between them cover the whole of the coding sequence plus part of the 5' and 3' untranslated regions. Comparison with amino acid sequence data from the protein indicates that vicilin is synthesised as preprovicilin with subsequent removal of a signal peptide and a C-terminal peptide as well as post-translational endo-proteolytic cleavage. The cDNAs represent two different classes of vicilin genes whilst amino acid data show that there are at least three major classes of vicilin polypeptide. The vicilin sequences show extensive homology with conglycinin and phaseolin except in the regions of the internal proteolytic cleavages. The evolutionary significance of this relationship is discussed. PMID:6687941

  3. iRSpot-GAEnsC: identifying recombination spots via ensemble classifier and extending the concept of Chou's PseAAC to formulate DNA samples.

    PubMed

    Kabir, Muhammad; Hayat, Maqsood

    2016-02-01

    Meiotic recombination is vital for maintaining sequence diversity in the human genome. Meiosis and recombination are considered the essential phases of cell division. In meiosis, the genome is divided into equal parts for sexual reproduction, whereas in recombination, diverse genomes are combined to form new combinations of genetic variation. The recombination process does not occur randomly across the genome; it targets specific areas called recombination "hotspots" and "coldspots". Owing to the rapid growth of sequence data in data banks, it is infeasible to recognize these sequences through conventional methods. Given the significance of recombination spots, it is indispensable to develop an accurate, fast, robust, and high-throughput automated computational model. In this model, numerical descriptors are extracted using two sequence representation schemes, namely dinucleotide composition and trinucleotide composition. The performance of seven classification algorithms was investigated. Finally, the predicted outcomes of the individual classifiers are fused into an ensemble classification through majority voting and a genetic algorithm (GA). The performance of the GA-based ensemble model is quite promising compared to the individual classifiers and the majority voting-based ensemble model. iRSpot-GAEnsC achieved 84.46 % accuracy. The empirical results revealed that the performance of iRSpot-GAEnsC is not only higher than that of the examined algorithms but also better than existing methods in the literature. It is anticipated that the proposed model might be helpful for the research community, academia and drug discovery.
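The two feature schemes named above, and the simpler of the two fusion strategies, can be sketched as follows. Dinucleotide composition gives 16 frequencies and trinucleotide composition gives 64. Majority voting then combines the labels produced by several classifiers. The individual classifiers and the GA-weighted fusion are not shown. The function names and the tie-breaking rule (first label seen wins) are assumptions made for this illustration.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k):
    """Normalized frequencies of all 4**k k-mers over A, C, G, T (fixed order).

    Windows containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def feature_vector(seq):
    """Dinucleotide (16) + trinucleotide (64) composition = 80 features."""
    return kmer_composition(seq, 2) + kmer_composition(seq, 3)


def majority_vote(predictions):
    """Fuse labels from several classifiers; ties go to the label seen first."""
    counts = Counter(predictions)
    return max(counts, key=counts.get)


features = feature_vector("ACGTACGTTGCA")
print(len(features))                                      # 80
print(majority_vote(["hotspot", "coldspot", "hotspot"]))  # hotspot
```

A GA-based fusion would, roughly speaking, replace the equal votes with evolved per-classifier weights. That optimization step is outside this sketch.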

  4. A novel non-coding RNA lncRNA-JADE connects DNA damage signalling to histone H4 acetylation.

    PubMed

    Wan, Guohui; Hu, Xiaoxiao; Liu, Yunhua; Han, Cecil; Sood, Anil K; Calin, George A; Zhang, Xinna; Lu, Xiongbin

    2013-10-30

    A prompt and efficient DNA damage response (DDR) eliminates the detrimental effects of DNA lesions in eukaryotic cells. Basic and preclinical studies suggest that the DDR is one of the primary anti-cancer barriers during tumorigenesis. The DDR involves a complex network of processes that detect and repair DNA damage, in which long non-coding RNAs (lncRNAs), a new class of regulatory RNAs, may play an important role. In the current study, we identified a novel lncRNA, lncRNA-JADE, that is induced after DNA damage in an ataxia-telangiectasia mutated (ATM)-dependent manner. LncRNA-JADE transcriptionally activates Jade1, a key component in the HBO1 (human acetylase binding to ORC1) histone acetylation complex. Consequently, lncRNA-JADE induces histone H4 acetylation in the DDR. Markedly higher levels of lncRNA-JADE were observed in human breast tumours in comparison with normal breast tissues. Knockdown of lncRNA-JADE significantly inhibited breast tumour growth in vivo. On the basis of these results, we propose that lncRNA-JADE is a key functional link that connects the DDR to histone H4 acetylation, and that dysregulation of lncRNA-JADE may contribute to breast tumorigenesis.

  5. The TL-DNA in octopine crown-gall tumours codes for seven well-defined polyadenylated transcripts

    PubMed Central

    Willmitzer, Lothar; Simons, Gisela; Schell, Jeff

    1982-01-01

    Seven polyadenylated transcripts of significantly different relative abundance were detected in octopine crown-gall tissue after gel electrophoretic separation and subsequent transfer to diazobenzyloxymethyl paper. The transcripts range from 670 to 2700 bases long. The different transcripts were located using 19 different fragments of the TL-region as probes. By hybridizing labelled RNA to separated complementary strands of the T-DNA, and parallel determination of the chemical polarity of the strands, the 5' - 3' orientations of six of the seven transcripts were identified. Both strands of the T-DNA code for RNA. Hybridization of octopine TL-DNA against poly(A)+ RNAs present in two nopaline tumour lines, C58-S1 and BT37, and vice versa, reveals a minimum of two and possibly four transcripts common to both octopine and nopaline tumours. These transcripts originate from corresponding parts of the conserved region of the T-DNA and are of similar size. PMID:16453403

  6. A novel non-coding RNA lncRNA-JADE connects DNA damage signalling to histone H4 acetylation

    PubMed Central

    Wan, Guohui; Hu, Xiaoxiao; Liu, Yunhua; Han, Cecil; Sood, Anil K; Calin, George A; Zhang, Xinna; Lu, Xiongbin

    2013-01-01

    A prompt and efficient DNA damage response (DDR) eliminates the detrimental effects of DNA lesions in eukaryotic cells. Basic and preclinical studies suggest that the DDR is one of the primary anti-cancer barriers during tumorigenesis. The DDR involves a complex network of processes that detect and repair DNA damage, in which long non-coding RNAs (lncRNAs), a new class of regulatory RNAs, may play an important role. In the current study, we identified a novel lncRNA, lncRNA-JADE, that is induced after DNA damage in an ataxia-telangiectasia mutated (ATM)-dependent manner. LncRNA-JADE transcriptionally activates Jade1, a key component in the HBO1 (human acetylase binding to ORC1) histone acetylation complex. Consequently, lncRNA-JADE induces histone H4 acetylation in the DDR. Markedly higher levels of lncRNA-JADE were observed in human breast tumours in comparison with normal breast tissues. Knockdown of lncRNA-JADE significantly inhibited breast tumour growth in vivo. On the basis of these results, we propose that lncRNA-JADE is a key functional link that connects the DDR to histone H4 acetylation, and that dysregulation of lncRNA-JADE may contribute to breast tumorigenesis. PMID:24097061

  7. Coding of DNA samples and data in the pharmaceutical industry: current practices and future directions--perspective of the I-PWG.

    PubMed

    Franc, M A; Cohen, N; Warner, A W; Shaw, P M; Groenen, P; Snapir, A

    2011-04-01

    DNA samples collected in clinical trials and stored for future research are valuable to pharmaceutical drug development. Given the perceived higher risk associated with genetic research, industry has implemented complex coding methods for DNA. Following years of experience with these methods and with addressing questions from institutional review boards (IRBs), ethics committees (ECs) and health authorities, the industry has started reexamining the extent of the added value offered by these methods. With the goal of harmonization, the Industry Pharmacogenomics Working Group (I-PWG) conducted a survey to gain an understanding of company practices for DNA coding and to solicit opinions on their effectiveness at protecting privacy. The results of the survey and the limitations of the coding methods are described. The I-PWG recommends dialogue with key stakeholders regarding coding practices such that equal standards are applied to DNA and non-DNA samples. The I-PWG believes that industry standards for privacy protection should provide adequate safeguards for DNA and non-DNA samples/data and suggests a need for more universal standards for samples stored for future research.

  8. Isolation and sequencing of a cDNA coding for the human DF3 breast carcinoma-associated antigen

    SciTech Connect

    Siddiqui, J.; Abe, M.; Hayes, D.; Shani, E.; Yunis, E.; Kufe, D.

    1988-04-01

    The murine monoclonal antibody (mAb) DF3 reacts with a high molecular weight glycoprotein detectable in human breast carcinomas. DF3 antigen expression correlates with human breast tumor differentiation, and the detection of a cross-reactive species in human milk has suggested that this antigen might be useful as a marker of differentiated mammary epithelium. To further characterize DF3 antigen expression, the authors have isolated a cDNA clone from a λgt11 library by screening with mAb DF3. The results demonstrate that this 309-base-pair cDNA, designated pDF9.3, codes for the DF3 epitope. Southern blot analyses of EcoRI-digested DNAs from six human tumor cell lines with ³²P-labeled pDF9.3 have revealed a restriction fragment length polymorphism. Variations in size of the alleles detected by pDF9.3 were also identified in PstI, but not in HindIII, DNA digests. Furthermore, hybridization of ³²P-labeled pDF9.3 with total cellular RNA from each of these cell lines demonstrated either one or two transcripts that varied from 4.1 to 7.1 kilobases in size. The presence of differently sized transcripts detected by pDF9.3 was also found to correspond with the polymorphic expression of DF3 glycoproteins. Nucleotide sequence analysis of pDF9.3 has revealed a highly conserved (G + C)-rich 60-base-pair tandem repeat. These findings suggest that the variation in size of alleles coding for the polymorphic DF3 glycoprotein may represent different numbers of repeats.

  9. An Abundant Class of Non-coding DNA Can Prevent Stochastic Gene Silencing in the C. elegans Germline.

    PubMed

    Frøkjær-Jensen, Christian; Jain, Nimit; Hansen, Loren; Davis, M Wayne; Li, Yongbin; Zhao, Di; Rebora, Karine; Millet, Jonathan R M; Liu, Xiao; Kim, Stuart K; Dupuy, Denis; Jorgensen, Erik M; Fire, Andrew Z

    2016-07-14

    Cells benefit from silencing foreign genetic elements but must simultaneously avoid inactivating endogenous genes. Although chromatin modifications and RNAs contribute to maintenance of silenced states, the establishment of silenced regions will inevitably reflect underlying DNA sequence and/or structure. Here, we demonstrate that a pervasive non-coding DNA feature in Caenorhabditis elegans, characterized by 10-base pair periodic An/Tn-clusters (PATCs), can license transgenes for germline expression within repressive chromatin domains. Transgenes containing natural or synthetic PATCs are resistant to position effect variegation and stochastic silencing in the germline. Among endogenous genes, intron length and PATC-character undergo dramatic changes as orthologs move from active to repressive chromatin over evolutionary time, indicating a dynamic character to the An/Tn periodicity. We propose that PATCs form the basis of a cellular immune system, identifying certain endogenous genes in heterochromatic contexts as privileged while foreign DNA can be suppressed with no requirement for a cellular memory of prior exposure. PMID:27374334

  10. Cloning and sequence analysis of a cDNA clone coding for the mouse GM2 activator protein.

    PubMed Central

    Bellachioma, G; Stirling, J L; Orlacchio, A; Beccari, T

    1993-01-01

    A cDNA (1.1 kb) containing the complete coding sequence for the mouse GM2 activator protein was isolated from a mouse macrophage library using a cDNA for the human protein as a probe. There was a single ATG located 12 bp from the 5' end of the cDNA clone followed by an open reading frame of 579 bp. Northern blot analysis of mouse macrophage RNA showed that there was a single band with a mobility corresponding to a size of 2.3 kb. We deduce from this that the mouse mRNA, in common with the mRNA for the human GM2 activator protein, has a long 3' untranslated sequence of approx. 1.7 kb. Alignment of the mouse and human deduced amino acid sequences showed 68% identity overall and 75% identity for the sequence on the C-terminal side of the first 31 residues, which in the human GM2 activator protein contains the signal peptide. Hydropathicity plots showed great similarity between the mouse and human sequences even in regions of low sequence similarity. There is a single N-glycosylation site in the mouse GM2 activator protein sequence (Asn151-Phe-Thr) which differs in its location from the single site reported in the human GM2 activator protein sequence (Asn63-Val-Thr). PMID:7689829
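Locating "a single ATG ... followed by an open reading frame" amounts to scanning in frame from a start codon until the first stop codon. The sketch below uses the standard stop codons and counts the stop codon in the reported length. Reports differ on whether ORF lengths include the stop codon, so this sketch does not try to reproduce the 579 bp figure. The name `first_orf` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_orf(seq):
    """Locate the ORF that starts at the first ATG.

    Returns (start_index, length_bp), where length_bp counts from the A of
    ATG through the end of the first in-frame stop codon, or None if there
    is no ATG or no in-frame stop.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i + 3 - start
    return None


print(first_orf("CCATGAAATAGCC"))  # (2, 9)
```

A real cDNA analysis would also check the other strand and alternative start codons, and would use the appropriate genetic code for mitochondrial or other non-standard genomes.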

  11. Codon usage, genetic code and phylogeny of Dictyostelium discoideum mitochondrial DNA as deduced from a 7.3-kb region.

    PubMed

    Angata, K; Kuroe, K; Yanagisawa, K; Tanaka, Y

    1995-02-01

    We have sequenced a region (7,376-bp) of the mitochondrial (mt) DNA (54 kb) of the cellular slime mold, Dictyostelium discoideum. From the DNA and amino-acid sequence comparisons with known sequences, genes for ATPase subunit 9 (ATP9), cytochrome b (CYTB), NADH dehydrogenase subunits 1, 3 and 6 (ND1, ND3 and ND6), small subunit rRNA (SSU rRNA) and seven tRNAs (Arg, Asn, Cys, Lys, f-Met, Met and Pro) have been identified. The sequenced region of the mtDNA has a high average A + T-content (70.8%). The A + T-content of protein-genes (73.6%) is considerably higher than that of RNA genes (61.3%). Even with the strong AT-bias, the genetic code employed is most probably the universal one. All seven tRNAs are able to form typical cloverleaf structures. The molecular phylogenetic trees of CYTB and SSU rRNA suggest that D. discoideum is closer to green plants than to animals and fungi. PMID:7736610
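The A + T-content percentages quoted above are simple base-composition ratios. Comparing protein genes with RNA genes presumably means applying the same ratio to each subset of the sequence. A minimal sketch follows; excluding ambiguous symbols such as `N` from the denominator is an assumption made here, not something the abstract specifies.

```python
def at_content(seq):
    """Percent A+T among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("A") + seq.count("T")) / acgt


print(round(at_content("AATTGC"), 1))  # 66.7
```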

  12. Rapid and accurate taxonomic classification of insect (class Insecta) cytochrome c oxidase subunit 1 (COI) DNA barcode sequences using a naïve Bayesian classifier

    PubMed Central

    Porter, Teresita M; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J; Golding, G Brian; Hajibabaei, Mehrdad

    2014-01-01

    Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the blast-based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates, incorrect taxonomic assignments with a high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.
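The cited naïve Bayesian classifier scores a query by the k-mer "words" it contains. It estimates each word's probability per taxon with a smoothed pseudocount and reports bootstrap support by repeatedly reclassifying random subsets of the query's words. The sketch below is loosely modeled on that description. The default word size, the pseudocount formula, and the 1/8 subsampling fraction are assumptions to be checked against Wang et al. (2007), and the class name is hypothetical. This is not the tool used in the study.

```python
import math
import random
from collections import defaultdict


def kmer_words(seq, k):
    """Set of overlapping k-mers ("words") present in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class KmerNaiveBayes:
    """Naive Bayes over k-mer presence, loosely after the Wang et al. (2007) scheme."""

    def __init__(self, k=8):
        self.k = k

    def fit(self, seqs, labels):
        word_sets = [kmer_words(s, self.k) for s in seqs]
        n = len(seqs)
        seen = defaultdict(int)  # number of training sequences containing each word
        for ws in word_sets:
            for w in ws:
                seen[w] += 1
        # Smoothed word prior, used as a pseudocount in the per-label estimate.
        self.prior = {w: (c + 0.5) / (n + 1) for w, c in seen.items()}
        self.unseen_prior = 0.5 / (n + 1)
        self.counts = defaultdict(lambda: defaultdict(int))
        self.sizes = defaultdict(int)
        for ws, lab in zip(word_sets, labels):
            self.sizes[lab] += 1
            for w in ws:
                self.counts[lab][w] += 1
        self.labels = sorted(self.sizes)
        return self

    def _log_likelihood(self, ws, lab):
        total = 0.0
        for w in ws:
            p = self.prior.get(w, self.unseen_prior)
            total += math.log((self.counts[lab].get(w, 0) + p) / (self.sizes[lab] + 1))
        return total

    def predict(self, ws):
        return max(self.labels, key=lambda lab: self._log_likelihood(ws, lab))

    def classify(self, seq, n_boot=100, seed=0):
        """Return (label, bootstrap support); each replicate resamples 1/8 of the words."""
        ws = sorted(kmer_words(seq, self.k))
        if not ws:
            raise ValueError("query is shorter than k")
        best = self.predict(ws)
        rng = random.Random(seed)
        size = max(1, len(ws) // 8)
        hits = sum(self.predict(rng.choices(ws, k=size)) == best for _ in range(n_boot))
        return best, hits / n_boot


# Toy training data (hypothetical taxa, k=4 to keep sequences short).
clf = KmerNaiveBayes(k=4).fit(
    ["AAAAAAAAAAAA", "AAAAAAAAAAAT", "CCCCCCCCCCCC", "CCCCCCCCCCCG"],
    ["taxonA", "taxonA", "taxonB", "taxonB"],
)
print(clf.classify("AAAAAAAA"))  # ('taxonA', 1.0)
```

As the abstract notes, high bootstrap support cannot rescue a query whose true taxon is missing from the training set. The classifier can only choose among the labels it was given.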

  13. Unravelling the hidden DNA structural/physical code provides novel insights on promoter location.

    PubMed

    Durán, Elisa; Djebali, Sarah; González, Santi; Flores, Oscar; Mercader, Josep Maria; Guigó, Roderic; Torrents, David; Soler-López, Montserrat; Orozco, Modesto

    2013-08-01

    Although protein recognition of DNA motifs in promoter regions has traditionally been considered a critical regulatory element in transcription, locating promoters, and in particular transcription start sites (TSSs), still remains a challenge. Here we perform a comprehensive analysis of putative core promoter sequences relative to non-annotated predicted TSSs along the human genome, which were defined by distinct DNA physical properties implemented in our ProStar computational algorithm. A representative sampling of predicted regions was subjected to extensive experimental validation and analyses. Interestingly, the vast majority proved to be transcriptionally active despite the lack of specific sequence motifs, indicating that physical signaling is indeed able to detect promoter activity beyond conventional TSS prediction methods. Furthermore, highly active regions displayed typical chromatin features associated with promoters of housekeeping genes. Our results enable us to redefine the promoter signatures and analyze the diversity, evolutionary conservation and dynamic regulation of human core promoters at large scale. Moreover, the present study strongly supports the hypothesis of an ancient regulatory mechanism encoded by the intrinsic physical properties of the DNA that may contribute to the complexity of transcription regulation in the human genome. PMID:23761436

  14. African swine fever virus ORF P1192R codes for a functional type II DNA topoisomerase.

    PubMed

    Coelho, João; Martins, Carlos; Ferreira, Fernando; Leitão, Alexandre

    2015-01-01

    Topoisomerases modulate the topological state of DNA during processes, such as replication and transcription, that cause overwinding and/or underwinding of the DNA. African swine fever virus (ASFV) is a nucleo-cytoplasmic double-stranded DNA virus shown to contain an ORF (P1192R) with homology to type II topoisomerases. Here we observed that pP1192R is highly conserved among ASFV isolates but dissimilar from other viral, prokaryotic or eukaryotic type II topoisomerases. In both ASFV/Ba71V-infected Vero cells and ASFV/L60-infected pig macrophages we detected pP1192R at intermediate and late phases of infection, cytoplasmically localized and accumulating in the viral factories. Finally, we used a Saccharomyces cerevisiae temperature-sensitive strain in order to demonstrate, through complementation and in vitro decatenation assays, the functionality of P1192R, which we further confirmed by mutating its predicted catalytic residue. Overall, this work strengthens the idea that P1192R constitutes a target for studying, and possibly controlling, ASFV transcription and replication.

  15. HGSA DNA day essay contest winner 60 years on: still coding for cutting-edge science.

    PubMed

    Yates, Patrick

    2013-08-01

    MESSAGE FROM THE EDUCATION COMMITTEE: In 2013, the Education Committee of the Human Genetics Society of Australasia (HGSA) established the DNA Day Essay Contest in Australia and New Zealand. The contest was first established by the American Society of Human Genetics in 2005 and the HGSA DNA Day Essay Contest is adapted from this contest via a collaborative partnership. The aim of the contest is to engage high school students with important concepts in genetics through literature research and reflection. As 2013 marks the 60th anniversary of the discovery of the double helix of DNA by James Watson and Francis Crick and the 10th anniversary of the first sequencing of the human genome, the essay topic was to choose either of these breakthroughs and explain its broader impact on biotechnology, human health and disease, or our understanding of basic genetics, such as genetic variation or gene expression. The contest attracted 87 entrants in 2013, with the winning essay authored by Patrick Yates, a Year 12 student from Melbourne High School. Further details about the contest including the names and schools of the other finalists can be found at http://www.hgsa-essay.net.au/. The Education Committee would like to thank all the 2013 applicants and encourage students to enter in 2014.

  16. Fine-tuning the ubiquitin code at DNA double-strand breaks: deubiquitinating enzymes at work

    PubMed Central

    Citterio, Elisabetta

    2015-01-01

    Ubiquitination is a reversible protein modification broadly implicated in cellular functions. Signaling processes mediated by ubiquitin (ub) are crucial for the cellular response to DNA double-strand breaks (DSBs), one of the most dangerous types of DNA lesions. In particular, the DSB response critically relies on active ubiquitination by the RNF8 and RNF168 ub ligases at the chromatin, which is essential for proper DSB signaling and repair. How this pathway is fine-tuned, and what the functional consequences of its deregulation are for genome integrity and tissue homeostasis, are the subject of intense investigation. One important regulatory mechanism is the reversal of substrate ubiquitination through the activity of specific deubiquitinating enzymes (DUBs), as supported by the implication of a growing number of DUBs in DNA damage response processes. Here, we discuss the current knowledge of how ub-mediated signaling at DSBs is controlled by DUBs, with a main focus on DUBs targeting histone H2A and on their recent implication in stem cell biology and cancer. PMID:26442100

  17. Run-length encoding graphic rules, biochemically editable designs and steganographical numeric data embedment for DNA-based cryptographical coding system

    PubMed Central

    Kawano, Tomonori

    2013-01-01

    There has been a wide variety of approaches for handling pieces of DNA as "unplugged" tools for digital information storage and processing, including a series of studies applied to security-related areas, such as DNA-based digital barcodes, watermarks and cryptography. In the present article, novel designs of artificial genes as the media for storing digitally compressed image data are proposed for bio-computing purposes, whereas natural genes principally encode proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given "passwords" and/or secret numbers using DNA sequences. The "passwords" of interest were decomposed into single letters and translated into the font image coded on the separate DNA chains with both the coding regions, in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original "passwords." The latter processes require the molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interest over the image-coding DNA are also discussed. PMID:23750303
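    The abstract does not spell out its run-length encoding rule, so the following Python sketch uses a hypothetical one purely to illustrate the general idea. A row of black/white pixels is reduced to alternating run lengths (starting with white), each length is written as two base-4 digits, and each digit becomes a nucleotide (0→A, 1→C, 2→G, 3→T). The mapping, the two-digit width, and the function names are assumptions, not Kawano's design.

```python
DIGIT_TO_BASE = "ACGT"  # hypothetical mapping: 0->A, 1->C, 2->G, 3->T


def run_lengths(row):
    """Alternating run lengths of a 0/1 pixel row, starting with a (possibly empty) run of 0s."""
    runs, colour, length = [], 0, 0
    for px in row:
        if px == colour:
            length += 1
        else:
            runs.append(length)
            colour, length = px, 1
    runs.append(length)
    return runs


def encode_row(row, width=2):
    """Write each run length as `width` base-4 digits, one nucleotide per digit."""
    out = []
    for run in run_lengths(row):
        if run >= 4 ** width:
            raise ValueError("run too long for the chosen width")
        digits = []
        for _ in range(width):
            digits.append(DIGIT_TO_BASE[run % 4])
            run //= 4
        out.append("".join(reversed(digits)))
    return "".join(out)


def decode_row(dna, width=2):
    """Invert encode_row: read `width` nucleotides per run, alternating 0s and 1s."""
    row, colour = [], 0
    for i in range(0, len(dna), width):
        run = 0
        for base in dna[i:i + width]:
            run = run * 4 + DIGIT_TO_BASE.index(base)
        row.extend([colour] * run)
        colour ^= 1
    return row
```

    For example, the row `[0, 0, 0, 1, 1, 0]` has runs 3, 2, 1 and encodes to `ATAGAC`.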

  18. Brut: Automatic bubble classifier

    NASA Astrophysics Data System (ADS)

    Beaumont, Christopher; Goodman, Alyssa; Williams, Jonathan; Kendrew, Sarah; Simpson, Robert

    2014-07-01

    Brut, written in Python, identifies bubbles in infrared images of the Galactic midplane; it uses a database of known bubbles from the Milky Way Project and Spitzer images to build an automatic bubble classifier. The classifier is based on the Random Forest algorithm, and uses the WiseRF implementation of this algorithm.

  19. Crystal structure of T4-lysozyme generated from synthetic coding DNA expressed in Escherichia coli.

    PubMed

    Rose, D R; Phipps, J; Michniewicz, J; Birnbaum, G I; Ahmed, F R; Muir, A; Anderson, W F; Narang, S

    1988-10-01

    The polypeptide produced by expressing a chemically synthesized gene coding for the amino-acid sequence of T4-lysozyme has been crystallized and subjected to X-ray diffraction. The crystal structure has been refined to a standard R-factor of 0.191 for data between 8 and 2 Å resolution. The refined model is essentially the same as the well-known structure of wild-type T4-lysozyme determined previously by Matthews et al. (1987). Some small changes in the C-terminal region, which is important in maintaining the folded structure, have been noted. In addition to confirming that the synthetic gene product is very close to the wild type, this structure provides a benchmark for protein engineering experiments on the folding and the catalytic activity of this molecule by the method of gene synthesis.
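    For reference, the conventional crystallographic R-factor compares observed and calculated structure-factor amplitudes, R = Σ||F_obs| − k|F_calc|| / Σ|F_obs|, where k is a scale factor. The following is a minimal Python sketch, assuming the two amplitude lists are already matched reflection by reflection. The function name is hypothetical.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Conventional crystallographic R = sum(||Fo| - k|Fc||) / sum(|Fo|)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need equal-length, non-empty amplitude lists")
    numerator = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator
```

    A value of 0.191 therefore means that the model's amplitudes deviate from the measured ones by about 19% in aggregate.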

  20. Balbiani ring DNA: sequence comparisons and evolutionary history of a family of hierarchically repetitive protein-coding genes.

    PubMed

    Pustell, J; Kafatos, F C; Wobus, U; Bäumlein, H

    1984-01-01

    All known types of Balbiani ring (BR) genes consist of multiple, tandemly arranged, ca. 180 to 300-bp repeat units that can be divided into a constant region and a subrepeat region. The latter region includes short tandem subrepeats (SRs). Comparison of all available BR sequences using computer methods has enabled us (a) to define more precisely the constant and subrepeat regions, (b) to infer the evolutionary relationships among the various types of BR repeats, (c) to derive a consensus approximation of an ancestral sequence from a small segment of which the highly diverse present-day SRs may have originated, and (d) to detect an underlying substructure in the constant region, evident in the consensus but not in the present-day sequences and possibly corresponding to an original 39-bp DNA segment from which the extant, giant BR sequences may have evolved. We discuss the processes of reduplication, diversification, and homogenization within the hierarchically repetitive BR sequences as examples of how a simple DNA element may evolve into a diverse family of large, protein-coding genes.
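    Deriving a consensus from aligned repeats is commonly done by majority rule, column by column. The Python sketch below shows that generic approach under stated assumptions: a strict-majority threshold, '-' as the gap character, and 'N' for unresolved columns. The abstract does not describe the authors' exact procedure, and the function name is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned, min_fraction=0.5, ambiguous="N"):
    """Column-wise majority-rule consensus of equal-length aligned sequences."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        if count / len(column) > min_fraction:  # strict majority required
            if base != "-":                      # majority gap: drop the column
                out.append(base)
        else:
            out.append(ambiguous)                # no majority: mark as ambiguous
    return "".join(out)
```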

  1. First approximation of a stereochemical rationale for the genetic code based on the topography and physicochemical properties of "cavities" constructed from models of DNA.

    PubMed Central

    Hendry, L B; Bransome, E D; Hutson, M S; Campbell, L K

    1981-01-01

    To examine the question of whether or not the genetic code has a stereochemical basis, we used artificial constructs of the topography and physicochemical features of unique "cavities" formed by removal of the second codon base in B-DNA. The effects of base changes on the stereochemistry of the cavities are consistent with the pattern of the genetic code. Fits into the cavities of the side chains of the 20 L-amino acids involved in protein synthesis can be demonstrated by using conventional physicochemical principles of hydrogen bonding and steric constraints. The specificity of the fits is remarkably consistent with the genetic code. PMID:6950386

  2. A phylogeny of the extant Phocidae inferred from complete mitochondrial DNA coding regions.

    PubMed

    Davis, Corey S; Delisle, Isabelle; Stirling, Ian; Siniff, Donald B; Strobeck, Curtis

    2004-11-01

    Despite extensive interest in the systematics of Pinnipedia, questions remain concerning phylogenetic relationships within the Phocidae or "true" seals. Relationships within the phocids and their placement relative to the remaining pinnipeds and major lineages of arctoid carnivores were examined using a large molecular data set consisting of 12 mitochondrial protein coding genes. Phylogenetic analysis including 15 extant species of the Phocidae, and representatives of the Otariidae, Odobenidae, Ursidae, Mustelidae, Canidae, and Felidae confirmed the monophyletic origins of the Pinnipedia within the Arctoidea. Slightly more support was found for an ursid affinity of the pinnipeds; however, this relationship remains contentious. The Phocidae were placed as the sister group to a common odobenid-otariid clade. Within the family Phocidae, strong support for the traditionally accepted subfamilies Phocinae (northern seals) and Monachinae (southern seals plus monk seals) was found. In contrast to recent suggestions, a monophyletic Monachus was strongly supported and was placed in a deep branching position within the Monachinae. Evidence from sequence divergence under a maximum likelihood model illustrated that the rarely used tribal distinctions within the Monachinae are comparable, in terms of evolutionary distance, to accepted tribal distinctions within the Phocinae. In addition, results suggest that Pagophilus should be accepted as a genus within the Phocini. Sequence divergence between Phoca, Pusa, and Halichoerus is minimal, supporting a taxonomic reclassification of the three genera into an emended genus Phoca, without subgeneric distinctions. PMID:15336671

  3. Variable continental distribution of polymorphisms in the coding regions of DNA-repair genes.

    PubMed

    Mathonnet, Géraldine; Labuda, Damian; Meloche, Caroline; Wambach, Tina; Krajinovic, Maja; Sinnett, Daniel

    2003-01-01

    DNA-repair pathways are critical for maintaining the integrity of the genetic material by protecting against mutations due to exposure-induced damage or replication errors. Polymorphisms in the corresponding genes may be relevant in genetic epidemiology by modifying individual cancer susceptibility or therapeutic response. We report data on the population distribution of potentially functional variants in XRCC1, APEX1, ERCC2, ERCC4, hMLH1, and hMSH3 genes among groups representing individuals of European, Middle Eastern, African, Southeast Asian and North American descent. The data indicate little interpopulation differentiation in some of these polymorphisms and typical FST values ranging from 10 to 17% at others. Low FST was observed in APEX1 and hMSH3 exon 23 in spite of their relatively high minor allele frequencies, which could suggest the effect of balancing selection. In XRCC1, hMSH3 exon 21 and hMLH1, Africa clusters either with the Middle East and Europe or with Southeast Asia, which could be related to the demographic history of human populations, whereby human migrations and genetic drift rather than selection would account for the observed differences.
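    One common way to express FST values like those quoted above is Wright's fixation index for a biallelic site, FST = (HT − HS)/HT. Here HT is the expected heterozygosity of the pooled population and HS is the mean expected heterozygosity within subpopulations. The abstract does not state which estimator was used, so this Python sketch is only illustrative: it uses equal subpopulation weights, applies no sample-size correction, and the function name is hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic site.

    freqs: frequency of one allele in each subpopulation (equal weights assumed).
    """
    if not freqs:
        raise ValueError("need at least one subpopulation")
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # monomorphic site: no differentiation to measure
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population value
    return (h_t - h_s) / h_t
```

    For example, allele frequencies of 0.2 and 0.6 in two subpopulations give HT = 0.48, HS = 0.40 and FST = 1/6.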

  4. Dynamic system classifier

    NASA Astrophysics Data System (ADS)

    Pumpe, Daniel; Greiner, Maksim; Müller, Ewald; Enßlin, Torsten A.

    2016-07-01

    Stochastic differential equations describe well many physical, biological, and sociological systems, despite the simplification often made in their derivation. Here the usage of simple stochastic differential equations to characterize and classify complex dynamical systems is proposed within a Bayesian framework. To this end, we develop a dynamic system classifier (DSC). The DSC first abstracts training data of a system in terms of time-dependent coefficients of the descriptive stochastic differential equation. Thereby the DSC identifies unique correlation structures within the training data. For definiteness we restrict the presentation of the DSC to oscillation processes with a time-dependent frequency ω(t) and damping factor γ(t). Although real systems might be more complex, this simple oscillator captures many characteristic features. The ω and γ time lines represent the abstract system characterization and permit the construction of efficient signal classifiers. Numerical experiments show that such classifiers perform well even in the low signal-to-noise regime.

  6. Dynamic system classifier.

    PubMed

    Pumpe, Daniel; Greiner, Maksim; Müller, Ewald; Enßlin, Torsten A

    2016-07-01

    Stochastic differential equations describe well many physical, biological, and sociological systems, despite the simplification often made in their derivation. Here the usage of simple stochastic differential equations to characterize and classify complex dynamical systems is proposed within a Bayesian framework. To this end, we develop a dynamic system classifier (DSC). The DSC first abstracts training data of a system in terms of time-dependent coefficients of the descriptive stochastic differential equation. Thereby the DSC identifies unique correlation structures within the training data. For definiteness we restrict the presentation of the DSC to oscillation processes with a time-dependent frequency ω(t) and damping factor γ(t). Although real systems might be more complex, this simple oscillator captures many characteristic features. The ω and γ time lines represent the abstract system characterization and permit the construction of efficient signal classifiers. Numerical experiments show that such classifiers perform well even in the low signal-to-noise regime. PMID:27575101
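    The oscillator model named in the abstract can be written as ẍ + γ(t)ẋ + ω(t)²x = σξ(t), with ξ white noise. We assume this standard damped-oscillator form. The Python sketch below only simulates such a signal (the forward problem) with a semi-implicit Euler–Maruyama step. It does not implement the Bayesian inference of ω(t) and γ(t) that the DSC performs, and its parameter names are hypothetical.

```python
import math
import random


def simulate_oscillator(omega, gamma, sigma=0.0, x0=1.0, v0=0.0,
                        dt=1e-3, n_steps=10_000, seed=0):
    """Simulate x'' + gamma(t) x' + omega(t)^2 x = sigma * white noise.

    omega and gamma are callables of time t; returns the sampled positions x.
    """
    rng = random.Random(seed)
    x, v = x0, v0
    xs = [x]
    for i in range(n_steps):
        t = i * dt
        accel = -gamma(t) * v - omega(t) ** 2 * x
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment ~ N(0, dt)
        v += accel * dt + sigma * dw        # update velocity first ...
        x += v * dt                         # ... then position (semi-implicit step)
        xs.append(x)
    return xs
```

    A time-dependent frequency can be passed as, for example, `omega=lambda t: 2 * math.pi * (1 + 0.1 * t)`. Setting `sigma > 0` produces the noisy signals that a classifier of this kind would be trained on.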

  7. DNA-LCEB: a high-capacity and mutation-resistant DNA data-hiding approach by employing encryption, error correcting codes, and hybrid twofold and fourfold codon-based strategy for synonymous substitution in amino acids.

    PubMed

    Hafeez, Ibbad; Khan, Asifullah; Qadir, Abdul

    2014-11-01

    Data-hiding in deoxyribonucleic acid (DNA) sequences can be used to develop an organic memory and to track parent genes in offspring as well as in genetically modified organisms. However, the main concerns regarding data-hiding in DNA sequences are the survival of the organism and the successful extraction of the watermark from DNA. This implies that the organism should live and reproduce without any functional disorder even in the presence of the embedded data. Consequently, performing synonymous substitution in amino acids for watermarking becomes a primary option. In this regard, a hybrid watermark embedding strategy that employs synonymous substitution in both twofold and fourfold codons of amino acids is proposed. This work thus presents a high-capacity and mutation-resistant watermarking technique, DNA-LCEB, for hiding secret information in DNA of living organisms. By employing the different types of synonymous codons of amino acids, the data storage capacity has been significantly increased. It is further observed that the proposed DNA-LCEB employing a combination of synonymous substitution, lossless compression, encryption, and Bose-Chaudhuri-Hocquenghem coding is secure and performs better in terms of both capacity and robustness compared to existing DNA data-hiding schemes. The proposed DNA-LCEB is tested against different mutations, including silent, missense, and nonsense mutations, and provides substantial improvement in terms of mutation detection/correction rate and bits per nucleotide. A web application for DNA-LCEB is available at http://111.68.99.218/DNA-LCEB.
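    The core idea of synonymous-substitution watermarking can be illustrated with fourfold-degenerate codons. The third base of such a codon can be any of A, C, G or T without changing the encoded amino acid (for example, GCN always encodes alanine), so each one can carry two bits. This Python sketch covers only that fourfold case and uses a hypothetical bit-to-base mapping. It omits the twofold codons, compression, encryption and BCH coding that DNA-LCEB combines, and it assumes an uppercase, in-frame coding sequence.

```python
# Codon prefixes whose third position is fully synonymous in the standard code:
# Ala GCN, Gly GGN, Pro CCN, Thr ACN, Val GTN, Leu CTN, Arg CGN, Ser TCN
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CT", "CG", "TC"}
THIRD_BASE = "ACGT"  # hypothetical 2-bit mapping: 00->A, 01->C, 10->G, 11->T


def codons(cds):
    """Split an in-frame coding sequence into complete codons."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def embed_bits(cds, bits):
    """Rewrite third bases of fourfold codons to carry a '0'/'1' string, two bits per codon."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    out, pos = [], 0
    for codon in codons(cds):
        if codon[:2] in FOURFOLD_PREFIXES and pos < len(bits):
            codon = codon[:2] + THIRD_BASE[int(bits[pos:pos + 2], 2)]
            pos += 2
        out.append(codon)
    if pos < len(bits):
        raise ValueError("not enough fourfold-degenerate codons for the message")
    return "".join(out) + cds[len(out) * 3:]


def extract_bits(cds, n_bits):
    """Read back n_bits from the third bases of fourfold codons."""
    chunks = []
    for codon in codons(cds):
        if len(chunks) * 2 >= n_bits:
            break
        if codon[:2] in FOURFOLD_PREFIXES:
            chunks.append(format(THIRD_BASE.index(codon[2]), "02b"))
    return "".join(chunks)
```

    Because only the third base of a fourfold codon changes, the encoded protein is unchanged and no stop codon can be created.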

  8. Recognition Using Hybrid Classifiers.

    PubMed

    Osadchy, Margarita; Keren, Daniel; Raviv, Dolev

    2016-04-01

    A canonical problem in computer vision is category recognition (e.g., find all instances of human faces, cars, etc., in an image). Typically, the input for training a binary classifier is a relatively small sample of positive examples, and a huge sample of negative examples, which can be very diverse, consisting of images from a large number of categories. The difficulty of the problem sharply increases with the dimension and size of the negative example set. We propose to alleviate this problem by applying a "hybrid" classifier, which replaces the negative samples by a prior, and then finds a hyperplane which separates the positive samples from this prior. The method is extended to kernel space and to an ensemble-based approach. The resulting binary classifiers achieve a classification rate identical to or better than that of SVM, while requiring far less memory and lower computational complexity to train and apply.

  9. A pathogenic non-coding RNA induces changes in dynamic DNA methylation of ribosomal RNA genes in host plants

    PubMed Central

    Martinez, German; Castellano, Mayte; Tortosa, Maria; Pallas, Vicente; Gomez, Gustavo

    2014-01-01

    Viroids are plant-pathogenic non-coding RNAs able to interfere with as-yet poorly understood host regulatory pathways and to cause alterations recognized as diseases. The way in which these RNAs coerce the host to express symptoms remains to be fully deciphered. In recent years, diverse studies have proposed a close interplay between viroid-induced pathogenesis and RNA silencing, supporting the belief that viroid-derived small RNAs mediate the post-transcriptional cleavage of endogenous mRNAs by acting as elicitors of symptom expression. Although the evidence supporting the role of viroid-derived small RNAs in pathogenesis is robust, the possibility that this phenomenon is a more complex process, also involving viroid-induced alterations in plant gene expression at the transcriptional level, has been considered. Here we show that plants infected with ‘Hop stunt viroid’ accumulate high levels of small RNAs (sRNAs) derived from ribosomal transcripts. This effect was correlated with an increase in the transcription of ribosomal RNA (rRNA) precursors during infection. We observed that the transcriptional reactivation of rRNA genes correlates with a modification of DNA methylation in their promoter region and revealed that some rRNA genes are demethylated and transcriptionally reactivated during infection. This study reports a previously unknown mechanism associated with viroid (or any other pathogenic RNA) infection in plants, providing new insights into aspects of host alterations induced by the viroid infectious cycle. PMID:24178032

  10. A non-coding plastid DNA phylogeny of Asian Begonia (Begoniaceae): evidence for morphological homoplasy and sectional polyphyly.

    PubMed

    Thomas, D C; Hughes, M; Phutthai, T; Rajbhandary, S; Rubite, R; Ardi, W H; Richardson, J E

    2011-09-01

    Maximum likelihood and Bayesian analyses of non-coding plastid DNA sequence data based on a broad sampling of all major Asian Begonia sections (ndhA intron, ndhF-rpl32 spacer, rpl32-trnL spacer, 3977 aligned characters, 84 species) were used to reconstruct the phylogeny of Asian Begonia and to test the monophyly of major Asian Begonia sections. Ovary and fruit characters which are crucial in current sectional circumscriptions were mapped on the phylogeny to assess their utility in infrageneric classifications. The results indicate that the strong systematic emphasis placed on single, homoplasious characters such as undivided placenta lamellae (section Reichenheimia) and fleshy pericarps (section Sphenanthera), and the recognition of sections primarily based on a suite of plesiomorphic characters including three-locular ovaries with axillary, bilamellate placentae and dry, dehiscent pericarps (section Diploclinium), has resulted in the circumscription of several polyphyletic sections. Moreover, sections Platycentrum and Petermannia were recovered as paraphyletic. Because of the homoplasy of systematically important characters, current classifications have some diagnostic value but only poor predictive value. The presented phylogeny provides for the first time a reasonably resolved and supported phylogenetic framework for Asian Begonia which has the power to inform future taxonomic, biogeographic and evolutionary studies.

  11. Evolutionary Conservation of a Coding Function for D4Z4, the Tandem DNA Repeat Mutated in Facioscapulohumeral Muscular Dystrophy

    PubMed Central

    Clapp, Jannine; Mitchell, Laura M.; Bolland, Daniel J.; Fantes, Judy; Corcoran, Anne E.; Scotting, Paul J.; Armour, John A. L.; Hewitt, Jane E.

    2007-01-01

    Facioscapulohumeral muscular dystrophy (FSHD) is caused by deletions within the polymorphic DNA tandem array D4Z4. Each D4Z4 repeat unit has an open reading frame (ORF), termed “DUX4,” containing two homeobox sequences. Because there has been no evidence of a transcript from the array, these deletions are thought to cause FSHD by a position effect on other genes. Here, we identify D4Z4 homologues in the genomes of rodents, Afrotheria (superorder of elephants and related species), and other species and show that the DUX4 ORF is conserved. Phylogenetic analysis suggests that primate and Afrotherian D4Z4 arrays are orthologous and originated from a retrotransposed copy of an intron-containing DUX gene, DUXC. Reverse-transcriptase polymerase chain reaction and RNA fluorescence and tissue in situ hybridization data indicate transcription of the mouse array. Together with the conservation of the DUX4 ORF for >100 million years, this strongly supports a coding function for D4Z4 and necessitates re-examination of current models of the FSHD disease mechanism. PMID:17668377

  12. Identification of internal transcribed spacer sequence motifs in truffles: a first step toward their DNA bar coding.

    PubMed

    El Karkouri, Khalid; Murat, Claude; Zampieri, Elisa; Bonfante, Paola

    2007-08-01

    This work presents DNA sequence motifs from the internal transcribed spacer (ITS) of the nuclear rRNA repeat unit which are useful for the identification of five European and Asiatic truffles (Tuber magnatum, T. melanosporum, T. indicum, T. aestivum, and T. mesentericum). Truffles are edible mycorrhizal ascomycetes that show similar morphological characteristics but that have distinct organoleptic and economic values. A total of 36 out of 46 ITS1 or ITS2 sequence motifs have allowed an accurate in silico distinction of the five truffles to be made (i.e., by pattern matching and/or BLAST analysis on downloaded GenBank sequences and directly against GenBank databases). The motifs considered the intraspecific genetic variability of each species, including rare haplotypes, and assigned their respective species from either the ascocarps or ectomycorrhizas. The data indicate that short ITS1 or ITS2 motifs (≤ 50 bp in size) can be considered promising tools for truffle species identification. A dot blot hybridization analysis of T. magnatum and T. melanosporum compared with other close relatives or distant lineages allowed at least one highly specific motif to be identified for each species. These results were confirmed in a blind test which included new field isolates. The current work has provided a reliable new tool for a truffle oligonucleotide bar code and identification in ecological and evolutionary studies. PMID:17601808
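    Pattern matching of the kind described can be sketched as a search for each species' diagnostic motifs in a query sequence and its reverse complement. The motif table and species labels below are placeholders rather than the published ITS motifs, and exact substring matching is an assumption. The study itself also relied on BLAST and dot blot hybridization.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def identify(query, motif_table):
    """Return species whose motifs occur in the query or its reverse complement."""
    query = query.upper()
    strands = (query, reverse_complement(query))
    return [species for species, motifs in motif_table.items()
            if any(motif in strand for motif in motifs for strand in strands)]
```

    A query matching motifs from more than one species, or from none, would need the further checks (BLAST, hybridization) that the abstract describes.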

  13. cDNA sequence coding for the alpha'-chain of the third complement component in the African lungfish.

    PubMed

    Sato, A; Sültmann, H; Mayer, W E; Figueroa, F; Tichy, H; Klein, J

    1999-04-01

    cDNA clones coding for almost the entire C3 alpha-chain of the African lungfish (Protopterus aethiopicus), a representative of the Sarcopterygii (lobe-finned fishes), were sequenced and characterized. From the sequence it is deduced that the lungfish C3 molecule is probably a disulphide-bonded alpha:beta dimer similar to that of the C3 components of other jawed vertebrates. The deduced sequence contains conserved sites presumably recognized by proteolytic enzymes (e.g. factor I) involved in the activation and inactivation of the component. It also contains the conserved thioester region and the putative site for binding properdin. However, the sites for interaction with complement receptor 2 and factor H are poorly conserved. Either complement receptor 2 and factor H are not present in the lungfish or they bind to different residues at the same or a different site than mammalian complement receptor 2 and factor H. The C3 alpha-chain sequences faithfully reflect the phylogenetic relationships among vertebrate classes and can therefore be used to help to resolve the long-standing controversy concerning the origin of the tetrapods. PMID:10219761

  14. Classifying Cereal Data

    Cancer.gov

    The DSQ includes questions about cereal intake and allows respondents up to two responses on which cereals they consume. We classified each cereal reported first by hot or cold, and then along four dimensions: density of added sugars, whole grains, fiber, and calcium.

  15. Classifying Adolescent Perfectionists

    ERIC Educational Resources Information Center

    Rice, Kenneth G.; Ashby, Jeffrey S.; Gilman, Rich

    2011-01-01

    A large school-based sample of 9th-grade adolescents (N = 875) completed the Almost Perfect Scale-Revised (APS-R; Slaney, Mobley, Trippi, Ashby, & Johnson, 1996). Decision rules and cut-scores were developed and replicated that classify adolescents as one of two kinds of perfectionists (adaptive or maladaptive) or as nonperfectionists. A…

  16. Number in Classifier Languages

    ERIC Educational Resources Information Center

    Nomoto, Hiroki

    2013-01-01

    Classifier languages are often described as lacking genuine number morphology and treating all common nouns, including those conceptually count, as an unindividuated mass. This study argues that neither of these popular assumptions is true, and presents new generalizations and analyses gained by abandoning them. I claim that no difference exists…

  17. Genome defense against exogenous nucleic acids in eukaryotes by non-coding DNA occurs through CRISPR-like mechanisms in the cytosol and the bodyguard protection in the nucleus.

    PubMed

    Qiu, Guo-Hua

    2016-01-01

    In this review, the protective function of the abundant non-coding DNA in the eukaryotic genome is discussed from the perspective of genome defense against exogenous nucleic acids. Peripheral non-coding DNA has been proposed to act as a bodyguard that protects the genome and the central protein-coding sequences from ionizing radiation (IR)-induced DNA damage. In the proposed mechanism of protection, the radicals generated by water radiolysis in the cytosol and IR energy are absorbed, blocked and/or reduced by peripheral heterochromatin; then, the DNA damage sites in the heterochromatin are removed and expelled from the nucleus to the cytoplasm through nuclear pore complexes, most likely through the formation of extrachromosomal circular DNA. To strengthen this hypothesis, this review summarizes the experimental evidence supporting the protective function of non-coding DNA against exogenous nucleic acids. Based on these data, I hypothesize herein the presence of an additional line of defense formed by small RNAs in the cytosol in addition to the bodyguard protection mechanism in the nucleus. Therefore, exogenous nucleic acids may be initially inactivated in the cytosol by small RNAs generated from non-coding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. Exogenous nucleic acids may enter the nucleus, where some are absorbed and/or blocked by heterochromatin and others integrate into chromosomes. The integrated fragments and the sites of DNA damage are removed by repetitive non-coding DNA elements in the heterochromatin and excluded from the nucleus. Therefore, the normal eukaryotic genome and the central protein-coding sequences are triply protected by non-coding DNA against invasion by exogenous nucleic acids. This review provides evidence supporting the protective role of non-coding DNA in genome defense.

  18. New Insights into the Lake Chad Basin Population Structure Revealed by High-Throughput Genotyping of Mitochondrial DNA Coding SNPs

    PubMed Central

    Černý, Viktor; Carracedo, Ángel

    2011-01-01

    Background Located in the Sudan belt, the Chad Basin forms a remarkable ecosystem, where several unique agricultural and pastoral techniques have been developed. Both from an archaeological and a genetic point of view, this region has been interpreted to be the center of a bidirectional corridor connecting West and East Africa, as well as a meeting point for populations coming from North Africa through the Saharan desert. Methodology/Principal Findings Samples from twelve ethnic groups from the Chad Basin (n = 542) have been high-throughput genotyped for 230 coding region mitochondrial DNA (mtDNA) Single Nucleotide Polymorphisms (mtSNPs) using Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight (MALDI-TOF) mass spectrometry. This set of mtSNPs allowed for much better phylogenetic resolution than previous studies of this geographic region, enabling new insights into its population history. Notable haplogroup (hg) heterogeneity has been observed in the Chad Basin, mirroring the different demographic histories of these ethnic groups. As estimated using a Bayesian framework, nomadic populations showed negative growth, which was not always correlated with their estimated effective population sizes. Nomads also showed lower diversity values than sedentary groups. Conclusions/Significance Compared with sedentary populations, nomads showed signals of stronger genetic drift occurring in their ancestral populations. These populations, however, retained more haplotype diversity in their hypervariable segment I (HVS-I), but not in their mtSNPs, suggesting a more ancestral ethnogenesis. Whereas the nomadic populations showed a higher Mediterranean influence, signaled mainly by sub-lineages of M1, R0, U6, and U5, the other populations showed a more consistent sub-Saharan pattern. Although lifestyle may influence diversity patterns and hg composition, analysis of molecular variance has not identified these differences. The present study indicates that analysis of mt

  19. Detecting selection in the blue crab, Callinectes sapidus, using DNA sequence data from multiple nuclear protein-coding genes.

    PubMed

    Yednock, Bree K; Neigel, Joseph E

    2014-01-01

    The identification of genes involved in the adaptive evolution of non-model organisms with uncharacterized genomes constitutes a major challenge. This study employed a rigorous and targeted candidate gene approach to test for positive selection on protein-coding genes of the blue crab, Callinectes sapidus. Four genes with putative roles in physiological adaptation to environmental stress were chosen as candidates. A fifth gene not expected to play a role in environmental adaptation was used as a control. Large samples (n>800) of DNA sequences from C. sapidus were used in tests of selective neutrality based on sequence polymorphisms. In combination with these, sequences from the congener C. similis were used in neutrality tests based on interspecific divergence. In multiple tests, significant departures from neutral expectations, indicative of positive selection, were found for the candidate gene trehalose 6-phosphate synthase (tps). These departures could not be explained by any of the historical population expansion or bottleneck scenarios evaluated in coalescent simulations. Evidence was also found for balancing selection at ATP-synthase subunit 9 (atps) using a maximum likelihood version of the Hudson, Kreitman, and Aguadé test, and positive selection favoring amino acid replacements within ATP/ADP translocase (ant) was detected using the McDonald-Kreitman test. In contrast, test statistics for the control gene, ribosomal protein L12 (rpl), which has presumably experienced the same demographic effects as the candidate loci, were not significantly different from neutral expectations and could readily be explained by demographic effects. Together, these findings demonstrate the utility of the candidate gene approach for investigating adaptation at the molecular level in a marine invertebrate for which extensive genomic resources are not available.
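    The McDonald-Kreitman test mentioned above compares nonsynonymous and synonymous changes fixed between species (Dn, Ds) with those still polymorphic within a species (Pn, Ps). The following is a minimal Python sketch, assuming the four counts have already been tallied; the function names and example counts are hypothetical and are not data from this study. It reports the neutrality index NI = (Pn/Ps)/(Dn/Ds) and a two-sided Fisher's exact p-value computed directly from the hypergeometric distribution. An NI below 1 is commonly read as an excess of nonsynonymous fixations, consistent with positive selection.

```python
from math import comb


def _hypergeom_pmf(a: int, row1: int, col1: int, n: int) -> float:
    """Probability that the top-left cell equals a, given fixed 2x2 margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(table) -> float:
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = _hypergeom_pmf(a, row1, col1, n)
    p_value = 0.0
    # Sum over all tables with the same margins that are no more likely than the observed one.
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = _hypergeom_pmf(x, row1, col1, n)
        if p <= p_observed * (1 + 1e-7):  # small tolerance for floating-point ties
            p_value += p
    return min(p_value, 1.0)


def mcdonald_kreitman(dn: int, ds: int, pn: int, ps: int):
    """Return (neutrality index, p-value).

    dn, ds: nonsynonymous / synonymous fixed differences between species.
    pn, ps: nonsynonymous / synonymous polymorphisms within a species.
    """
    if dn == 0 or ps == 0:
        raise ValueError("NI is undefined when dn or ps is zero")
    neutrality_index = (pn * ds) / (ps * dn)  # algebraically (pn/ps) / (dn/ds)
    p_value = fisher_exact_two_sided([[dn, ds], [pn, ps]])
    return neutrality_index, p_value
```

    For example, the hypothetical counts dn=10, ds=5, pn=2, ps=20 give NI = 0.05 with a small p-value, the pattern the test interprets as adaptive fixation of amino acid replacements.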

  20. Nucleotide and derived amino acid sequences of a cDNA coding for pre-uteroglobin from the lung of the hare (Lepus capensis).

    PubMed Central

    López de Haro, M S; Nieto, A

    1986-01-01

    An almost full-length cDNA coding for pre-uteroglobin from hare lung was cloned and sequenced. The derived amino acid sequence indicated that hare pre-uteroglobin contained 91 amino acids, including a signal peptide of 21 residues. Comparison of the nucleotide sequence of hare pre-uteroglobin cDNA with that previously reported for the rabbit gene indicated five silent point substitutions and six others leading to amino acid changes in the coding region. The untranslated regions of both pre-uteroglobin mRNAs were very similar. The amino acid changes observed are discussed in relation to the different progesterone-binding abilities of both homologous proteins. PMID:3019311
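    Classifying the differences between two aligned coding regions as silent or amino-acid-changing, as in the hare/rabbit comparison above, amounts to translating each differing codon with the standard genetic code. A minimal Python sketch follows; it assumes both sequences are aligned, in frame and of equal length, and it counts differing codons rather than individual nucleotides, so a codon carrying two substitutions is counted once. The function name and example sequences are hypothetical, not the uteroglobin sequences.

```python
from __future__ import annotations

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_codon_differences(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (silent, replacement) counts of differing codons between two aligned CDSs."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    silent = replacement = 0
    for start in range(0, len(seq_a), 3):
        codon_a = seq_a[start:start + 3].upper()
        codon_b = seq_b[start:start + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1  # synonymous: same amino acid
        else:
            replacement += 1  # nonsynonymous: the amino acid changes
    return silent, replacement
```

    For instance, comparing "ATGCTTGGA" with "ATGCTCGAA" yields (1, 1): CTT to CTC keeps leucine, while GGA to GAA changes glycine to glutamate.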

  1. Screening for Functional Non-coding Genetic Variants Using Electrophoretic Mobility Shift Assay (EMSA) and DNA-affinity Precipitation Assay (DAPA).

    PubMed

    Miller, Daniel E; Patel, Zubin H; Lu, Xiaoming; Lynch, Arthur T; Weirauch, Matthew T; Kottyan, Leah C

    2016-01-01

    Population and family-based genetic studies typically result in the identification of genetic variants that are statistically associated with a clinical disease or phenotype. For many diseases and traits, most variants are non-coding and are thus likely to act through subtle, comparatively hard-to-predict mechanisms that control gene expression. Here, we describe a general strategic approach to prioritize non-coding variants and screen them for function. This approach involves computational prioritization using functional genomic databases followed by experimental analysis of differential binding of transcription factors (TFs) to risk and non-risk alleles. For both electrophoretic mobility shift assay (EMSA) and DNA affinity precipitation assay (DAPA) analysis of genetic variants, a synthetic DNA oligonucleotide (oligo) is used to identify factors in the nuclear lysate of disease- or phenotype-relevant cells. For EMSA, the oligonucleotides with or without bound nuclear factors (often TFs) are analyzed by non-denaturing electrophoresis on a tris-borate-EDTA (TBE) polyacrylamide gel. For DAPA, the oligonucleotides are bound to a magnetic column, and the nuclear factors that specifically bind the DNA sequence are eluted and analyzed by mass spectrometry or by reducing sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) followed by Western blot analysis. This general approach can be widely used to study the function of non-coding genetic variants associated with any disease, trait, or phenotype. PMID:27585267

  2. Reduced-Median-Network Analysis of Complete Mitochondrial DNA Coding-Region Sequences for the Major African, Asian, and European Haplogroups

    PubMed Central

    Herrnstadt, Corinna; Elson, Joanna L.; Fahy, Eoin; Preston, Gwen; Turnbull, Douglass M.; Anderson, Christen; Ghosh, Soumitra S.; Olefsky, Jerrold M.; Beal, M. Flint; Davis, Robert E.; Howell, Neil

    2002-01-01

    The evolution of the human mitochondrial genome is characterized by the emergence of ethnically distinct lineages or haplogroups. Nine European, seven Asian (including Native American), and three African mitochondrial DNA (mtDNA) haplogroups have been identified previously on the basis of the presence or absence of a relatively small number of restriction-enzyme recognition sites or on the basis of nucleotide sequences of the D-loop region. We have used reduced-median-network approaches to analyze 560 complete European, Asian, and African mtDNA coding-region sequences from unrelated individuals to develop a more complete understanding of sequence diversity both within and between haplogroups. A total of 497 haplogroup-associated polymorphisms were identified, 323 (65%) of which were associated with one haplogroup and 174 (35%) of which were associated with two or more haplogroups. Approximately one-half of these polymorphisms are reported here for the first time. Our results confirm and substantially extend the previously described phylogenetic relationships among mitochondrial genomes from the major human ethnic groups. Another important result is that there were numerous instances both of parallel mutations at the same site and of reversion (i.e., homoplasy). Homoplasy in the coding region is likely to confound evolutionary analysis of small sequence sets. Using a linkage-disequilibrium approach, we also present additional evidence for the absence of human mtDNA recombination. PMID:11938495

  3. Massively parallel sequencing of the entire control region and targeted coding region SNPs of degraded mtDNA using a simplified library preparation method.

    PubMed

    Lee, Eun Young; Lee, Hwan Young; Oh, Se Yoon; Jung, Sang-Eun; Yang, In Seok; Lee, Yang-Han; Yang, Woo Ick; Shin, Kyoung-Jin

    2016-05-01

    The application of next-generation sequencing (NGS) to forensic genetics is being explored by an increasing number of laboratories because high-throughput sequencing can recover genetic information from multiple markers and multiple individuals in a single run. However, NGS requires a cumbersome and technically challenging library construction process. In this study, we propose a simplified library preparation method for mitochondrial DNA (mtDNA) analysis that involves two rounds of PCR amplification. In the first round of multiplex PCR, six fragments covering the entire mtDNA control region and 22 fragments covering interspersed single nucleotide polymorphisms (SNPs) in the coding region that can be used to determine global haplogroups and East Asian haplogroups were amplified using template-specific primers with read sequences. In the second step, indices and platform-specific sequences for the MiSeq(®) system (Illumina) were added by PCR. The barcoded library produced using this simplified workflow was successfully sequenced on the MiSeq system using the MiSeq Reagent Nano Kit v2. A total of 0.4 GB of sequences, 80.6% with base quality of >Q30, were obtained from 12 degraded DNA samples and mapped to the revised Cambridge Reference Sequence (rCRS). A relatively even read count was obtained for all amplicons, with an average coverage of 5200× and a less than three-fold read count difference between amplicons per sample. Control region sequences were successfully determined, and all samples were assigned to the relevant haplogroups. In addition, an in silico analysis showed that adding coding region SNPs to the control region enhanced discrimination. Because the developed multiplex PCR system amplifies small amplicons (<250 bp), NGS analysis using the library preparation method described here allows mtDNA analysis of highly degraded DNA samples. PMID:26844917

  4. High Performance Medical Classifiers

    NASA Astrophysics Data System (ADS)

    Fountoukis, S. G.; Bekakos, M. P.

    2009-08-01

    In this paper, parallelism methodologies for mapping rules derived by machine learning algorithms onto both software and hardware are investigated. When these algorithms are fed patient disease data, they output medical diagnostic decision trees and their corresponding rules. These rules can be mapped onto multithreaded object-oriented programs and hardware chips. The programs can simulate the operation of the chips and can exhibit the inherent parallelism of the chip design. The circuit of a chip can consist of many blocks operating concurrently on different parts of the whole circuit. Threads and inter-thread communication can be used to simulate the blocks of the chips and the combination of block output signals. The chips and the corresponding parallel programs constitute medical classifiers that can classify new patient instances. Measurements taken from patients can be fed into both the chips and the parallel programs and classified according to the rules incorporated in their design. The chips and the programs constitute medical decision support systems and can be incorporated into portable micro devices, assisting physicians in their everyday diagnostic practice.
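    The idea of evaluating decision-tree rules as concurrently operating blocks whose output signals are then combined can be illustrated with a short Python sketch. The two rules, the attribute names and the thresholds below are hypothetical placeholders, not rules derived in the paper; the combination step assumes, as for rules read off a single decision tree, that at most one rule fires for a given patient. Python threads only mimic the block structure here; they do not provide the true parallelism of a hardware chip.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

Patient = Dict[str, float]
Rule = Callable[[Patient], Optional[str]]

# Hypothetical rules: each returns a class label when its conditions hold, else None.
RULES: List[Rule] = [
    lambda p: "high_risk" if p["systolic_bp"] >= 140 and p["age"] >= 60 else None,
    lambda p: "low_risk" if p["systolic_bp"] < 120 else None,
]


def classify(patient: Patient, rules: List[Rule] = RULES, default: str = "undetermined") -> str:
    """Evaluate each rule in its own worker thread, then combine the block outputs."""
    with ThreadPoolExecutor(max_workers=len(rules)) as pool:
        outputs = list(pool.map(lambda rule: rule(patient), rules))
    # Combine the "block output signals": report the label of the rule that fired.
    fired = [label for label in outputs if label is not None]
    return fired[0] if fired else default
```

    A patient record such as {"systolic_bp": 150, "age": 65} is classified as "high_risk", while one that satisfies neither rule falls back to the default label.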

  5. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-02-20

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  6. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-01-01

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  7. A sandwich-hybridization assay for simultaneous determination of HIV and tuberculosis DNA targets based on signal amplification by quantum dots-PowerVision™ polymer coding nanotracers.

    PubMed

    Yan, Zhongdan; Gan, Ning; Zhang, Huairong; Wang, De; Qiao, Li; Cao, Yuting; Li, Tianhua; Hu, Futao

    2015-09-15

    A novel sandwich-hybridization assay for simultaneous electrochemical detection of multiple DNA targets related to human immunodeficiency virus (HIV) and tuberculosis (TB) was developed based on different quantum dots-PowerVision(TM) polymer nanotracers. The polymer nanotracers were fabricated by immobilizing SH-labeled oligonucleotides (s-HIV or s-TB), which partially hybridize with the target DNA (HIV or TB, respectively), on gold nanoparticles (Au NPs), which were then modified with PowerVision(TM) (PV) polymer-encapsulated quantum dots (CdS or PbS) as signal tags. PV is a dendrimer enzyme-linked polymer that can immobilize abundant QDs to amplify the stripping voltammetry signals from the metal ions (Pb or Cd). The capture probes were prepared by immobilizing SH-labeled oligonucleotides complementary to HIV and TB DNA on magnetic Fe3O4@Au (GMPs) beads. After sandwich hybridization, the polymer nanotracers together with the HIV and TB DNA targets were simultaneously introduced onto the surface of the GMPs. The two encoding metal ions (Cd²⁺ and Pb²⁺) were then used to differentiate the two DNA targets by their distinct anodic stripping voltammetric peaks at -0.84 V (Cd) and -0.61 V (Pb). Because of the excellent signal amplification of the polymer nanotracers and the high specificity for the DNA targets, this assay could detect target DNA as low as 0.2 femtomolar and exhibited excellent selectivity, with a dynamic range from 0.5 fM to 500 pM. These results demonstrate that this electrochemical coding assay has great potential for screening additional viral DNA targets by changing the probes.

  8. DNA.

    ERIC Educational Resources Information Center

    Felsenfeld, Gary

    1985-01-01

    Structural form, bonding scheme, and chromatin structure of and gene-modification experiments with deoxyribonucleic acid (DNA) are described. Indicates that DNA's double helix is variable and also flexible as it interacts with regulatory and other molecules to transfer hereditary messages. (DH)

  9. Homological stabilizer codes

    SciTech Connect

    Anderson, Jonas T.

    2013-03-15

    In this paper we define homological stabilizer codes on qubits which encompass codes such as Kitaev's toric code and the topological color codes. These codes are defined solely by the graphs they reside on. This feature allows us to use properties of topological graph theory to determine the graphs which are suitable as homological stabilizer codes. We then show that all toric codes are equivalent to homological stabilizer codes on 4-valent graphs. We show that the topological color codes and toric codes correspond to two distinct classes of graphs. We define the notion of label set equivalencies and show that under a small set of constraints the only homological stabilizer codes without local logical operators are equivalent to Kitaev's toric code or to the topological color codes.
    Highlights:
    • We show that Kitaev's toric codes are equivalent to homological stabilizer codes on 4-valent graphs.
    • We show that toric codes and color codes correspond to homological stabilizer codes on distinct graphs.
    • We find and classify all 2D homological stabilizer codes.
    • We find optimal codes among the homological stabilizer codes.

  10. Stack filter classifiers

    SciTech Connect

    Porter, Reid B; Hush, Don

    2009-01-01

    Just as linear models generalize the sample mean and weighted average, weighted order statistic models generalize the sample median and weighted median. This analogy can be continued informally to generalized additive models in the case of the mean, and to Stack Filters in the case of the median. Both of these model classes have been extensively studied for signal and image processing, but it is surprising to find that, for pattern classification, their treatment has been markedly one-sided. Generalized additive models are now a major tool in pattern classification, and many different learning algorithms have been developed to fit model parameters to finite data. However, Stack Filters remain largely confined to signal and image processing, and learning algorithms for classification are yet to be seen. This paper is a step towards Stack Filter Classifiers, and it shows that the approach is interesting from both a theoretical and a practical perspective.
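    The analogy in the first sentence can be made concrete: the weighted average minimizes the weighted sum of squared deviations, while a weighted median minimizes the weighted sum of absolute deviations and is therefore far less sensitive to outliers. The Python sketch below, with hypothetical function names and data, computes both; it returns the lower weighted median (the smallest value whose cumulative weight reaches half of the total). It illustrates only the underlying statistics, not a Stack Filter or the learning approach proposed in the paper.

```python
def weighted_average(values, weights):
    """Weighted mean: minimizes sum(w * (x - m) ** 2)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_median(values, weights):
    """Lower weighted median: minimizes sum(w * abs(x - m)) for non-negative weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= total / 2:
            return value


samples = [1, 2, 3, 4, 100]  # hypothetical data with one outlier
equal = [1, 1, 1, 1, 1]
print(weighted_average(samples, equal), weighted_median(samples, equal))  # 22.0 3
```

    With equal weights the weighted median reduces to the ordinary sample median (3), whereas the single outlier pulls the mean up to 22.0. Moving most of the weight onto one sample, for example weights [1, 1, 1, 1, 10], makes that sample the weighted median.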

  11. The Use and Effectiveness of Triple Multiplex System for Coding Region Single Nucleotide Polymorphism in Mitochondrial DNA Typing of Archaeologically Obtained Human Skeletons from Premodern Joseon Tombs of Korea.

    PubMed

    Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon

    2015-01-01

    A previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the combined analysis of coding region SNP markers and control region mutation motifs. In this study, we tested whether the same triple multiplex analysis of coding region SNPs is also applicable to ancient samples from East Asia, as a complement to sequence analysis of the mtDNA control region. In the Joseon skeleton samples, the mtDNA haplogroup determined by coding region SNP markers fell within the same haplogroup assigned by sequence analysis of the control region. Considering that control region mtDNA sequencing of ancient samples in previous studies produced a considerable number of errors, coding region SNP analysis can serve as a useful complement to conventional haplogroup determination, especially for archaeological human bone samples buried underground over long periods. PMID:26345190

  12. The Use and Effectiveness of Triple Multiplex System for Coding Region Single Nucleotide Polymorphism in Mitochondrial DNA Typing of Archaeologically Obtained Human Skeletons from Premodern Joseon Tombs of Korea

    PubMed Central

    Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon

    2015-01-01

    A previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the combined analysis of coding region SNP markers and control region mutation motifs. In this study, we tested whether the same triple multiplex analysis of coding region SNPs is also applicable to ancient samples from East Asia, as a complement to sequence analysis of the mtDNA control region. In the Joseon skeleton samples, the mtDNA haplogroup determined by coding region SNP markers fell within the same haplogroup assigned by sequence analysis of the control region. Considering that control region mtDNA sequencing of ancient samples in previous studies produced a considerable number of errors, coding region SNP analysis can serve as a useful complement to conventional haplogroup determination, especially for archaeological human bone samples buried underground over long periods. PMID:26345190

  13. Undetectable levels of N6-methyl adenine in mouse DNA: Cloning and analysis of PRED28, a gene coding for a putative mammalian DNA adenine methyltransferase.

    PubMed

    Ratel, David; Ravanat, Jean-Luc; Charles, Marie-Pierre; Platet, Nadine; Breuillaud, Lionel; Lunardi, Joël; Berger, François; Wion, Didier

    2006-05-29

    Three methylated bases, 5-methylcytosine, N4-methylcytosine and N6-methyladenine (m6A), can be found in DNA. However, to date, only 5-methylcytosine has been detected in mammalian genomes. To reinvestigate the presence of m6A in mammalian DNA, we used a highly sensitive method capable of detecting one N6-methyldeoxyadenosine per million nucleosides. Our results suggest that the total mouse genome contains fewer than 10^3 m6A residues, if any. Experiments were next performed on PRED28, a putative mammalian N6-DNA methyltransferase. The murine PRED28 gene encodes two alternatively spliced RNAs. However, although recombinant PRED28 proteins are found in the nucleus, no evidence of adenine-methyltransferase activity was detected. PMID:16684535

  14. Highly sensitive and selective microRNA detection based on DNA-bio-bar-code and enzyme-assisted strand cycle exponential signal amplification.

    PubMed

    Dong, Haifeng; Meng, Xiangdan; Dai, Wenhao; Cao, Yu; Lu, Huiting; Zhou, Shufeng; Zhang, Xueji

    2015-04-21

    Herein, a highly sensitive and selective microRNA (miRNA) detection strategy was designed using DNA-bio-bar-code amplification (BCA) and Nb·BbvCI nicking enzyme-assisted strand cycling for exponential signal amplification. The DNA-BCA system contains a locked nucleic acid (LNA)-modified DNA probe to improve hybridization efficiency, while a signal-reporting molecular beacon (MB) with an endonuclease recognition site was designed for strand cycle amplification. In the presence of target miRNA, the oligonucleotide-functionalized magnetic nanoprobe (MNP-DNA) and the gold nanoprobe (AuNP-DNA), which carries numerous reporter probes (RP), both hybridize with the target miRNA to form a sandwich structure. After the sandwich structures were separated from the solution by a magnetic field, the RP were released at high temperature to recognize the MB; cleavage of the hairpin DNA then induced dissociation of the RP. The dissociated RP triggered the next strand cycle, producing exponential fluorescent signal amplification for miRNA detection. Under optimized conditions, the exponential signal amplification system shows a good linear range of 6 orders of magnitude (from 0.3 pM to 3 aM) with a limit of detection (LOD) down to 52.5 zM, while the sandwich structure gives the system high selectivity. The feasibility of the proposed strategy for cellular miRNA detection was confirmed by analyzing miRNA-21 in HeLa lysates. Given its high performance in miRNA analysis, the strategy has promising applications in biological detection and clinical diagnosis.

  15. DNA

    ERIC Educational Resources Information Center

    Stent, Gunther S.

    1970-01-01

    This history of molecular genetics and its explanation of DNA begins with an analysis of the Golden Jubilee essay papers, 1955. The paper ends by stating that the higher nervous system is the one major frontier of biological inquiry that still offers some romance of research. (Author/VW)

  16. Molecular cloning of a cDNA coding for mouse liver xanthine dehydrogenase. Regulation of its transcript by interferons in vivo.

    PubMed Central

    Terao, M; Cazzaniga, G; Ghezzi, P; Bianchi, M; Falciani, F; Perani, P; Garattini, E

    1992-01-01

    The cDNA coding for xanthine dehydrogenase (XD) is isolated from mouse liver mRNA by cross-hybridization with a DNA fragment of the Drosophila melanogaster homologue. Two lambda bacteriophage overlapping clones represent the copy of a 4538-nucleotide-residue-long transcript with an open reading frame of 4005 nucleotide residues, coding for a putative polypeptide of 1335 amino acid residues. Comparison of the deduced amino acid sequence of the mouse XD with those of the Drosophila and the rat homologues shows a high conservation of this protein (55% identity between mouse and Drosophila, and 94% identity between mouse and rat). RNA blotting analysis demonstrates that interferon-alpha (IFN-alpha) and its inducers, i.e. poly(I).poly(C), bacterial lipopolysaccharide (LPS) and tilorone (2,7-bis-[2-(diethylamino)ethoxy]fluoren-9-one), increase the expression of XD mRNA in liver. Poly(I).poly(C) also induces XD mRNA in several other tissues in vivo. Protein synthesis de novo is not required for the elevation of XD mRNA after IFN-alpha treatment, since cycloheximide does not block the induction. The elevation of XD mRNA concentration is relatively fast and precedes the induction of both XD and xanthine oxidase (XO) enzymic activities. PMID:1590774

  17. Characterization of Non-coding DNA Satellites Associated with Sweepoviruses (Genus Begomovirus, Geminiviridae) - Definition of a Distinct Class of Begomovirus-Associated Satellites.

    PubMed

    Lozano, Gloria; Trenado, Helena P; Fiallo-Olivé, Elvira; Chirinos, Dorys; Geraud-Pouey, Francis; Briddon, Rob W; Navas-Castillo, Jesús

    2016-01-01

    Begomoviruses (family Geminiviridae) are whitefly-transmitted, plant-infecting single-stranded DNA viruses that cause crop losses throughout the warmer parts of the world. Sweepoviruses are a phylogenetically distinct group of begomoviruses that infect plants of the family Convolvulaceae, including sweet potato (Ipomoea batatas). Two classes of subviral molecules are often associated with begomoviruses, particularly in the Old World: the betasatellites and the alphasatellites. An analysis of sweet potato and Ipomoea indica samples from Spain and Merremia dissecta samples from Venezuela identified small non-coding subviral molecules in association with several distinct sweepoviruses. The sequences of 18 clones were obtained and found to be structurally similar to tomato leaf curl virus-satellite (ToLCV-sat, the first DNA satellite identified in association with a begomovirus), having a region with significant sequence identity to the conserved region of betasatellites, an A-rich sequence, a predicted stem-loop structure containing the nonanucleotide TAATATTAC, and a second predicted stem-loop. These sweepovirus-associated satellites join an increasing number of ToLCV-sat-like non-coding satellites identified recently. Although these satellites share some features with betasatellites, the evidence provided suggests that the ToLCV-sat-like satellites are distinct from betasatellites and should be considered a separate class of satellites, for which the collective name deltasatellites is proposed. PMID:26925037

  18. Characterization of Non-coding DNA Satellites Associated with Sweepoviruses (Genus Begomovirus, Geminiviridae) – Definition of a Distinct Class of Begomovirus-Associated Satellites

    PubMed Central

    Lozano, Gloria; Trenado, Helena P.; Fiallo-Olivé, Elvira; Chirinos, Dorys; Geraud-Pouey, Francis; Briddon, Rob W.; Navas-Castillo, Jesús

    2016-01-01

    Begomoviruses (family Geminiviridae) are whitefly-transmitted, plant-infecting single-stranded DNA viruses that cause crop losses throughout the warmer parts of the world. Sweepoviruses are a phylogenetically distinct group of begomoviruses that infect plants of the family Convolvulaceae, including sweet potato (Ipomoea batatas). Two classes of subviral molecules are often associated with begomoviruses, particularly in the Old World: the betasatellites and the alphasatellites. An analysis of sweet potato and Ipomoea indica samples from Spain and Merremia dissecta samples from Venezuela identified small non-coding subviral molecules in association with several distinct sweepoviruses. The sequences of 18 clones were obtained and found to be structurally similar to tomato leaf curl virus-satellite (ToLCV-sat, the first DNA satellite identified in association with a begomovirus), having a region with significant sequence identity to the conserved region of betasatellites, an A-rich sequence, a predicted stem–loop structure containing the nonanucleotide TAATATTAC, and a second predicted stem–loop. These sweepovirus-associated satellites join an increasing number of ToLCV-sat-like non-coding satellites identified recently. Although these satellites share some features with betasatellites, the evidence provided suggests that the ToLCV-sat-like satellites are distinct from betasatellites and should be considered a separate class of satellites, for which the collective name deltasatellites is proposed. PMID:26925037

  19. Nucleotide sequence of cDNA coding for dianthin 30, a ribosome inactivating protein from Dianthus caryophyllus.

    PubMed

    Legname, G; Bellosta, P; Gromo, G; Modena, D; Keen, J N; Roberts, L M; Lord, J M

    1991-08-27

    Rabbit antibodies raised against dianthin 30, a ribosome inactivating protein from carnation (Dianthus caryophyllus) leaves, were used to identify a full-length dianthin precursor cDNA clone from a lambda gt11 expression library. N-terminal amino acid sequencing of purified dianthin 30 and dianthin 32 confirmed that the clone encoded dianthin 30. The cDNA was 1153 base pairs in length and encoded a precursor protein of 293 amino acid residues. The first 23 N-terminal amino acids of the precursor represented the signal sequence. The protein contained a carboxy-terminal region which, by analogy with barley lectin, may contain a vacuolar targeting signal.

  20. The phage T4-coded DNA replication helicase (gp41) forms a hexamer upon activation by nucleoside triphosphate.

    PubMed

    Dong, F; Gogol, E P; von Hippel, P H

    1995-03-31

    Sedimentation and high performance liquid chromatography studies show that the functional DNA replication helicase of bacteriophage T4 (gp41) exists primarily as a dimer at physiological protein concentrations, assembling from gp41 monomers with an association constant of approximately 10^6 M^-1. Cryoelectron microscopy, analytical ultracentrifugation, and protein-protein cross-linking studies demonstrate that the binding of ATP or GTP drives the assembly of these dimers into monodisperse hexameric complexes, which redissociate following depletion of the purine nucleotide triphosphate (PuTP) substrates by the DNA-stimulated PuTPase activity of the helicase. The hexameric state of gp41 can be stabilized for detailed study by the addition of the nonhydrolyzable PuTP analogs ATPγS and GTPγS and is not significantly affected by the presence of ADP, GDP, or single-stranded or forked DNA template constructs, although some structural details of the hexameric complex may be altered by DNA binding. Our results also indicate that the active gp41 helicase exists as a hexagonal trimer of asymmetric dimers, and that the hexamer is probably characterized by D3 symmetry. The assembly pathway of the gp41 helicase has been analyzed, and its structure and properties compared with those of other helicases involved in a variety of cellular processes. Functional implications of such structural organization are also considered. PMID:7706292
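    The statement that gp41 exists mainly as a dimer follows from the monomer-dimer equilibrium 2M ⇌ D with association constant K_a = [D]/[M]^2 of about 10^6 M^-1. The Python sketch below solves the mass balance C = [M] + 2[D] for the free monomer concentration and reports the fraction of protomers bound in dimers. The function name and the total concentrations in the example are hypothetical inputs, not values from the paper, and the model deliberately ignores the hexamer assembly that follows ATP or GTP binding.

```python
from math import sqrt


def dimer_fraction(total_protomer_molar: float, ka_per_molar: float) -> float:
    """Fraction of protomers bound in dimers for 2M <-> D, with Ka = [D] / [M]**2.

    Mass balance: C = [M] + 2 * Ka * [M]**2, i.e. 2*Ka*[M]**2 + [M] - C = 0,
    whose positive root gives the free monomer concentration [M].
    """
    c, ka = total_protomer_molar, ka_per_molar
    if c <= 0 or ka <= 0:
        raise ValueError("concentration and Ka must be positive")
    free_monomer = (-1 + sqrt(1 + 8 * ka * c)) / (4 * ka)
    return 1 - free_monomer / c


for total in (1e-6, 1e-5):  # hypothetical total gp41 concentrations (M)
    print(f"{total:.0e} M -> {dimer_fraction(total, 1e6):.2f} in dimers")
```

    With K_a = 10^6 M^-1, half of the protomers are in dimers at 1 µM total protein and 80% at 10 µM, so the dimer dominates once the concentration is well above 1/K_a.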

  1. Ubiquitous and gene-specific regulatory 5' sequences in a sea urchin histone DNA clone coding for histone protein variants.

    PubMed Central

    Busslinger, M; Portmann, R; Irminger, J C; Birnstiel, M L

    1980-01-01

    The DNA sequences of the entire structural H4, H3, H2A and H2B genes and of their 5' flanking regions have been determined in the histone DNA clone h19 of the sea urchin Psammechinus miliaris. In clone h19 the polarity of transcription and the relative arrangement of the histone genes are identical to those in clone h22 of the same species. The histone proteins encoded by h19 DNA differ in their primary structure from those encoded by clone h22 and have been compared to histone protein sequences of other sea urchin species as well as other eukaryotes. A comparative analysis of the 5' flanking DNA sequences of the structural histone genes in both clones revealed four ubiquitous sequence motifs: a pentameric element GATCC, followed at a short distance by the Hogness box GTATAAATAG; a conserved sequence PyCATTCPu, in or near which the 5' ends of the mRNAs map in h22 DNA; and lastly a sequence A, containing the initiation codon. These sequences are also found, sometimes in modified versions, in front of other eukaryotic genes transcribed by polymerase II. When prelude sequences of isocoding histone genes in clones h19 and h22 are compared, areas of homology are seen to extend beyond the ubiquitous sequence motifs towards the divergent AT-rich spacer and terminate between approximately 140 and 240 nucleotides away from the structural gene. These prelude regions contain quite large conserved sequence blocks which are specific for each type of histone gene. PMID:7443547

  2. Emergent behaviors of classifier systems

    SciTech Connect

    Forrest, S.; Miller, J.H.

    1989-01-01

    This paper discusses some examples of emergent behavior in classifier systems, describes some recently developed methods for studying them based on dynamical systems theory, and presents some initial results produced by the methodology. The goal of this work is to find techniques for detecting when interesting behaviors emerge in classifier systems, to study how such behaviors might develop over time, and to make suggestions for designing classifier systems that exhibit preferred behaviors. 20 refs., 1 fig.

  3. Arabidopsis RNASE THREE LIKE2 Modulates the Expression of Protein-Coding Genes via 24-Nucleotide Small Interfering RNA-Directed DNA Methylation [OPEN]

    PubMed Central

    Hachet, Mélanie; Comella, Pascale; Zytnicki, Matthias; Vaucheret, Hervé

    2016-01-01

    RNaseIII enzymes catalyze the cleavage of double-stranded RNA (dsRNA) and have diverse functions in RNA maturation. Arabidopsis thaliana RNASE THREE LIKE2 (RTL2), which carries one RNaseIII and two dsRNA binding (DRB) domains, is a unique Arabidopsis RNaseIII enzyme resembling the budding yeast small interfering RNA (siRNA)-producing Dcr1 enzyme. Here, we show that RTL2 modulates the production of a subset of small RNAs and that this activity depends on both its RNaseIII and DRB domains. However, the mode of action of RTL2 differs from that of Dcr1. Whereas Dcr1 directly cleaves dsRNAs into 23-nucleotide siRNAs, RTL2 likely cleaves dsRNAs into longer molecules, which are subsequently processed into small RNAs by the DICER-LIKE enzymes. Depending on the dsRNA considered, RTL2-mediated maturation either improves (RTL2-dependent loci) or reduces (RTL2-sensitive loci) the production of small RNAs. Because the vast majority of RTL2-regulated loci correspond to transposons and intergenic regions producing 24-nucleotide siRNAs that guide DNA methylation, RTL2 depletion modifies DNA methylation in these regions. Nevertheless, 13% of RTL2-regulated loci correspond to protein-coding genes. We show that changes in 24-nucleotide siRNA levels also affect DNA methylation levels at such loci and inversely correlate with mRNA steady state levels, thus implicating RTL2 in the regulation of protein-coding gene expression. PMID:26764378

  4. Arabidopsis RNASE THREE LIKE2 Modulates the Expression of Protein-Coding Genes via 24-Nucleotide Small Interfering RNA-Directed DNA Methylation.

    PubMed

    Elvira-Matelot, Emilie; Hachet, Mélanie; Shamandi, Nahid; Comella, Pascale; Sáez-Vásquez, Julio; Zytnicki, Matthias; Vaucheret, Hervé

    2016-02-01

  5. Chloroplast genome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium: insights into cpDNA evolution and phylogeny of extant seed plants.

    PubMed

    Wu, Chung-Shien; Wang, Ya-Nan; Liu, Shu-Mei; Chaw, Shu-Miaw

    2007-06-01

    Phylogenetic relationships among the 5 groups of extant seed plants are presently unsettled. To reexamine this long-standing debate, we determine the complete chloroplast genome (cpDNA) of Cycas taitungensis and 56 protein-coding genes encoded in the cpDNA of Gnetum parvifolium. The cpDNA of Cycas is a circular molecule of 163,403 bp with 2 typical large inverted repeats (IRs) of 25,074 bp each. We inferred phylogenetic relationships among major seed plant lineages using concatenated 56 protein-coding genes in 37 land plants. Phylogenies, generated by the use of 3 independent methods, provide concordant and robust support for the monophylies of extant seed plants, gymnosperms, and angiosperms. Within the modern gymnosperms are 2 highly supported sister clades: Cycas-Ginkgo and Gnetum-Pinus. This result agrees with both the "gnetifer" and "gnepines" hypotheses. The sister relationships in Cycas-Ginkgo and Gnetum-Pinus clades are further reinforced by cpDNA structural evidence. Branch lengths of Cycas-Ginkgo and Gnetum were consistently the shortest and the longest, respectively, in all separate analyses. However, the Gnetum relative rate test revealed this tendency only for the 3rd codon positions and the transversional sites of the first 2 codon positions. A PsitufA located between psbE and petL genes is here first detected in Anthoceros (a hornwort), cycads, and Ginkgo. We demonstrate that the PsitufA is a footprint descended from the chloroplast tufA of green algae. The duplication of ycf2 genes and their shift into IRs should have taken place at least in the common ancestor of seed plants more than 300 MYA, and the tRNAPro-GGG gene was lost from the angiosperm lineage at least 150 MYA. Additionally, from cpDNA structural comparison, we propose an alternative model for the loss of large IR regions in black pine. 
More cpDNA data from non-Pinaceae conifers are necessary to justify whether the gnetifer or gnepines hypothesis is valid and to generate solid structural

  6. Replication of a pathogenic non-coding RNA increases DNA methylation in plants associated with a bromodomain-containing viroid-binding protein

    PubMed Central

    Lv, Dian-Qiu; Liu, Shang-Wu; Zhao, Jian-Hua; Zhou, Bang-Jun; Wang, Shao-Peng; Guo, Hui-Shan; Fang, Yuan-Yuan

    2016-01-01

    Viroids are plant-pathogenic molecules made up of single-stranded circular non-coding RNAs. How replicating viroids interfere with host silencing remains largely unknown. In this study, we investigated the effects of a nuclear-replicating Potato spindle tuber viroid (PSTVd) on interference with plant RNA silencing. Using transient induction of silencing in GFP transgenic Nicotiana benthamiana plants (line 16c), we found that PSTVd replication accelerated GFP silencing and increased Virp1 mRNA, which encodes bromodomain-containing viroid-binding protein 1 and is required for PSTVd replication. DNA methylation was increased in the GFP transgene promoter of PSTVd-replicating plants, indicating involvement of transcriptional gene silencing. Consistently, accelerated GFP silencing and increased DNA methylation in the GFP transgene promoter were detected in plants transiently expressing Virp1. Virp1 mRNA was also increased upon PSTVd infection in natural host potato plants. Reduced transcript levels of certain endogenous genes were also consistent with increases in DNA methylation in related gene promoters in PSTVd-infected potato plants. Together, our data demonstrate that PSTVd replication interferes with the nuclear silencing pathway in the host plant, and that this is at least partially attributable to Virp1. This study provides new insights into viroid pathogenicity in the plant-viroid interaction, in which the viroid subverts the plant cell silencing machinery. PMID:27767195

  7. Cloning and sequence analysis of cDNA coding for a lectin from Helianthus tuberosus callus and its jasmonate-induced expression.

    PubMed

    Nakagawa, R; Yasokawa, D; Okumura, Y; Nagashima, K

    2000-06-01

    Two lectins (designated as HTA I and HTA II) that seemed to be isolectins were found in Helianthus tuberosus callus. cDNA encoding HTA I was isolated from a ZAP Express expression library by immunoselection with the anti-HTA antiserum. The sequence of this cDNA consisted of 432 bp coding for a polypeptide of 143 amino acid residues (Mr, 15,314). When introduced into E. coli, the cDNA directed the synthesis of active HTA I as indicated by the hemagglutination activity. The deduced amino acid sequence showed homology with some lectins and jasmonate-induced proteins. When callus was cultured in the presence of methyl jasmonate (MeJA), the hemagglutination activity increased in a dose-dependent manner. The levels of expression of the HTA protein and of the corresponding mRNA also increased in the treated callus. In view of these results, HTA I is considered to be a jasmonate-induced protein. PMID:10923797

  8. Genome-wide DNA methylome analysis reveals epigenetically dysregulated non-coding RNAs in human breast cancer

    PubMed Central

    Li, Yongsheng; Zhang, Yunpeng; Li, Shengli; Lu, Jianping; Chen, Juan; Wang, Yuan; Li, Yixue; Xu, Juan; Li, Xia

    2015-01-01

    Despite growing appreciation of the importance of epigenetics in breast cancer, our understanding of epigenetic alterations of non-coding RNAs (ncRNAs) in breast cancer remains limited. Here, we explored the epigenetic patterns of ncRNAs in breast cancers using published sequencing-based methylome data, primarily focusing on the two most commonly studied ncRNA biotypes, long ncRNAs and miRNAs. We observed widely aberrant methylation in the promoters of ncRNAs, and this abnormal methylation was more frequent than that in protein-coding genes. Specifically, intergenic ncRNAs were observed to comprise a majority (51.45% of the lncRNAs and 51.57% of the miRNAs) of the aberrantly methylated ncRNA promoters. Moreover, we summarized five patterns of aberrant ncRNA promoter methylation in the context of genomic CpG islands (CGIs), in which aberrant methylation occurred not only on CGIs, but also in regions flanking CGIs and in CGI-lacking promoters. Integration with transcriptional datasets enabled us to determine that the ncRNA promoter methylation events were associated with transcriptional changes. Furthermore, a panel of ncRNAs was identified as biomarkers that discriminated between disease phenotypes. Finally, the potential functions of aberrantly methylated ncRNAs were predicted, suggesting that ncRNAs and coding genes cooperatively mediate pathway dysregulation during the development and progression of breast cancer. PMID:25739977

  9. An atpE-specific promoter within the coding region of the atpB gene in tobacco chloroplast DNA.

    PubMed

    Kapoor, S; Wakasugi, T; Deno, H; Sugiura, M

    1994-09-01

    The atpB and atpE genes encode beta and epsilon subunits, respectively, of chloroplast ATP synthase and are co-transcribed in the plant species so far studied. In tobacco, an atpB gene-specific probe hybridizes to 2.7- and 2.3-kb transcripts. In addition to these, a probe from the atpE coding region hybridizes also to a 1.0-kb transcript. The 5' end of the atpE-specific transcript has been mapped 430/431 nt upstream of the atpE translation initiation site, within the coding region of the atpB gene. In-vitro capping revealed that this transcript results from a primary transcriptional event and is also characterized by -10 and -35 canonical sequences in the 5' region. It has been found to share a common 3' end with the bi-cistronic transcripts that has been mapped within the coding region of the divergently transcribed trnM gene, approximately 236 nt downstream from the atpE termination codon. Interestingly, this transcript accumulates only in leaves and not in proplastid-containing cultured (BY-2) cells, indicating that, unless it is preferentially degraded in BY-2 cells, its expression might be transcriptionally controlled.

  10. Two hybrid plasmids with D. melanogaster DNA sequences complementary to mRNA coding for the major heat shock protein.

    PubMed

    Schedl, P; Artavanis-Tsakonas, S; Steward, R; Gehring, W J; Mirault, M E; Goldschmidt-Clermont, M; Moran, L; Tissières, A

    1978-08-01

    The isolation and partial characterization of two cloned segments of Drosophila melanogaster DNA containing "heat shock" gene sequences is described. We have inserted sheared embryonic D. melanogaster DNA by the poly(dA-dT) connector method (Lobban and Kaiser, 1973) into the R1 restriction site of the ampicillin-resistant plasmid pSF2124 (So, Gill and Falkow, 1975). A collection of independent hybrid plasmids was screened by colony hybridization (Grunstein and Hogness, 1975) for sequences complementary to in vitro labeled polysomal poly(A)+ heat shock RNA. Two clones were identified which contain sequences complementary to a heat shock mRNA species that directs the in vitro synthesis of the 70,000 dalton heat-induced polypeptide. Both cloned segments hybridize in situ to the heat-induced puff sites located at 87A and 87C of the salivary gland polytene chromosomes. PMID:99246

  11. Restriction maps of the regions coding for methicillin and tobramycin resistances on chromosomal DNA in methicillin-resistant staphylococci.

    PubMed Central

    Ubukata, K; Nonoguchi, R; Matsuhashi, M; Song, M D; Konno, M

    1989-01-01

    Chromosomal BamHI DNA fragments containing both the mecA gene encoding the penicillin-binding protein responsible for methicillin resistance and the aadD gene encoding 4',4"-adenylyltransferase responsible for tobramycin resistance were cloned from three methicillin- and tobramycin-resistant strains of Staphylococcus aureus and one strain of Staphylococcus epidermidis. Physical maps of the fragments were similar, suggesting a single common origin. PMID:2817861

  12. Feature Selection and Effective Classifiers.

    ERIC Educational Resources Information Center

    Deogun, Jitender S.; Choubey, Suresh K.; Raghavan, Vijay V.; Sever, Hayri

    1998-01-01

    Develops and analyzes four algorithms for feature selection in the context of rough set methodology. Experimental results confirm the expected relationship between the time complexity of these algorithms and the classification accuracy of the resulting upper classifiers. When compared, upper classifiers perform better than lower…

  13. Phylogenetic analysis of Pythium insidiosum Thai strains using cytochrome oxidase II (COX II) DNA coding sequences and internal transcribed spacer regions (ITS).

    PubMed

    Kammarnjesadakul, Patcharee; Palaga, Tanapat; Sritunyalucksana, Kallaya; Mendoza, Leonel; Krajaejun, Theerapong; Vanittanakom, Nongnuch; Tongchusak, Songsak; Denduangboripant, Jessada; Chindamporn, Ariya

    2011-04-01

    To investigate the phylogenetic relationship among Pythium insidiosum isolates in Thailand, we analyzed the genomic DNA of 31 P. insidiosum strains isolated from humans and environmental sources from Thailand, and two from North and Central America. We used PCR to amplify the partial COX II DNA coding sequences and the ITS regions of these isolates. The nucleotide sequences of both amplicons were analyzed by the Bioedit program. Phylogenetic analysis using the genetic distance method with the Neighbor Joining (NJ) approach was performed using the MEGA4 software. Additional sequences of three other Pythium species, Phytophthora sojae and Lagenidium giganteum were employed as outgroups. The sizes of the COX II amplicons varied from 558-564 bp, whereas the ITS products varied from approximately 871-898 bp. Corrected sequence divergences with the Kimura 2-parameter model calculated for the COX II and the ITS DNA sequences ranged between 0.0000-0.0608 and 0.0000-0.2832, respectively. Phylogenetic analysis using both the COX II and the ITS DNA sequences showed similar trees, in which we found three sister groups (A(TH), B(TH), and C(TH)) among P. insidiosum strains. All Thai isolates from clinical cases and environmental sources were placed in two separate sister groups (B(TH) and C(TH)), whereas the Americas isolates were grouped into A(TH). Although the phylogenetic trees based on both regions showed similar distributions, the COX II phylogenetic tree showed higher resolution than the one using the ITS sequences. Our study indicates that the COX II gene is the better of the two alternatives for studying the phylogenetic relationships among P. insidiosum strains. PMID:20818919

  14. The evolution of the coding exome of the Arabidopsis species - the influences of DNA methylation, relative exon position, and exon length

    PubMed Central

    2014-01-01

    Background The evolution of the coding exome is a major driving force of functional divergence both between species and between protein isoforms. Exons at different positions in the transcript or in different transcript isoforms may (1) mutate at different rates due to variations in DNA methylation level; and (2) serve distinct biological roles, and thus be differentially targeted by natural selection. Furthermore, intrinsic exonic features, such as exon length, may also affect the evolution of individual exons. Importantly, the evolutionary effects of these intrinsic/extrinsic features may differ significantly between animals and plants. Such inter-lineage differences, however, have not been systematically examined. Results Here we examine how DNA methylation at CpG dinucleotides (CpG methylation), in the context of intrinsic exonic features (exon length and relative exon position in the transcript), influences the evolution of coding exons of Arabidopsis thaliana. We observed fairly different evolutionary patterns in A. thaliana as compared with those reported for animals. Firstly, the mutagenic effect of CpG methylation is the strongest for internal exons and the weakest for first exons despite the stringent selective constraints on the former group. Secondly, the mutagenic effect of CpG methylation increases significantly with length in first exons but not in the other two exon groups. Thirdly, CpG methylation level is correlated with evolutionary rates (dS, dN, and the dN/dS ratio) with markedly different patterns among the three exon groups. The correlations are generally positive, negative, and mixed for first, last, and internal exons, respectively. Fourthly, exon length is a CpG methylation-independent indicator of evolutionary rates, particularly for dN and the dN/dS ratio in last and internal exons. Finally, the evolutionary patterns of coding exons with regard to CpG methylation differ significantly between Arabidopsis species and mammals. 
Conclusions

  15. Sequence of a novel cytochrome CYP2B cDNA coding for a protein which is expressed in a sebaceous gland, but not in the liver.

    PubMed Central

    Friedberg, T; Grassow, M A; Bartlomowicz-Oesch, B; Siegert, P; Arand, M; Adesnik, M; Oesch, F

    1992-01-01

    The major phenobarbital-inducible rat hepatic cytochromes P-450, CYP2B1 and CYP2B2, are the paradigmatic members of a cytochrome P-450 gene subfamily that contains at least seven additional members. Specific oligonucleotide probes for these genomic members of the CYP2B subfamily were used to assess their tissue-specific expression. In Northern-blot analysis a probe specific to gene 4 (now designated CYP2B12) hybridized to a single mRNA present in the preputial gland, an organ which is used as a model for sebaceous glands, but did not hybridize to mRNA isolated from the liver or from five other tissues of untreated or Aroclor 1254-treated rats. The cDNA sequence for the CYP2B12 RNA was determined from overlapping cDNA clones and contained a long open reading frame of 1476 bp. The nucleotide sequence of the CYP2B12 cDNA was 85% similar to the sequence of the CYP2B1 cDNA in its coding region and was different from any CYP2B cDNA characterized until now. The cDNA-derived primary structure of the CYP2B12 protein contains a signal sequence for its insertion into the endoplasmic reticulum and the putative haem-binding site characteristic of cytochromes P-450. A part of the potential haem pocket of CYP2B12 was identical with a similar structure in a bacterial protocatechuate dioxygenase. In immunoblot analysis of preputial-gland microsomes, antibodies against CYP2B1 recognized a single abundant protein with a lower apparent molecular mass than that of CYP2B1. Our results demonstrate that the CYP2B12 protein has the potential to be enzymically active and are the first demonstration that a member of the CYP2B subfamily is expressed exclusively and at high levels in an extrahepatic organ. PMID:1445240

  16. Color bar coding the BRCA1 gene on combed DNA: a useful strategy for detecting large gene rearrangements.

    PubMed

    Gad, S; Aurias, A; Puget, N; Mairal, A; Schurra, C; Montagna, M; Pages, S; Caux, V; Mazoyer, S; Bensimon, A; Stoppa-Lyonnet, D

    2001-05-01

    Genetic linkage data have shown that alterations of the BRCA1 gene are responsible for the majority of hereditary breast and ovarian cancers. BRCA1 germline mutations, however, are found less frequently than expected. Mutation detection strategies, which are generally based on the polymerase chain reaction, therefore focus on point and small gene alterations. These approaches do not allow for the detection of large gene rearrangements, which also can be involved in BRCA1 alterations. Indeed, a few of them, spread over the entire BRCA1 gene, have been detected recently by Southern blotting or transcript analysis. We have developed an alternative strategy allowing a panoramic view of the BRCA1 gene, based on dynamic molecular combing and the design of a full four-color bar code of the BRCA1 region. The strategy was tested with the study of four large BRCA1 rearrangements previously reported. In addition, when screening a series of 10 breast and ovarian cancer families negatively tested for point mutation in BRCA1/2, we found an unreported 17-kb BRCA1 duplication encompassing exons 3 to 8. The detection of rearrangements as small as 2 to 6 kb with respect to the normal size of the studied fragment is achieved when the BRCA1 region is divided into 10 fragments. In addition, as the BRCA1 bar code is a morphologic approach, the direct observation of complex and likely underreported rearrangements, such as inversions and insertions, becomes possible. PMID:11284038

  17. PCR assay based on DNA coding for 16S rRNA for detection and identification of mycobacteria in clinical samples.

    PubMed Central

    Kox, L F; van Leeuwen, J; Knijper, S; Jansen, H M; Kolk, A H

    1995-01-01

    A PCR and a reverse cross blot hybridization assay were developed for the detection and identification of mycobacteria in clinical samples. The PCR amplifies a part of the DNA coding for 16S rRNA with a set of primers that is specific for the genus Mycobacterium and that flanks species-specific sequences within the genes coding for 16S rRNA. The PCR product is analyzed in a reverse cross blot hybridization assay with probes specific for M. tuberculosis complex (pTub1), M. avium (pAvi3), M. intracellulare (pInt5 and pInt7), M. kansasii complex-M. scrofulaceum complex (pKan1), M. xenopi (pXen1), M. fortuitum (pFor1), M. smegmatis (pSme1), and Mycobacterium spp. (pMyc5a). The PCR assay can detect 10 fg of DNA, the equivalent of two mycobacteria. The specificities of the probes were tested with 108 mycobacterial strains (33 species) and 31 nonmycobacterial strains (of 17 genera). The probes pAvi3, pInt5, pInt7, pKan1, pXen1, and pMyc5a were specific. With probes pTub1, pFor1, and pSme1, slight cross hybridization occurred. However, the mycobacterial strains from which the cross-hybridizing PCR products were derived belonged to nonpathogenic or nonopportunistic species which do not occur in clinical samples. The test was used on 31 different clinical specimens obtained from patients suspected of having mycobacterial disease, including a patient with a double mycobacterial infection. The samples included sputum, bronchoalveolar lavage, tissue biopsy samples, cerebrospinal fluid, pus, peritoneal fluid, pleural fluid, and blood. The results of the PCR assay agreed with those of conventional identification methods or with clinical data, showing that the test can be used for the direct and rapid detection and identification of mycobacteria in clinical samples. PMID:8586707

  18. Stalled RNAP-II molecules bound to non-coding rDNA spacers are required for normal nucleolus architecture.

    PubMed

    Freire-Picos, M A; Landeira-Ameijeiras, V; Mayán, María D

    2013-07-01

    The correct distribution of nuclear domains is critical for the maintenance of normal cellular processes such as transcription and replication, which are regulated depending on their location and surroundings. The most well-characterized nuclear domain, the nucleolus, is essential for cell survival and metabolism. Alterations in nucleolar structure affect nuclear dynamics; however, how the nucleolus and the rest of the nuclear domains are interconnected is largely unknown. In this report, we demonstrate that RNAP-II is vital for maintaining the typical crescent-shaped structure of the nucleolar rDNA repeats and for rRNA transcription. When stalled RNAP-II molecules are not bound to the chromatin, the nucleolus loses its typical crescent-shaped structure. However, the RNAP-II interaction with Seh1p, or cryptic transcription by RNAP-II, is not critical for these morphological changes.

  19. Cloning of the cDNA (DSC1) coding for human type 1 desmocollin and its assignment to chromosome 18

    SciTech Connect

    King, I.A.; Buxton, R.S.; Spurr, N.K.; Arnemann, J.

    1993-11-01

    Desmosomes are adhesive epithelial junctions that contain two distinct classes of cadherin-related glycoproteins (desmogleins and desmocollins), both of which occur as several different isoforms whose expression is related to epithelial differentiation. The authors have now isolated cDNA clones encoding a human desmocollin that is expressed in the more differentiated layers of human epidermis. The isoform has 53% amino acid identity with the previously isolated human (type 3) desmocollin, which is expressed in the basal layers of the epidermis. However, the N- and C-termini of the mature proteins are more highly conserved. Using a panel of somatic cell hybrids, human type 1 desmocollin (gene DSC1) has been assigned to chromosome 18, the same location as the other desmocollin gene (DSC3) and the three desmoglein (DSG) genes already mapped. 49 refs., 5 figs., 1 tab.

  20. Cloning and Molecular Characterization of a cDNA Clone Coding for Trichomonas vaginalis Alpha-Actinin and Intracellular Localization of the Protein

    PubMed Central

    Addis, Maria Filippa; Rappelli, Paola; Delogu, Giuseppe; Carta, Franco; Cappuccinelli, Piero; Fiori, Pier Luigi

    1998-01-01

    We have identified and sequenced a cDNA clone coding for Trichomonas vaginalis alpha-actinin. Analysis of the obtained sequence revealed that the 2,857-nucleotide-long cDNA contained an open reading frame encoding 849 amino acids which showed consistent homology with alpha-actinins of different species. Such homology was particularly significant in regions which have been reported to represent the actin-binding and Ca2+-binding domains in other alpha-actinins. The deduced protein was also characterized by the presence of a divergent central region thought to play a role in its high immunogenicity. A study of protein localization performed by immunofluorescence revealed that the protein is diffusely distributed throughout the T. vaginalis cytoplasm when the cell is pear shaped. When parasites adhere and transform into the amoeboid morphology, the protein is located only in areas close to the cytoplasmic membrane and colocalizes with actin. Concomitantly with transformation into the amoeboid morphology, alpha-actinin mRNA expression is upregulated. PMID:9746598

  1. Visualizing the proteome of Escherichia coli: an efficient and versatile method for labeling chromosomal coding DNA sequences (CDSs) with fluorescent protein genes

    PubMed Central

    Watt, Rory M.; Wang, Jing; Leong, Meikid; Kung, Hsiang-fu; Cheah, Kathryn S.E.; Liu, Depei; Huang, Jian-Dong

    2007-01-01

    To investigate the feasibility of conducting a genomic-scale protein labeling and localization study in Escherichia coli, a representative subset of 23 coding DNA sequences (CDSs) was selected for chromosomal tagging with one or more fluorescent protein genes (EGFP, EYFP, mRFP1, DsRed2). We used λ-Red recombination to precisely and efficiently position PCR-generated DNA targeting cassettes containing a fluorescent protein gene and an antibiotic resistance marker, at the C-termini of the CDSs of interest, creating in-frame fusions under the control of their native promoters. We incorporated cre/loxP and flpe/frt technology to enable multiple rounds of chromosomal tagging events to be performed sequentially with minimal disruption to the target locus, thus allowing sets of proteins to be co-localized within the cell. The visualization of labeled proteins in live E. coli cells using fluorescence microscopy revealed a striking variety of distributions including: membrane and nucleoid association, polar foci and diffuse cytoplasmic localization. Fifty of the fifty-two independent targeting experiments performed were successful, and 21 of the 23 selected CDSs could be fluorescently visualized. Our results show that E. coli has an organized and dynamic proteome, and demonstrate that this approach is applicable for tagging and (co-) localizing CDSs on a genome-wide scale. PMID:17272300

  2. The Challenge of Classifying Polyhedra.

    ERIC Educational Resources Information Center

    Pedersen, Jean J.

    1980-01-01

    A question posed by Euler is considered: How can polyhedra be classified so that the result is in some way analogous to the simple classification of polygons according to the number of their sides? (MK)

  3. IAEA safeguards and classified materials

    SciTech Connect

    Pilat, J.F.; Eccleston, G.W.; Fearey, B.L.; Nicholas, N.J.; Tape, J.W.; Kratzer, M.

    1997-11-01

    The international community in the post-Cold War period has suggested that the International Atomic Energy Agency (IAEA) utilize its expertise in support of the arms control and disarmament process in unprecedented ways. The pledges of the US and Russian presidents to place excess defense materials, some of which are classified, under some type of international inspections raise the prospect of using IAEA safeguards approaches for monitoring classified materials. A traditional safeguards approach, based on nuclear material accountancy, would seem unavoidably to reveal classified information. However, further analysis of the IAEA's safeguards approaches is warranted in order to understand fully the scope and nature of any problems. The issues are complex and difficult, and it is expected that common technical understandings will be essential for their resolution. Accordingly, this paper examines and compares traditional safeguards item accounting of fuel at a nuclear power station (especially spent fuel) with the challenges presented by inspections of classified materials. This analysis is intended to delineate more clearly the problems as well as reveal possible approaches, techniques, and technologies that could allow the adaptation of safeguards to the unprecedented task of inspecting classified materials. It is also hoped that a discussion of these issues can advance ongoing political-technical debates on international inspections of excess classified materials.

  4. Fast turnover of genome transcription across evolutionary time exposes entire non-coding DNA to de novo gene emergence

    PubMed Central

    Neme, Rafik; Tautz, Diethard

    2016-01-01

    Deep sequencing analyses have shown that a large fraction of genomes is transcribed, but the significance of this transcription is much debated. Here, we characterize the phylogenetic turnover of poly-adenylated transcripts in a comprehensive sampling of taxa of the mouse (genus Mus), spanning a phylogenetic distance of 10 Myr. Using deep RNA sequencing we find that at a given sequencing depth transcriptome coverage becomes saturated within a taxon, but keeps extending when compared between taxa, even at this very shallow phylogenetic level. Our data show a high turnover of transcriptional states between taxa and that no major transcript-free islands exist across evolutionary time. This suggests that the entire genome can be transcribed into poly-adenylated RNA when viewed at an evolutionary time scale. We conclude that any part of the non-coding genome can potentially become subject to evolutionary functionalization via de novo gene evolution within relatively short evolutionary time spans. DOI: http://dx.doi.org/10.7554/eLife.09977.001 PMID:26836309

  5. Building classifiers using Bayesian networks

    SciTech Connect

    Friedman, N.; Goldszmidt, M.

    1996-12-31

    Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state of the art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we examine and evaluate approaches for inducing classifiers from data, based on recent results in the theory of learning Bayesian networks. Bayesian networks are factored representations of probability distributions that generalize the naive Bayes classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness which are characteristic of naive Bayes. We experimentally tested these approaches using benchmark problems from the U. C. Irvine repository, and compared them against C4.5, naive Bayes, and wrapper-based feature selection methods.
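
    For orientation only: the naive Bayes baseline that TAN generalizes scores each class c by log P(c) + Σ log P(x_i | c), treating features as independent given the class. The Python sketch below assumes categorical features and add-alpha (Laplace) smoothing. The function names and toy data are hypothetical. It does not implement the authors' TAN method, which additionally learns a tree of dependencies among the features.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[c][i][v]: how many class-c rows have value v for feature i
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = [set() for _ in rows[0]]  # distinct values seen per feature
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[c][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def predict(model, row, alpha=1.0):
    """Return the class maximizing log P(c) + sum_i log P(x_i | c)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, v in enumerate(row):
            # Laplace-smoothed estimate of P(x_i = v | c); never zero
            count = feature_counts[c][i][v]
            score += math.log((count + alpha) / (n_c + alpha * len(values[i])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"),
        ("rain", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("rain", "mild")))  # → yes
```

    Because every conditional probability is estimated independently, training is a single counting pass with no structure search. TAN is designed to keep that simplicity while relaxing the independence assumption.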

  6. DNA Dynamics.

    ERIC Educational Resources Information Center

    Warren, Michael D.

    1997-01-01

    Explains a method to enable students to understand DNA and protein synthesis using model-building and role-playing. Acquaints students with the triplet code and transcription. Includes copies of the charts used in this technique. (DDR)

  7. Isolation and nucleotide sequence of mouse NCAM cDNA that codes for a Mr 79,000 polypeptide without a membrane-spanning region.

    PubMed Central

    Barthels, D; Santoni, M J; Wille, W; Ruppert, C; Chaix, J C; Hirsch, M R; Fontecilla-Camps, J C; Goridis, C

    1987-01-01

    The neural cell adhesion molecule (NCAM) exists in several isoforms which are selectively expressed by different cell types and at different stages of development. In the mouse, three proteins with apparent Mr's of 180,000, 140,000 and 120,000 have been distinguished that are encoded by 4-5 different mRNAs. Here we report the full amino acid sequence of an NCAM protein inferred from the sequences of overlapping cDNA clones. The 706-residue polypeptide contains, towards its N-terminus, 5 domains that share structural homology with members of the immunoglobulin supergene family. The sequence does not encode a typical membrane-spanning segment, but ends with 24 uncharged amino acids followed by two stop codons. This fact, together with size considerations, makes it highly likely that our sequence represents NCAM-120, which lacks transmembrane or cytoplasmic domains and is attached to the membrane by phospholipid. Probes from the 5' region detect all four NCAM gene transcripts present in mouse brain, consistent with the notion that the extracellular domains are common to most NCAM forms. However, a 3' probe corresponding to the hydrophobic tail and non-coding region hybridizes specifically with the smallest mRNA species. S1 nuclease protection experiments indicate that this region is encoded by exon(s) spliced out from the other mRNAs. Furthermore, our clones, which are highly homologous to a published chicken NCAM sequence that elsewhere codes for putative transmembrane and cytoplasmic domains, diverge from it at the presumptive splice junction. It thus appears that alternative use of exons determines whether NCAM proteins with membrane-spanning domains are synthesized. (ABSTRACT TRUNCATED AT 250 WORDS) PMID:3595563

  8. Lichenase and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-15

    The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-β-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.

  9. How Do Children Classify Objects?

    ERIC Educational Resources Information Center

    George, Kenneth D.; Dietz, Maureen A.

    1971-01-01

    Except for grade one students, urban and suburban students used similar properties to classify illustrations of bottles containing different amounts of colored liquids. Only in the urban children was there a change in type of property used between grades one and three. (AL)

  10. Rheostatic Regulation of the SERCA/Phospholamban Membrane Protein Complex Using Non-Coding RNA and Single-Stranded DNA oligonucleotides

    PubMed Central

    Soller, Kailey J.; Verardi, Raffaello; Jing, Meng; Abrol, Neha; Yang, Jing; Walsh, Naomi; Vostrikov, Vitaly V.; Robia, Seth L.; Bowser, Michael T.; Veglia, Gianluigi

    2015-01-01

    The membrane protein complex between sarco(endo)plasmic reticulum Ca2+-ATPase (SERCA) and phospholamban (PLN) is a prime therapeutic target for reversing cardiac contractile dysfunctions caused by calcium mishandling. So far, however, efforts to develop drugs specific for this protein complex have failed. Here, we show that non-coding RNAs and single-stranded DNAs (ssDNAs) interact with and regulate the function of the SERCA/PLN complex in a tunable manner. Both in HEK cells expressing the SERCA/PLN complex, as well as in cardiac sarcoplasmic reticulum preparations, these short oligonucleotides bind and reverse PLN’s inhibitory effects on SERCA, increasing the ATPase’s apparent Ca2+ affinity. Solid-state NMR experiments revealed that ssDNA interacts with PLN specifically, shifting the conformational equilibrium of the SERCA/PLN complex from an inhibitory to a non-inhibitory state. Importantly, we achieved rheostatic control of SERCA function by modulating the length of ssDNAs. Since restoration of Ca2+ flux to physiological levels represents a viable therapeutic avenue for cardiomyopathies, our results suggest that oligonucleotide-based drugs could be used to fine-tune SERCA function to counterbalance the extent of the pathological insults. PMID:26292938

  11. Translator, Traitor, Source of Data: Classifying Translations of "Foreign Phrases" as an Awareness-Raising Exercise.

    ERIC Educational Resources Information Center

    Parkinson, Brian

    1998-01-01

    A system for classifying (coding) translations of sentence-length or similar material is presented and illustrated with codings of entries in the "Dictionary of Foreign Phrases and Classical Quotations." Problems in coding are discussed, relating especially to intertextuality, intention, and ownership. The system is intended for pedagogic use, and…

  12. 76 FR 34761 - Classified National Security Information

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-06-14

    ... Classified National Security Information AGENCY: Marine Mammal Commission. ACTION: Notice. SUMMARY: This... information, as directed by Information Security Oversight Office regulations. FOR FURTHER INFORMATION CONTACT..., ``Classified National Security Information,'' and 32 CFR part 2001, ``Classified National Security...

  13. Is there a best classifier?

    NASA Astrophysics Data System (ADS)

    Richards, John

    2005-10-01

    The question of whether there is a preferred or best classifier to use with remotely sensed data is discussed, focussing on likely results and ease of training. By appealing in part to the No Free Lunch Theorem, it is suggested that no well-trained algorithm is really superior to another; rather, it is the means by which the algorithm is employed (i.e., the classification methodology) that often governs the outcomes.

  14. Energy-Efficient Neuromorphic Classifiers.

    PubMed

    Martí, Daniel; Rigotti, Mattia; Seok, Mingoo; Fusi, Stefano

    2016-10-01

    Neuromorphic engineering combines the architectural and computational principles of systems neuroscience with semiconductor electronics, with the aim of building efficient and compact devices that mimic the synaptic and neural machinery of the brain. The energy consumptions promised by neuromorphic engineering are extremely low, comparable to those of the nervous system. Until now, however, the neuromorphic approach has been restricted to relatively simple circuits and specialized functions, thereby obfuscating a direct comparison of their energy consumption to that used by conventional von Neumann digital machines solving real-world tasks. Here we show that a recent technology developed by IBM can be leveraged to realize neuromorphic circuits that operate as classifiers of complex real-world stimuli. Specifically, we provide a set of general prescriptions to enable the practical implementation of neural architectures that compete with state-of-the-art classifiers. We also show that the energy consumption of these architectures, realized on the IBM chip, is typically two or more orders of magnitude lower than that of conventional digital machines implementing classifiers with comparable performance. Moreover, the spike-based dynamics display a trade-off between integration time and accuracy, which naturally translates into algorithms that can be flexibly deployed for either fast and approximate classifications, or more accurate classifications at the mere expense of longer running times and higher energy costs. This work finally proves that the neuromorphic approach can be efficiently used in real-world applications and has significant advantages over conventional digital devices when energy consumption is considered. PMID:27557100

  16. An enhanced MITOMAP with a global mtDNA mutational phylogeny

    PubMed Central

    Ruiz-Pesini, Eduardo; Lott, Marie T.; Procaccio, Vincent; Poole, Jason C.; Brandon, Marty C.; Mishmar, Dan; Yi, Christina; Kreuziger, James; Baldi, Pierre; Wallace, Douglas C.

    2007-01-01

    The MITOMAP data system for the human mitochondrial genome has been greatly enhanced by the addition of a navigable mutational mitochondrial DNA (mtDNA) phylogenetic tree of ∼3000 mtDNA coding region sequences plus expanded pathogenic mutation tables and a nuclear-mtDNA pseudogene (NUMT) data base. The phylogeny reconstructs the entire mutational history of the human mtDNA, thus defining the mtDNA haplogroups and differentiating ancient from recent mtDNA mutations. Pathogenic mutations are classified by both genotype and phenotype, and the NUMT sequences permit detection of spurious inclusion of pseudogene variants during mutation analysis. These additions position MITOMAP for the implementation of our automated mtDNA sequence analysis system, Mitomaster. PMID:17178747

  17. Sorbitol dehydrogenase. Full-length cDNA sequencing reveals a mRNA coding for a protein containing an additional 42 amino acids at the N-terminal end.

    PubMed

    Wen, Y; Bekhor, I

    1993-10-01

    A cDNA clone encoding rat sorbitol dehydrogenase (SDH) was isolated from a rat testis lambda ZAP II cDNA library. The full-length cDNA insert contained 2277 base pairs (bp), starting 182 bp upstream from an ATG codon where translation to the active enzyme SDH is presumed to be initiated. A second ATG codon, however, was found 126 bp upstream, aligned in the same reading frame as that of the active enzyme. Therefore, the coding sequence for SDH can be translated with an additional 42-amino-acid polypeptide linked to the N-terminal amino acid of the enzyme, generating a pre-sorbitol dehydrogenase. The sequence data indicate that the nucleotide environment around this upstream ATG codon is more favorable than that around the ATG codon preceding the nucleotide sequence for SDH, suggesting that it is the actual start of the open reading frame (ORF) for a pre-SDH. Since no known SDH starts with the additional 42 amino acids, post-translational removal of this polypeptide may accompany the release of the active enzyme. The 3' untranslated region of the cDNA contained 1021 bp of non-coding sequence downstream from the TAA stop codon. This sequence included three putative poly(A) signals: one at nucleotides 1362-1367, the second at nucleotides 1465-1470, and the third at nucleotides 2212-2217 [17 bp away from the poly(A) tail]. In addition, we report a variance in one of the amino acids in the SDH cDNA sequence. This variance occurs at position 957-960, where threonine is coded for instead of aspartic acid: in the rat testis SDH cDNA, the sequence is ACG rather than the GAC reported for the rat liver SDH cDNA. Northern-blot hybridization analysis showed that SDH mRNA is a doublet, one band of 4 kb and the other of 2.3-2.4 kb, in both the rat liver and the rat lens, further confirming that the isolated SDH cDNA constituted a full-length cDNA.
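
    The 42-residue extension follows from the reported spacing. An ATG lying 126 bp upstream in the same reading frame adds 126 / 3 = 42 codons, provided no stop codon intervenes. The minimal Python sketch below performs that check on a made-up toy sequence, not the actual rat SDH cDNA. Positions are 0-based and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extra_codons_upstream(seq, upstream_atg, main_atg):
    """Return how many codons an upstream ATG adds in front of the main ATG,
    or None if it is not an in-frame ATG with no intervening stop codon."""
    offset = main_atg - upstream_atg
    if offset <= 0 or offset % 3 != 0:
        return None  # not upstream, or in a different reading frame
    if seq[upstream_atg:upstream_atg + 3] != "ATG" or seq[main_atg:main_atg + 3] != "ATG":
        return None
    for i in range(upstream_atg, main_atg, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return None  # an in-frame stop would end translation early
    return offset // 3


# Toy sequence: an ATG, 41 filler codons, then the "main" ATG 126 bp downstream
toy = "GC" + "ATG" + "GCT" * 41 + "ATG" + "AAA"
print(extra_codons_upstream(toy, 2, 128))  # → 42
```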

  18. 28 CFR 701.14 - Classified information.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 28 Judicial Administration 2 2010-07-01 2010-07-01 false Classified information. 701.14 Section... UNDER THE FREEDOM OF INFORMATION ACT § 701.14 Classified information. In processing a request for information that is classified or classifiable under Executive Order 12356 or any other Executive...

  19. 28 CFR 701.14 - Classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 2 2013-07-01 2013-07-01 false Classified information. 701.14 Section... UNDER THE FREEDOM OF INFORMATION ACT § 701.14 Classified information. In processing a request for information that is classified or classifiable under Executive Order 12356 or any other Executive...

  20. Dimensionality Reduction Through Classifier Ensembles

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.; Tumer, Kagan; Norvig, Peter (Technical Monitor)

    1999-01-01

    In data mining, one often needs to analyze datasets with a very large number of attributes. Performing machine learning directly on such datasets is often impractical because of extensive run times, excessive complexity of the fitted model (often leading to overfitting), and the well-known "curse of dimensionality." In practice, to avoid such problems, feature selection and/or extraction are often used to reduce data dimensionality prior to the learning step. However, existing feature selection/extraction algorithms either evaluate features by their effectiveness across the entire data set or simply disregard class information altogether (e.g., principal component analysis). Furthermore, feature extraction algorithms such as principal component analysis create new features that are often meaningless to human users. In this article, we present input decimation, a method that provides "feature subsets" that are selected for their ability to discriminate among the classes. These features are subsequently used in ensembles of classifiers, yielding results superior to single classifiers, ensembles that use the full set of features, and ensembles based on principal component analysis, on both real and synthetic datasets.
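
    The minimal Python sketch below illustrates the per-class feature-selection step. It assumes that a feature's ability to discriminate class c is measured by the absolute Pearson correlation between that feature and a one-vs-rest indicator for c. That criterion is an illustration, not a claim about the paper's exact procedure. Each resulting subset would then train one member of the ensemble; the training and combining steps are omitted, and all names and data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def decimated_subsets(X, y, k):
    """For each class c, keep the k feature indices whose values are most
    correlated (in absolute value) with the indicator 'label == c'."""
    subsets = {}
    for c in sorted(set(y)):
        indicator = [1.0 if label == c else 0.0 for label in y]
        scores = []
        for j in range(len(X[0])):
            column = [row[j] for row in X]
            scores.append((abs(pearson(column, indicator)), j))
        # Highest score first; ties broken by the lower feature index
        scores.sort(key=lambda t: (-t[0], t[1]))
        subsets[c] = sorted(j for _, j in scores[:k])
    return subsets


# Toy data: feature 0 marks class "a", feature 1 marks class "b", feature 2 is noise
X = [[1, 0, 5], [1, 0, 3], [0, 1, 4], [0, 1, 6], [0, 0, 5], [0, 0, 3]]
y = ["a", "a", "b", "b", "c", "c"]
subsets = decimated_subsets(X, y, k=1)
print(subsets["a"], subsets["b"])  # → [0] [1]
```

    Unlike principal component analysis, this keeps original attributes, so each selected subset remains interpretable to a human user.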

  1. Explosive Formulation Code Naming SOP

    SciTech Connect

    Martz, H. E.

    2014-09-19

    The purpose of this SOP is to provide a procedure for giving individual HME formulations code names. A code name for an individual HME formulation consists of an explosive family code, given by the classified guide, followed by a dash, -, and a number. If the formulation requires preparation such as packing or aging, further groups of symbols are added to the X-ray specimen name.

  2. Human papillomavirus type 16 DNA from a vulvar carcinoma in situ is present as head-to-tail dimeric episomes with a deletion in the non-coding region.

    PubMed

    Kennedy, I M; Simpson, S; Macnab, J C; Clements, J B

    1987-02-01

    A number of genital cancer biopsy samples were screened for the presence of human papillomavirus type 16 (HPV-16) DNA sequences. One of these samples (a vulvar carcinoma in situ) was found to contain more than 100 copies of HPV-16 DNA sequences per cell. Using this tumour DNA, a genomic library was constructed in bacteriophage lambda and the library was screened for recombinant phage containing HPV-16 sequences. Five recombinant phage clones were isolated and their DNA was analysed by restriction endonuclease digestion and blot hybridization. All five recombinants contained two copies of the HPV-16 genome present in a head-to-tail arrangement. The data are consistent with the presence of HPV-16 sequences in the tumour DNA arranged as genomic dimers in a circular episomal configuration. The HPV-16 genomes contained a deletion within the non-coding region, a region which includes the viral origin of DNA replication and transcriptional control sequences. Possible consequences of this deletion for viral replication and transcription are discussed. PMID:3029284

  3. Classifying sex biased congenital anomalies

    SciTech Connect

    Lubinsky, M.S.

    1997-03-31

    The reasons for sex biases in congenital anomalies that arise before structural or hormonal dimorphisms are established have long been unclear. A review of such disorders shows that patterning and tissue anomalies are female biased, whereas structural findings are more common in males. This suggests different gender-dependent susceptibilities to developmental disturbances: female vulnerabilities are focused on early blastogenesis/determination, while male anomalies are more likely to involve later organogenesis/morphogenesis. A dual origin for some anomalies explains paradoxical reductions of sex biases with greater severity (i.e., multiple rather than single malformations), presumably because more severe events increase the involvement of an otherwise minor process whose biases are opposite to those of the primary mechanism. The cause of these sex differences is unknown, but early dimorphisms, such as differences in growth or the presence of H-Y antigen, may be responsible. This model provides a useful rationale for understanding and classifying sex-biased congenital anomalies. 42 refs., 7 tabs.

  4. 15 CFR 4.8 - Classified Information.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 15 Commerce and Foreign Trade 1 2010-01-01 2010-01-01 false Classified Information. 4.8 Section 4... INFORMATION Freedom of Information Act § 4.8 Classified Information. In processing a request for information..., the information shall be reviewed to determine whether it should remain classified. Ordinarily...

  5. 14 CFR 1216.317 - Classified information.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 14 Aeronautics and Space 5 2010-01-01 2010-01-01 false Classified information. 1216.317 Section 1216.317 Aeronautics and Space NATIONAL AERONAUTICS AND SPACE ADMINISTRATION ENVIRONMENTAL QUALITY... Classified information. Environmental assessments and impact statements which contain classified...

  6. 32 CFR 1602.8 - Classifying authority.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 32 National Defense 6 2014-07-01 2014-07-01 false Classifying authority. 1602.8 Section 1602.8 National Defense Other Regulations Relating to National Defense SELECTIVE SERVICE SYSTEM DEFINITIONS § 1602.8 Classifying authority. The term classifying authority refers to any official or board who...

  7. 32 CFR 1602.8 - Classifying authority.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 32 National Defense 6 2013-07-01 2013-07-01 false Classifying authority. 1602.8 Section 1602.8 National Defense Other Regulations Relating to National Defense SELECTIVE SERVICE SYSTEM DEFINITIONS § 1602.8 Classifying authority. The term classifying authority refers to any official or board who...

  8. 32 CFR 1602.8 - Classifying authority.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Classifying authority. 1602.8 Section 1602.8 National Defense Other Regulations Relating to National Defense SELECTIVE SERVICE SYSTEM DEFINITIONS § 1602.8 Classifying authority. The term classifying authority refers to any official or board who...

  9. 32 CFR 1602.8 - Classifying authority.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 32 National Defense 6 2012-07-01 2012-07-01 false Classifying authority. 1602.8 Section 1602.8 National Defense Other Regulations Relating to National Defense SELECTIVE SERVICE SYSTEM DEFINITIONS § 1602.8 Classifying authority. The term classifying authority refers to any official or board who...

  10. 32 CFR 1602.8 - Classifying authority.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 32 National Defense 6 2011-07-01 2011-07-01 false Classifying authority. 1602.8 Section 1602.8 National Defense Other Regulations Relating to National Defense SELECTIVE SERVICE SYSTEM DEFINITIONS § 1602.8 Classifying authority. The term classifying authority refers to any official or board who...

  11. Thermodynamic Post-Processing versus GC-Content Pre-Processing for DNA Codes Satisfying the Hamming Distance and Reverse-Complement Constraints.

    PubMed

    Tulpan, Dan; Smith, Derek H; Montemanni, Roberto

    2014-01-01

    Stochastic, meta-heuristic and linear construction algorithms for the design of DNA strands satisfying Hamming distance and reverse-complement constraints often use a GC-content constraint to pre-process the DNA strands. Since GC-content is a poor predictor of DNA strand hybridization strength, the strands can be filtered by post-processing using thermodynamic calculations. An alternative approach is considered here, where the algorithms are modified to remove consideration of GC-content and rely on post-processing alone to obtain large sets of DNA strands with satisfactory melting temperatures. The two approaches (pre-processing GC-content and post-processing melting temperatures) are compared and are shown to be complementary when large DNA sets are desired. In particular, the second approach can give significant improvements when linear constructions are used.
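
    Both combinatorial constraints are simple to state in code. Every pair of words must differ in at least d positions (the Hamming distance constraint). Every word must also differ in at least d positions from the reverse complement of every word, itself included (the reverse-complement constraint). The Python sketch below checks both and builds a code by a simple lexicographic greedy scan, with an optional fixed GC count standing in for GC-content pre-processing. It is an illustration under these assumptions, not one of the authors' stochastic or linear constructions, and it omits the thermodynamic (melting-temperature) post-processing step. All function names are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def hamming(u, v):
    """Number of positions at which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))


def reverse_complement(w):
    return "".join(COMPLEMENT[b] for b in reversed(w))


def compatible(w, code, d):
    """True if w can join the code under the HD and RC constraints."""
    if hamming(w, reverse_complement(w)) < d:  # RC constraint of w with itself
        return False
    return all(hamming(w, v) >= d and hamming(w, reverse_complement(v)) >= d
               for v in code)


def greedy_dna_code(n, d, gc_count=None):
    """Scan all 4**n words in lexicographic order, keeping each compatible one.
    If gc_count is given, words without exactly that many G/C bases are
    skipped up front (GC-content pre-processing)."""
    code = []
    for letters in product("ACGT", repeat=n):
        w = "".join(letters)
        if gc_count is not None and sum(b in "GC" for b in w) != gc_count:
            continue
        if compatible(w, code, d):
            code.append(w)
    return code


def satisfies_constraints(code, d):
    """Re-check a finished code against both constraints."""
    return all(compatible(w, code[:i], d) for i, w in enumerate(code))


code = greedy_dna_code(n=6, d=3, gc_count=3)
print(len(code), code[:3])
```

    Each unordered pair needs checking only once because reverse complementation preserves Hamming distance, so H(w, rc(v)) = H(rc(w), v).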

  12. Deciphering the Combinatorial DNA-binding Code of the CCAAT-binding Complex and the Iron-regulatory Basic Region Leucine Zipper (bZIP) Transcription Factor HapX*

    PubMed Central

    Hortschansky, Peter; Ando, Eriko; Tuppatsch, Katja; Arikawa, Hisashi; Kobayashi, Tetsuo; Kato, Masashi; Haas, Hubertus; Brakhage, Axel A.

    2015-01-01

    The heterotrimeric CCAAT-binding complex (CBC) is evolutionarily conserved in eukaryotic organisms, including fungi, plants, and mammals. The CBC consists of three subunits, which in the filamentous fungus Aspergillus nidulans are named HapB, HapC, and HapE. HapX, a fourth CBC subunit, was identified exclusively in fungi, except for Saccharomyces cerevisiae and the closely related Saccharomycotina species. The CBC-HapX complex acts as the master regulator of iron homeostasis. HapX belongs to the class of basic region leucine zipper transcription factors. We demonstrated that the CBC and HapX bind cooperatively to bipartite DNA motifs with a general HapX/CBC/DNA 2:1:1 stoichiometry in a class of genes that are repressed by HapX-CBC in A. nidulans during iron limitation. This combinatorial binding mode requires protein-protein interaction between the N-terminal domain of HapE and the N-terminal CBC binding domain of HapX as well as sequence-specific DNA binding of both the CBC and HapX. Initial binding of the CBC to CCAAT boxes is mandatory for DNA recognition of HapX. HapX specifically targets the minimal motif 5′-GAT-3′, which is located at a distance of 11–12 bp downstream of the respective CCAAT box. Single nucleotide substitutions at the 5′- and 3′-end of the GAT motif as well as different spacing between the CBC and HapX DNA-binding sites revealed a remarkably promiscuous DNA-recognition mode of HapX. This flexible DNA-binding code may have evolved as a mechanism for fine-tuning the transcriptional activity of CBC-HapX at distinct target promoters. PMID:25589790

  13. Molecular cloning of amyloid cDNA derived from mRNA of the Alzheimer disease brain: coding and noncoding regions of the fetal precursor mRNA are expressed in the cortex

    SciTech Connect

    Zain, S.B.; Salim, M.; Chou, W.G.; Sajdel-Sulkowska, E.M.; Majocha, R.E.; Marotta, C.A.

    1988-02-01

    To gain insight into factors associated with the excessive accumulation of β-amyloid in the Alzheimer disease (AD) brain, the present studies were initiated to distinguish between a unique primary structure of the AD-specific amyloid precursor mRNA vis a vis other determinants that may affect amyloid levels. Previous molecular cloning experiments focused on amyloid derived from sources other than AD cases. In the present work, the authors cloned and characterized amyloid cDNA derived directly from AD brain mRNA. Poly(A)+ RNA from AD cortices was used for the preparation of lambda gt11 recombinant cDNA libraries. An insert of 1564 nucleotides was isolated that included the β-amyloid domain and corresponded to 75% of the coding region and approximately 70% of the 3'-noncoding region of the fetal precursor amyloid cDNA reported by others. On RNA blots, the AD amyloid mRNA consisted of a doublet of 3.2 and 3.4 kilobases. In control and AD cases, the amyloid mRNA levels were nonuniform and were independent of glial-specific mRNA levels. Based on the sequence analysis data, they conclude that a segment of the amyloid gene is expressed in the AD cortex as a high molecular weight precursor mRNA with major coding and 3'-noncoding regions that are identical to the fetal brain gene product.

  14. Molecular cloning of amyloid cDNA derived from mRNA of the Alzheimer disease brain: coding and noncoding regions of the fetal precursor mRNA are expressed in the cortex.

    PubMed Central

    Zain, S B; Salim, M; Chou, W G; Sajdel-Sulkowska, E M; Majocha, R E; Marotta, C A

    1988-01-01

    To gain insight into factors associated with the excessive accumulation of beta-amyloid in the Alzheimer disease (AD) brain, the present studies were initiated to distinguish between a unique primary structure of the AD-specific amyloid precursor mRNA vis a vis other determinants that may affect amyloid levels. Previous molecular cloning experiments focused on amyloid derived from sources other than AD cases. In the present work, we cloned and characterized amyloid cDNA derived directly from AD brain mRNA. Poly(A)+ RNA from AD cortices was used for the preparation of lambda gt11 recombinant cDNA libraries. An insert of 1564 nucleotides was isolated that included the beta-amyloid domain and corresponded to 75% of the coding region and approximately equal to 70% of the 3'-noncoding region of the fetal precursor amyloid cDNA reported by others. On RNA blots, the AD amyloid mRNA consisted of a doublet of 3.2 and 3.4 kilobases. In control and AD cases, the amyloid mRNA levels were nonuniform and were independent of glial-specific mRNA levels. Based on the sequence analysis data, we conclude that a segment of the amyloid gene is expressed in the AD cortex as a high molecular weight precursor mRNA with major coding and 3'-noncoding regions that are identical to the fetal brain gene product. PMID:2893379

  15. Error minimizing algorithms for nearest neighbor classifiers

    SciTech Connect

    Porter, Reid B; Hush, Don; Zimmer, G. Beate

    2011-01-03

    Stack Filters define a large class of discrete nonlinear filters first introduced in image and signal processing for noise removal. In recent years we have suggested their application to classification problems and investigated their relationship to other types of discrete classifiers, such as Decision Trees. In this paper we focus on a continuous domain version of Stack Filter Classifiers, which we call Ordered Hypothesis Machines (OHM), and investigate their relationship to Nearest Neighbor classifiers. We show that OHM classifiers provide a novel framework in which to train Nearest Neighbor type classifiers by minimizing empirical error based loss functions. We use the framework to investigate a new cost-sensitive loss function that allows us to train a Nearest Neighbor type classifier for low false alarm rate applications. We report results on both synthetic data and real-world image data.
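    The abstract does not spell out how OHM classifiers are trained, so the sketch below does not reproduce the authors' method. As a point of reference only, it shows a plain k-nearest-neighbor classifier (Euclidean distance and binary labels 0/1 are assumptions) with a hypothetical `min_votes` threshold: requiring more positive neighbors before declaring a detection is one simple way to push a Nearest Neighbor type classifier toward a lower false alarm rate.

```python
import math
from typing import List, Sequence, Tuple

# A labeled training example: (feature vector, label), with 1 = target, 0 = background.
Example = Tuple[Sequence[float], int]


def knn_predict(train: List[Example], x: Sequence[float], k: int = 3, min_votes: int = 2) -> int:
    """Hypothetical k-NN rule: predict 1 only if at least `min_votes` of the
    k nearest training examples are labeled 1; otherwise predict 0.

    Raising `min_votes` makes the classifier more conservative, trading some
    detections for fewer false alarms.
    """
    if not 1 <= min_votes <= k <= len(train):
        raise ValueError("need 1 <= min_votes <= k <= len(train)")
    # Rank training examples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes >= min_votes else 0


if __name__ == "__main__":
    train = [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.5, 0.6), 0), ((1.0, 1.0), 1), ((0.9, 1.1), 1)]
    query = (0.95, 1.0)
    print(knn_predict(train, query, k=3, min_votes=2))  # 2 of 3 neighbors are targets -> 1
    print(knn_predict(train, query, k=3, min_votes=3))  # stricter threshold -> 0
```

    Here the three nearest neighbors of the query carry labels 1, 1 and 0, so the default threshold reports a detection while the stricter one does not.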

  16. Comparing different classifiers for automatic age estimation.

    PubMed

    Lanitis, Andreas; Draganova, Chrisina; Christodoulou, Chris

    2004-02-01

    We describe a quantitative evaluation of the performance of different classifiers in the task of automatic age estimation. In this context, we generate a statistical model of facial appearance, which is subsequently used as the basis for obtaining a compact parametric description of face images. The aim of our work is to design classifiers that accept the model-based representation of unseen images and produce an estimate of the age of the person in the corresponding face image. For this application, we have tested different classifiers: a classifier based on the use of quadratic functions for modeling the relationship between face model parameters and age, a shortest distance classifier, and artificial neural network based classifiers. We also describe variations to the basic method where we use age-specific and/or appearance-specific age estimation methods. In this context, we use age estimation classifiers for each age group and/or classifiers for different clusters of subjects within our training set. In those cases, part of the classification procedure is devoted to choosing the most appropriate classifier for the subject/age range in question, so that more accurate age estimates can be obtained. We also present comparative results concerning the performance of humans and computers in the task of age estimation. Our results indicate that machines can estimate the age of a person almost as reliably as humans.

  17. Isolation and expression of a novel chick G-protein cDNA coding for a G alpha i3 protein with a G alpha 0 N-terminus.

    PubMed Central

    Kilbourne, E J; Galper, J B

    1994-01-01

    We have cloned cDNAs coding for G-protein alpha subunits from a chick brain cDNA library. Based on sequence similarity to G-protein alpha subunits from other eukaryotes, one clone was designated G alpha i3. A second clone, G alpha i3-o, was identical to the G alpha i3 clone over 932 bases on the 3' end. The 5' end of G alpha i3-o, however, contained an alternative sequence in which the first 45 amino acids coded for are 100% identical to the conserved N-terminus of G alpha o from species such as rat, mouse, human, bovine and hamster. Both clones were found to be expressed in all tissues studied. The unusual alpha o-alpha i3-like G-protein chimera, G alpha i3-o, was found to be expressed at significantly lower levels than G alpha i3. In vitro transcription and translation of the G alpha i3-o cDNA clone gave a protein of approx. 41 kDa which stably bound guanosine 5'-[gamma-thio]triphosphate. G alpha i3-o appears to be the first G-protein alpha subunit cloned which contains ends that are homologous to two different alpha subunit isoforms, G alpha o and G alpha i3. PMID:8297335

  18. The changing epitome of species identification - DNA barcoding.

    PubMed

    Ajmal Ali, M; Gyulai, Gábor; Hidvégi, Norbert; Kerti, Balázs; Al Hemaid, Fahad M A; Pandey, Arun K; Lee, Joongku

    2014-07-01

    The discipline of taxonomy (the science of naming and classifying organisms, the original bioinformatics and a basis for all biology) is fundamentally important to ensuring the quality of life of future human generations on Earth; yet over the past few decades, teaching and research funding in taxonomy have declined because its classical mode of practice has often reduced the discipline to a matter of opinion. This has given rise to several problems and challenges, and taxonomists have become an endangered species in the era of genomics. Taxonomy has now suddenly become fashionable again owing to a revolutionary approach called DNA barcoding (a novel technology that provides rapid, accurate, and automated species identification using short orthologous DNA sequences). In DNA barcoding, a complete data set can be obtained from a single specimen irrespective of morphological or life-stage characters. The core idea of DNA barcoding rests on the fact that highly conserved stretches of DNA, in either coding or noncoding regions, vary only to a very minor degree within a species during evolution. Sequences suggested to be useful in DNA barcoding include cytoplasmic mitochondrial DNA (e.g., cox1), chloroplast DNA (e.g., rbcL, trnL-F, matK, ndhF, and atpB-rbcL), and nuclear DNA (ITS and housekeeping genes, e.g., gapdh). Plant DNA barcoding is now changing the epitome of species identification and is thus helping in the much-needed molecularization of taxonomy. 'DNA barcodes' show promise as a practical, standardized, species-level identification tool for biodiversity assessment, life history and ecological studies, forensic analysis, and more. PMID:24955007

  19. The changing epitome of species identification – DNA barcoding

    PubMed Central

    Ajmal Ali, M.; Gyulai, Gábor; Hidvégi, Norbert; Kerti, Balázs; Al Hemaid, Fahad M.A.; Pandey, Arun K.; Lee, Joongku

    2014-01-01

    The discipline of taxonomy (the science of naming and classifying organisms, the original bioinformatics and a basis for all biology) is fundamentally important to ensuring the quality of life of future human generations on Earth; yet over the past few decades, teaching and research funding in taxonomy have declined because its classical mode of practice has often reduced the discipline to a matter of opinion. This has given rise to several problems and challenges, and taxonomists have become an endangered species in the era of genomics. Taxonomy has now suddenly become fashionable again owing to a revolutionary approach called DNA barcoding (a novel technology that provides rapid, accurate, and automated species identification using short orthologous DNA sequences). In DNA barcoding, a complete data set can be obtained from a single specimen irrespective of morphological or life-stage characters. The core idea of DNA barcoding rests on the fact that highly conserved stretches of DNA, in either coding or noncoding regions, vary only to a very minor degree within a species during evolution. Sequences suggested to be useful in DNA barcoding include cytoplasmic mitochondrial DNA (e.g., cox1), chloroplast DNA (e.g., rbcL, trnL-F, matK, ndhF, and atpB-rbcL), and nuclear DNA (ITS and housekeeping genes, e.g., gapdh). Plant DNA barcoding is now changing the epitome of species identification and is thus helping in the much-needed molecularization of taxonomy. ‘DNA barcodes’ show promise as a practical, standardized, species-level identification tool for biodiversity assessment, life history and ecological studies, forensic analysis, and more. PMID:24955007

  20. Uplink Coding

    NASA Technical Reports Server (NTRS)

    Pollara, Fabrizio; Hamkins, Jon; Dolinar, Sam; Andrews, Ken; Divsalar, Dariush

    2006-01-01

    This viewgraph presentation reviews uplink coding. The purpose and goals of the briefing are (1) Show a plan for using uplink coding and describe benefits (2) Define possible solutions and their applicability to different types of uplink, including emergency uplink (3) Concur with our conclusions so we can embark on a plan to use proposed uplink system (4) Identify the need for the development of appropriate technology and infusion in the DSN (5) Gain advocacy to implement uplink coding in flight projects Action Item EMB04-1-14 -- Show a plan for using uplink coding, including showing where it is useful or not (include discussion of emergency uplink coding).

  1. 28 CFR 700.14 - Classified information.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 28 Judicial Administration 2 Classified information. 700.14 Section... INFORMATION OF THE OFFICE OF INDEPENDENT COUNSEL Protection of Privacy and Access to Individual Records Under the Privacy Act of 1974 § 700.14 Classified information. In processing a request for access to...

  2. 28 CFR 700.14 - Classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 2 Classified information. 700.14 Section... INFORMATION OF THE OFFICE OF INDEPENDENT COUNSEL Protection of Privacy and Access to Individual Records Under the Privacy Act of 1974 § 700.14 Classified information. In processing a request for access to...

  3. 28 CFR 16.7 - Classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... processing a request for information that is classified under Executive Order 12958 (3 CFR, 1996 Comp., p... 28 Judicial Administration 1 Classified information. 16.7 Section 16.7 Judicial Administration DEPARTMENT OF JUSTICE PRODUCTION OR DISCLOSURE OF MATERIAL OR...

  4. 28 CFR 16.44 - Classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 1 Classified information. 16.44 Section 16.44 Judicial Administration DEPARTMENT OF JUSTICE PRODUCTION OR DISCLOSURE OF MATERIAL OR INFORMATION... information. In processing a request for access to a record containing information that is classified...

  5. A fuzzy classifier system for process control

    NASA Technical Reports Server (NTRS)

    Karr, C. L.; Phillips, J. C.

    1994-01-01

    A fuzzy classifier system that discovers rules for controlling a mathematical model of a pH titration system was developed by researchers at the U.S. Bureau of Mines (USBM). Fuzzy classifier systems successfully combine the strengths of learning classifier systems and fuzzy logic controllers. Learning classifier systems resemble familiar production rule-based systems, but they represent their IF-THEN rules by strings of characters rather than in the traditional linguistic terms. Fuzzy logic is a tool that allows for the incorporation of abstract concepts into rule-based systems, thereby allowing the rules to resemble the familiar 'rules-of-thumb' commonly used by humans when solving difficult process control and reasoning problems. Like learning classifier systems, fuzzy classifier systems employ a genetic algorithm to explore and sample new rules for manipulating the problem environment. Like fuzzy logic controllers, fuzzy classifier systems encapsulate knowledge in the form of production rules. The results presented in this paper demonstrate the ability of fuzzy classifier systems to generate a fuzzy logic-based process control system.

  6. 28 CFR 61.8 - Classified proposals.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 28 Judicial Administration 2 Classified proposals. 61.8 Section 61.8 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR IMPLEMENTING THE NATIONAL ENVIRONMENTAL POLICY ACT Implementing Procedures § 61.8 Classified proposals. If an environmental...

  7. 28 CFR 61.8 - Classified proposals.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 28 Judicial Administration 2 Classified proposals. 61.8 Section 61.8 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR IMPLEMENTING THE NATIONAL ENVIRONMENTAL POLICY ACT Implementing Procedures § 61.8 Classified proposals. If an environmental...

  8. 28 CFR 61.8 - Classified proposals.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 2 Classified proposals. 61.8 Section 61.8 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR IMPLEMENTING THE NATIONAL ENVIRONMENTAL POLICY ACT Implementing Procedures § 61.8 Classified proposals. If an environmental...

  9. 28 CFR 61.8 - Classified proposals.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 28 Judicial Administration 2 Classified proposals. 61.8 Section 61.8 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR IMPLEMENTING THE NATIONAL ENVIRONMENTAL POLICY ACT Implementing Procedures § 61.8 Classified proposals. If an environmental...

  10. 28 CFR 61.8 - Classified proposals.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 28 Judicial Administration 2 Classified proposals. 61.8 Section 61.8 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR IMPLEMENTING THE NATIONAL ENVIRONMENTAL POLICY ACT Implementing Procedures § 61.8 Classified proposals. If an environmental...

  11. 6 CFR 5.24 - Classified information.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 6 Domestic Security 1 Classified information. 5.24 Section 5.24 Domestic Security DEPARTMENT OF HOMELAND SECURITY, OFFICE OF THE SECRETARY DISCLOSURE OF RECORDS AND INFORMATION Privacy Act § 5.24 Classified information. In processing a request for access to a...

  12. 6 CFR 5.7 - Classified information.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... classified under Executive Order 12958 (3 CFR, 1996 Comp., p. 333) or any other executive order, the... 6 Domestic Security 1 Classified information. 5.7 Section 5.7 Domestic Security DEPARTMENT OF HOMELAND SECURITY, OFFICE OF THE SECRETARY DISCLOSURE OF RECORDS AND...

  13. Deconvolution When Classifying Noisy Data Involving Transformations

    PubMed Central

    Carroll, Raymond; Delaigle, Aurore; Hall, Peter

    2013-01-01

    In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online. PMID:23606778

  14. Measuring Diagnoses: ICD Code Accuracy

    PubMed Central

    O'Malley, Kimberly J; Cook, Karon F; Price, Matt D; Wildes, Kimberly Raiford; Hurdle, John F; Ashton, Carol M

    2005-01-01

    Objective To examine potential sources of errors at each step of the described inpatient International Classification of Diseases (ICD) coding process. Data Sources/Study Setting The use of disease codes from the ICD has expanded from classifying morbidity and mortality information for statistical purposes to diverse sets of applications in research, health care policy, and health care finance. By describing a brief history of ICD coding, detailing the process for assigning codes, identifying where errors can be introduced into the process, and reviewing methods for examining code accuracy, we help code users more systematically evaluate code accuracy for their particular applications. Study Design/Methods We summarize the inpatient ICD diagnostic coding process from patient admission to diagnostic code assignment. We examine potential sources of errors at each step and offer code users a tool for systematically evaluating code accuracy. Principal Findings Main error sources along the “patient trajectory” include amount and quality of information at admission, communication among patients and providers, the clinician's knowledge and experience with the illness, and the clinician's attention to detail. Main error sources along the “paper trail” include variance in the electronic and written records, coder training and experience, facility quality-control efforts, and unintentional and intentional coder errors, such as misspecification, unbundling, and upcoding. Conclusions By clearly specifying the code assignment process and heightening their awareness of potential error sources, code users can better evaluate the applicability and limitations of codes for their particular situations. ICD codes can then be used in the most appropriate ways. PMID:16178999

  15. The 55K protein on the 5' termini of adenovirus type 2 DNA is unrelated to virus-coded candidate transformation proteins (E1-53K, E1-40K-50K) and DNA-binding proteins (E2-42K/47K/73K).

    PubMed

    Green, M; Wold, W S; Brackmann, K H; Cartas, M A

    1979-09-01

    A polypeptide of 55,000 daltons (55K) is linked, probably covalently, to the 5' termini of adenovirus type 2 DNA. The 55K polypeptide is synthesized during early stages of infection (T. Yamashita, M. Arens, and M. Green, J. Virol. 30: 497-507, 1979) and thus may function in viral DNA replication, gene regulation, or cell transformation. Several virus-coded early polypeptides have been identified that could correspond to the terminal 55K, including the E1-40K-50K and E1-53K candidate transformation polypeptides and the E2-42K/47K/73K single-stranded DNA-binding polypeptide. We show here that two-dimensional tryptic [35S]methionine-peptide maps of the terminal 55K differ completely from [35S]methionine-peptide maps of four related E1-40K-50K polypeptides, the E1-53K, and the related E2-42K, E2-47K, and E2-73K polypeptides. We conclude that the terminal 55K polypeptide does not correspond to any of the known virus-coded early polypeptides.

  16. The Saccharomyces cerevisiae MGT1 DNA repair methyltransferase gene: its promoter and entire coding sequence, regulation and in vivo biological functions.

    PubMed Central

    Xiao, W; Samson, L

    1992-01-01

    We previously cloned a yeast DNA fragment that, when fused with the bacterial lacZ promoter, produced O6-methylguanine DNA repair methyltransferase (MGT1) activity and alkylation resistance in Escherichia coli (Xiao et al., EMBO J. 10, 2179). Here we describe the isolation of the entire MGT1 gene and its promoter by sequence-directed chromosome integration and walking. The MGT1 promoter was fused to a lacZ reporter gene to study how MGT1 expression is controlled. MGT1 is not induced by alkylating agents, nor is it induced by other DNA damaging agents such as UV light. However, deletion analysis defined an upstream repression sequence, whose removal dramatically increased basal level gene expression. The polypeptide deduced from the complete MGT1 sequence contained 18 more N-terminal amino acids than that previously determined; the role of these 18 amino acids, which harbored a potential nuclear localization signal, was explored. The MGT1 gene was also cloned under the GAL1 promoter, so that MTase levels could be manipulated, and we examined MGT1 function in an MTase-deficient yeast strain (mgt1). The extent of resistance to both alkylation-induced mutation and cell killing directly correlated with MTase levels. Finally, we show that mgt1 S. cerevisiae has a higher rate of spontaneous mutation than wild-type cells, indicating that there is an endogenous source of DNA alkylation damage in these eukaryotic cells and that one of the in vivo roles of MGT1 is to limit spontaneous mutations. PMID:1641326

  17. Logarithmic learning for generalized classifier neural network.

    PubMed

    Ozyildirim, Buse Melis; Avci, Mutlu

    2014-12-01

    The generalized classifier neural network has been introduced as an efficient classifier. Unless the initial smoothing parameter value is close to the optimal one, however, the generalized classifier neural network suffers from convergence problems and requires a long time to converge. In this work, a logarithmic learning approach is proposed to overcome this problem. The proposed method uses a logarithmic cost function instead of the squared error. Minimizing this cost function reduces the number of iterations needed to reach the minimum. The proposed method is tested on 15 different data sets, and the performance of the logarithmic learning generalized classifier neural network is compared with that of the standard one. Because of the operating range of the radial basis function in the generalized classifier neural network, the proposed logarithmic cost and its derivative take continuous values. This makes it possible for the proposed learning method to exploit the fast convergence of the logarithmic cost. Owing to this fast convergence, training time is reduced by up to 99.2%, and classification performance may also improve by up to 60%. According to the test results, the proposed method not only addresses the training-time requirement of the generalized classifier neural network but may also improve its classification accuracy. It can therefore be considered an efficient way to reduce the time requirement of the generalized classifier neural network.

  18. Integrating heterogeneous classifier ensembles for EMG signal decomposition based on classifier agreement.

    PubMed

    Rasheed, Sarbast; Stashuk, Daniel W; Kamel, Mohamed S

    2010-05-01

    In this paper, we present a design methodology for integrating heterogeneous classifier ensembles by employing a diversity-based hybrid classifier fusion approach, whose aggregator module consists of two classifier combiners, to achieve improved classification performance for motor unit potential classification during electromyographic (EMG) signal decomposition. Following the so-called overproduce-and-choose strategy for classifier ensemble combination, the developed system allows the construction of a large set of base classifiers and then automatically chooses subsets of classifiers to form candidate classifier ensembles for each combiner. The system uses the kappa statistic as a diversity measure to design classifier teams by estimating the level of agreement between base classifier outputs. The pool of base classifiers consists of different kinds of classifiers (the adaptive certainty-based, the adaptive fuzzy k-NN, and the adaptive matched template filter classifiers) and uses different types of features. Performance of the developed system was evaluated using real and simulated EMG signals and was compared with the performance of the constituent base classifiers. Across the EMG signal datasets used, the developed system had better average classification performance overall, especially in terms of reducing classification errors. For simulated signals of varying intensity, the developed system had an average correct classification rate CCr of 93.8% and an error rate Er of 2.2% compared to 93.6% and 3.2%, respectively, for the best base classifier in the ensemble. For simulated signals with varying amounts of shape and/or firing pattern variability, the developed system had a CCr of 89.1% with an Er of 4.7% compared to 86.3% and 5.6%, respectively, for the best classifier. For real signals, the developed system had a CCr of 89.4% with an Er of 3.9% compared to 84.6% and 7.1%, respectively, for the best classifier.
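    The kappa statistic mentioned in this abstract measures agreement between two classifiers beyond what chance alone would produce: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed fraction of samples on which both classifiers assign the same label and p_e is the agreement expected from their individual label frequencies. Lower kappa means more diverse classifiers. The minimal Python sketch below assumes Cohen's two-rater form of kappa; the `most_diverse_pair` helper is hypothetical and illustrates the diversity measure only, not the authors' full ensemble-selection procedure.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Sequence, Tuple


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two classifiers' label outputs on the same samples."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of samples on which both classifiers agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, computed from each classifier's marginal label counts.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if p_e == 1.0:
        # Both classifiers always emit the same single label; treat as full agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def most_diverse_pair(outputs: Dict[str, Sequence[Hashable]]) -> Tuple[str, str]:
    """Hypothetical helper: return the pair of classifiers with the lowest kappa."""
    return min(combinations(outputs, 2),
               key=lambda pair: cohens_kappa(outputs[pair[0]], outputs[pair[1]]))


if __name__ == "__main__":
    a = [1, 1, 0, 0, 1, 0]
    b = [1, 0, 0, 0, 1, 1]
    print(round(cohens_kappa(a, b), 4))  # p_o = 4/6, p_e = 0.5 -> 0.3333
    print(most_diverse_pair({"A": a, "B": a, "C": b}))  # ('A', 'C')
```

    In the usage example, classifiers A and B are identical (kappa = 1), so the least-agreeing pair involves C; ties are broken by the order in which pairs are generated.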

  19. Sharing code.

    PubMed

    Kubilius, Jonas

    2014-01-01

    Sharing code is becoming increasingly important in the wake of Open Science. In this review I describe and compare two popular code-sharing utilities, GitHub and Open Science Framework (OSF). GitHub is a mature, industry-standard tool but lacks focus towards researchers. In comparison, OSF offers a one-stop solution for researchers but a lot of functionality is still under development. I conclude by listing alternative lesser-known tools for code and materials sharing.

  20. Haplogrouping mitochondrial DNA sequences in Legal Medicine/Forensic Genetics.

    PubMed

    Bandelt, Hans-Jürgen; van Oven, Mannis; Salas, Antonio

    2012-11-01

    Haplogrouping refers to the classification of (partial) mitochondrial DNA (mtDNA) sequences into haplogroups using the current knowledge of the worldwide mtDNA phylogeny. Haplogroup assignment of mtDNA control-region sequences assists in the focused comparison with closely related complete mtDNA sequences and thus serves two main goals in forensic genetics: first is the a posteriori quality analysis of sequencing results and second is the prediction of relevant coding-region sites for confirmation or further refinement of haplogroup status. The latter may be important in forensic casework where discrimination power needs to be as high as possible. However, most articles published in forensic genetics perform haplogrouping only in a rudimentary or incorrect way. The present study features PhyloTree as the key tool for assigning control-region sequences to haplogroups and elaborates on additional Web-based searches for finding near-matches with complete mtDNA genomes in the databases. In contrast, none of the automated haplogrouping tools available can yet compete with manual haplogrouping using PhyloTree plus additional Web-based searches, especially when confronted with artificial recombinants still present in forensic mtDNA datasets. We review and classify the various attempts at haplogrouping by using a multiplex approach or relying on automated haplogrouping. Furthermore, we re-examine a few articles in forensic journals providing mtDNA population data where appropriate haplogrouping following PhyloTree immediately highlights several kinds of sequence errors.

  1. Artificial neural networks for classifying olfactory signals.

    PubMed

    Linder, R; Pöppl, S J

    2000-01-01

    For practical applications, artificial neural networks have to meet several requirements: mainly, they should learn quickly, classify accurately, and behave robustly. Programs should be user-friendly and should not require an expert to fine-tune the various learning parameters. The present paper demonstrates an approach using an oversized network topology, adaptive propagation (APROP), a modified error function, and averaging of the outputs of four networks, described here for the first time. As an example, signals from different semiconductor gas sensors of an electronic nose were classified. The electronic nose smelled different types of edible oil with widely differing a priori probabilities. The fully specified neural network classifier fulfilled the above-mentioned demands. The new approach will be helpful not only for classifying olfactory signals automatically but also in many other fields in medicine, e.g., in data mining from medical databases.

  2. How Is Acute Lymphocytic Leukemia Classified?

    MedlinePlus

    ... How is acute lymphocytic leukemia treated? How is acute lymphocytic leukemia classified? Most types of cancers are assigned numbered ... ALL are now named as follows: B-cell ALL Early pre-B ALL (also called pro-B ...

  3. 5 CFR 1312.4 - Classified designations.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ..., (50 U.S.C. 401) Executive Order 12958 provides the only basis for classifying information. Information...) Top Secret. This classification shall be applied only to information the unauthorized disclosure...

  4. 5 CFR 1312.4 - Classified designations.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ..., (50 U.S.C. 401) Executive Order 12958 provides the only basis for classifying information. Information...) Top Secret. This classification shall be applied only to information the unauthorized disclosure...

  5. 5 CFR 1312.4 - Classified designations.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ..., (50 U.S.C. 401) Executive Order 12958 provides the only basis for classifying information. Information...) Top Secret. This classification shall be applied only to information the unauthorized disclosure...

  6. 5 CFR 1312.4 - Classified designations.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ..., (50 U.S.C. 401) Executive Order 12958 provides the only basis for classifying information. Information...) Top Secret. This classification shall be applied only to information the unauthorized disclosure...

  7. 5 CFR 1312.4 - Classified designations.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ..., (50 U.S.C. 401) Executive Order 12958 provides the only basis for classifying information. Information...) Top Secret. This classification shall be applied only to information the unauthorized disclosure...

  8. Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA

    PubMed Central

    Jiang, Huiyan; Zhao, Di; Zheng, Ruiping; Ma, Xiaoqi

    2015-01-01

    A novel method is proposed to construct a pancreatic cancer classifier. First, the concepts of quantum computing and of the fruit fly optimization algorithm (FOA) are introduced. Then, FOA is improved by quantum coding and quantum operations, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of a support vector machine (SVM), and the classifier is built from the optimized SVM. To verify the effectiveness of the proposed method, a standard SVM and other classification methods were chosen for comparison. The experimental results show that the proposed method can improve classifier performance while requiring less time. PMID:26543867

  9. Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA.

    PubMed

    Jiang, Huiyan; Zhao, Di; Zheng, Ruiping; Ma, Xiaoqi

    2015-01-01

    A novel method is proposed to construct a pancreatic cancer classifier. First, the concepts of quantum computing and of the fruit fly optimization algorithm (FOA) are introduced. Then, FOA is improved by quantum coding and quantum operations, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of a support vector machine (SVM), and the classifier is built from the optimized SVM. To verify the effectiveness of the proposed method, a standard SVM and other classification methods were chosen for comparison. The experimental results show that the proposed method can improve classifier performance while requiring less time.

  10. Molecular cloning of a cDNA coding biliary glycoprotein I: Primary structure of a glycoprotein immunologically crossreactive with carcinoembryonic antigen

    SciTech Connect

    Hinoda, Y.; Neumaier, M.; Hefta, S.A.; Drzeniek, Z.; Wagener, C.; Shively, L.; Hefta, L.J.F.; Shively, J.E.; Paxton, R.J.

    1988-09-01

    The authors have isolated and sequenced four overlapping cDNA clones from a normal adult human colon library, which together gave the entire nucleotide sequence for biliary glycoprotein I (BGP I). BGP I is a member of the carcinoembryonic antigen (CEA) gene family, which is a subfamily in the immunoglobulin gene superfamily. The deduced amino acid sequence of the combined clones for BGP I revealed a 34-residue leader sequence followed by a 108-residue N-terminal domain, a 178-residue immunoglobulin-like domain, a 108-residue region specific to BGP I, a 24-residue transmembrane domain, and a 35-residue cytoplasmic domain. The nucleotide sequence of BGP I exhibited greater than 80% identity with CEA and nonspecific crossreacting antigen (NCA) in the leader peptide, N-terminal domain, and immunoglobulin-like domain. They propose that BGP I diverged from NCA by acquiring an immunoglobulin-like domain substantially different from the domains found in NCA or CEA and also a new cytoplasmic domain. The latter feature should result in a substantially different membrane anchorage mechanism of BGP I compared to CEA, which lacks the cytoplasmic domain and is anchored via a phosphatidylinositol-glycan structure. Protein structural analysis of BGP I isolated from human bile revealed a blocked N terminus, 129 amino acids of internal sequence that are in agreement with the translated cDNA sequence, and five glycosylation sites in the peptides sequenced.

  11. Isolation of a cDNA coding for L-galactono-gamma-lactone dehydrogenase, an enzyme involved in the biosynthesis of ascorbic acid in plants. Purification, characterization, cDNA cloning, and expression in yeast.

    PubMed

    Ostergaard, J; Persiau, G; Davey, M W; Bauw, G; Van Montagu, M

    1997-11-28

    L-Galactono-gamma-lactone dehydrogenase (EC 1.3.2.3; GLDase), an enzyme that catalyzes the final step in the biosynthesis of L-ascorbic acid, was purified 1693-fold from a mitochondrial extract of cauliflower (Brassica oleracea, var. botrytis) to apparent homogeneity with an overall yield of 1.1%. The purification procedure consisted of anion exchange, hydrophobic interaction, gel filtration, and fast protein liquid chromatography. The enzyme had a molecular mass of 56 kDa, as estimated by gel filtration chromatography and SDS-polyacrylamide gel electrophoresis, and showed a pH optimum for activity between 8.0 and 8.5, with an apparent Km of 3.3 mM for L-galactono-gamma-lactone. Based on partial peptide sequence information, polymerase chain reaction fragments were isolated and used to screen a cauliflower cDNA library, from which a cDNA encoding GLDase was isolated. The deduced mature GLDase contained 509 amino acid residues with a predicted molecular mass of 57,837 Da. Expression of the cDNA in yeast produced a biologically active protein displaying GLDase activity. Furthermore, we identified a substrate for the enzyme in cauliflower extract that co-eluted with L-galactono-gamma-lactone in high-performance liquid chromatography, suggesting that this compound is a naturally occurring precursor of L-ascorbic acid biosynthesis in vivo.
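    An apparent Km of 3.3 mM means that, under simple Michaelis-Menten kinetics (an assumption; the abstract reports only the apparent Km), the initial rate reaches half of Vmax at 3.3 mM L-galactono-gamma-lactone. A minimal sketch, with the hypothetical helper name `michaelis_menten_rate` and Vmax normalized to 1:

```python
def michaelis_menten_rate(substrate_mM, vmax=1.0, km_mM=3.3):
    """Initial rate v = Vmax * [S] / (Km + [S]) for simple Michaelis-Menten kinetics."""
    if substrate_mM < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate_mM / (km_mM + substrate_mM)


for s in (0.33, 3.3, 33.0):
    print(f"[S] = {s:5.2f} mM -> v/Vmax = {michaelis_menten_rate(s):.3f}")
```

At one-tenth of Km the rate is about 9% of Vmax, at Km it is 50%, and at ten times Km it is about 91%.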

  12. The coding region of the UFGT gene is a source of diagnostic SNP markers that allow single-locus DNA genotyping for the assessment of cultivar identity and ancestry in grapevine (Vitis vinifera L.)

    PubMed Central

    2013-01-01

    Background Vitis vinifera L. is one of society's most important agricultural crops and has broad genetic variability. The difficulty of recognizing grapevine genotypes from ampelographic traits and secondary metabolites prompted the development of molecular markers suitable for the genetic identification of varieties. Findings Here, we compare a multi-locus barcoding approach based on six chloroplast markers with a single-copy nuclear gene sequencing method that uses five coding regions combined with a character-based system, with the aim of reconstructing cultivar-specific haplotypes and genotypes for the molecular characterization of 157 V. vinifera accessions. The analysis of the chloroplast target regions proved the DNA barcoding approach inadequate at the subspecies level, and further DNA genotyping analyses were therefore targeted on the sequences of five nuclear single-copy genes amplified across all of the accessions. Sequencing the coding region of the UFGT nuclear gene (UDP-glucose: flavonoid 3-O-glucosyltransferase, the key enzyme for the accumulation of anthocyanins in berry skins) enabled the discovery of discriminant SNPs (one SNP per 34 bp) and the reconstruction of 130 distinct V. vinifera genotypes. Most genotypes proved to be cultivar-specific; only a few were shared by more than one cultivar, and those cultivars were closely related. Conclusion Overall, this technique was successful in inferring SNP-based genotypes of grapevine accessions suitable for assessing the genetic identity and ancestry of international cultivars, and it was also useful for corroborating some hypotheses regarding the origin of local varieties, pointing to several cases of misidentification (synonymy/homonymy). PMID:24298902
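    The character-based idea of finding discriminant SNP columns and grouping accessions that share a genotype can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes the coding-region sequences are already aligned to equal length, writes heterozygous sites as IUPAC ambiguity codes (e.g., R for A/G) that are treated as ordinary characters, and uses hypothetical function names and toy sequences.

```python
from collections import defaultdict


def snp_columns(aligned):
    """Indices of alignment columns where the accessions do not all carry the same character."""
    lengths = {len(s) for s in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    (length,) = lengths
    return [i for i in range(length) if len({s[i] for s in aligned.values()}) > 1]


def group_by_genotype(aligned):
    """Map each SNP-column genotype string to the accessions that share it."""
    cols = snp_columns(aligned)
    groups = defaultdict(list)
    for name, seq in aligned.items():
        groups["".join(seq[i] for i in cols)].append(name)
    return dict(groups)


toy = {"cv1": "ACGTACGTAC", "cv2": "ACGTACGTAC", "cv3": "ACRTACGTAT"}
print(snp_columns(toy))        # [2, 9]
print(group_by_genotype(toy))  # {'GC': ['cv1', 'cv2'], 'RT': ['cv3']}
```

Accessions that fall into the same group are candidates for synonymy; SNP density is the alignment length divided by the number of SNP columns (one SNP per 5 bp in this toy example).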

  13. Three tomato genes code for heat stress transcription factors with a region of remarkable homology to the DNA-binding domain of the yeast HSF.

    PubMed Central

    Scharf, K D; Rose, S; Zott, W; Schöffl, F; Nover, L; Schöff, F

    1990-01-01

    Heat stress (hs) treatment of cell cultures of Lycopersicon peruvianum (Lp, tomato) results in activation of preformed transcription factor(s) (HSF) binding to the heat stress consensus element (HSE). Using appropriate synthetic HSE oligonucleotides, three types of clones with potential HSE-binding domains were isolated from a tomato lambda gt11 expression library by DNA-ligand screening. One of the potential HSF genes is constitutively expressed; the other two are hs-induced. Sequence comparison defines a single domain of approximately 90 amino acid residues common to all three genes and to the HSE-binding domain of the yeast HSF. The domain is flanked by proline residues and characterized by two long overlapping repeats. We speculate that the derived consensus sequence is also representative of other eukaryotic HSFs and that the existence of several different HSFs is not unique to plants. PMID:2148291

  14. Association of a specific cationic peroxidase isozyme with maize stress and disease resistance responses, genetic identification, and identification of a cDNA coding for the isozyme.

    PubMed

    Dowd, Patrick F; Johnson, Eric T

    2005-06-01

    The presence of a pI 9.0 cationic peroxidase isozyme from milk stage pericarp of six susceptible and five resistant inbreds was correlated significantly with previously reported field data on percentage infection by Aspergillus flavus in the inbreds and their hybrids. The isozyme was constitutively expressed in some additional maize tissues and lines examined, and frequently induced by mechanical damage, heat shock, Fusarium proliferatum, and/or Bacillus subtilis in other lines tested. Native/IEF two-dimensional electrophoresis identified the isozyme as the previously genetically identified px5. A cDNA clone expressed in black Mexican sweet (BMS) maize cell cultures produced the pI 9.0 isozyme. In addition to potential use in marker-assisted breeding, enhanced expression of this cationic peroxidase through breeding or genetic engineering may lead to enhanced disease or insect resistance.
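    The abstract does not state which statistic was used; one common way to correlate the presence or absence of an isozyme with a percentage is the point-biserial correlation, which equals Pearson's r with presence coded as 1 and absence as 0. A minimal sketch with hypothetical values, not the study's data:

```python
import math


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial correlation."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical inbreds: isozyme present (1) or absent (0), and percent infection.
presence = [1, 1, 0, 0]
infection_pct = [10.0, 12.0, 30.0, 32.0]
print(round(pearson_r(presence, infection_pct), 3))  # -0.995
```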

  15. Cloning and sequence analysis of the coding sequence of β-actin cDNA from the Chinese alligator and suitable internal reference primers from the β-actin gene.

    PubMed

    Zhu, H N; Zhang, S Z; Zhou, Y K; Wang, C L; Wu, X B

    2015-01-01

    β-Actin is an essential component of the cytoskeleton and is stably expressed in various animal tissues; it is therefore commonly used as an internal reference in gene expression studies. In this study, a 1731-bp fragment of β-actin cDNA from Alligator sinensis was obtained using the homology cloning technique. Sequence analysis showed that this fragment contained the complete coding sequence of the β-actin gene (1128 bp), encoding 375 amino acids. The amino acid sequence of β-actin is highly conserved, whereas its nucleotide sequence is slightly variable. Multiple alignment analyses showed that the nucleotide sequence of the β-actin gene from A. sinensis is very similar to sequences from birds, with 94-95% identity. Ten pairs of primers with different product sizes and different annealing temperatures were screened by PCR amplification, agarose gel electrophoresis, and DNA sequencing; these could be used as internal reference primers in gene expression studies. This study expands our knowledge of β-actin gene phylogenetic evolution and provides a basis for quantitative gene expression studies in A. sinensis. PMID:26505364
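    The reported lengths are consistent: a 1128-bp coding sequence contains 1128/3 = 376 codons, and if the last one is the stop codon (the usual convention for a complete CDS, assumed here), it encodes 375 amino acids. The sketch below checks this arithmetic and computes a simple percent identity between two aligned sequences; the function names are hypothetical, and identity conventions (how gaps are counted) vary between tools.

```python
def protein_length(cds_bp, includes_stop=True):
    """Number of amino acids encoded by a complete coding sequence of cds_bp nucleotides."""
    if cds_bp <= 0 or cds_bp % 3:
        raise ValueError("a complete CDS length must be a positive multiple of 3")
    codons = cds_bp // 3
    return codons - 1 if includes_stop else codons


def percent_identity(a, b, gap="-"):
    """Percent identical columns in a pairwise alignment, skipping columns with a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not cols:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


print(protein_length(1128))                  # 375
print(percent_identity("ACGT-A", "ACGTTT"))  # 80.0
```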

  16. Phylogenetic footprinting of non-coding RNA: hammerhead ribozyme sequences in a satellite DNA family of Dolichopoda cave crickets (Orthoptera, Rhaphidophoridae)

    PubMed Central

    2010-01-01

    Background The great variety in sequence, length, complexity, and abundance of satellite DNA has made it difficult to ascribe any function to this genome component. Recent studies have shown that satellite DNA can be transcribed and be involved in regulation of chromatin structure and gene expression. Some satellite DNAs, such as the pDo500 sequence family in Dolichopoda cave crickets, have a catalytic hammerhead (HH) ribozyme structure and activity embedded within each repeat. Results We assessed the phylogenetic footprints of the HH ribozyme within the pDo500 sequences from 38 different populations representing 12 species of Dolichopoda. The HH region was significantly more conserved than the non-hammerhead (NHH) region of the pDo500 repeat. In addition, stems were more conserved than loops. In stems, several compensatory mutations were detected that maintain base pairing. The core region of the HH ribozyme was affected by very few nucleotide substitutions and the cleavage position was altered only once among 198 sequences. RNA folding of the HH sequences revealed that a potentially active HH ribozyme can be found in most of the Dolichopoda populations and species. Conclusions The phylogenetic footprints suggest that the HH region of the pDo500 sequence family is selected for function in Dolichopoda cave crickets. However, the functional role of HH ribozymes in eukaryotic organisms is unclear. The possible functions have been related to trans cleavage of an RNA target by a ribonucleoprotein and regulation of gene expression. Whether the HH ribozyme in Dolichopoda is involved in similar functions remains to be investigated. Future studies need to demonstrate how the observed nucleotide changes and evolutionary constraint have affected the catalytic efficiency of the hammerhead. PMID:20047671
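    Detecting compensatory mutations amounts to checking, for each base-paired position in a stem, whether both partners changed while the pair remained canonical. A minimal sketch, assuming the stem pairs are already known from a folded reference structure and counting G·U wobble pairs as paired; the helper name and toy sequences are hypothetical.

```python
# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def compensatory_changes(ref, alt, stem_pairs):
    """Return stem pairs (i, j) where both bases differ from ref yet still pair in alt."""
    hits = []
    for i, j in stem_pairs:
        both_changed = ref[i] != alt[i] and ref[j] != alt[j]
        if both_changed and (ref[i], ref[j]) in PAIRS and (alt[i], alt[j]) in PAIRS:
            hits.append((i, j))
    return hits


stem = [(0, 8), (1, 7), (2, 6)]  # 0-based positions of a toy hairpin stem
print(compensatory_changes("GGCAAAGCC", "AGCAAAGCU", stem))  # [(0, 8)]
print(compensatory_changes("GGCAAAGCC", "AGCAAAGCC", stem))  # []
```

A single change on one side of a pair (the second call) is not counted, because only one partner differs from the reference.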

  17. Self-correcting 100-font classifier

    NASA Astrophysics Data System (ADS)

    Baird, Henry S.; Nagy, George

    1994-03-01

    We have developed a practical scheme to take advantage of local typeface homogeneity to improve the accuracy of a character classifier. Given a polyfont classifier which is capable of recognizing any of 100 typefaces moderately well, our method allows it to specialize itself automatically to the single -- but otherwise unknown -- typeface it is reading. Essentially, the classifier retrains itself after examining some of the images, guided at first by the preset classification boundaries of the given classifier, and later by the behavior of the retrained classifier. Experimental trials on 6.4 M pseudo-randomly distorted images show that the method improves on 95 of the 100 typefaces. It reduces the error rate by a factor of 2.5, averaged over 100 typefaces, when applied to an alphabet of 80 ASCII characters printed at ten point and digitized at 300 pixels/inch. This self-correcting method complements, and does not hinder, other methods for improving OCR accuracy, such as linguistic contextual analysis.
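    The self-correcting idea can be illustrated with a nearest-centroid classifier that starts from preset (polyfont) class centroids, labels the unknown typeface's samples itself, and then moves each centroid to the mean of the samples it assigned. This sketch is a simplified stand-in, not the authors' method; the features, helper names, and toy data are hypothetical (Python 3.8+ for `math.dist`).

```python
import math


def nearest(centroids, x):
    """Label of the centroid closest to feature vector x (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def self_adapt(centroids, samples, rounds=5):
    """Retrain on the classifier's own labels: reassign samples, then recompute means."""
    current = {label: tuple(c) for label, c in centroids.items()}
    for _ in range(rounds):
        assigned = {label: [] for label in current}
        for x in samples:
            assigned[nearest(current, x)].append(x)
        for label, pts in assigned.items():
            if pts:  # keep the previous centroid if nothing was assigned to it
                current[label] = tuple(sum(col) / len(pts) for col in zip(*pts))
    return current


preset = {"a": (0.0, 0.0), "b": (10.0, 0.0)}  # polyfont prototypes
page = [(3.0, 0.0), (3.2, 0.0), (2.8, 0.0), (8.0, 0.0), (8.2, 0.0), (7.8, 0.0)]
adapted = self_adapt(preset, page)
print(nearest(preset, (5.4, 0.0)), nearest(adapted, (5.4, 0.0)))  # b a
```

The ambiguous sample at x = 5.4 is labelled b by the preset prototypes but a once the prototypes have adapted to the page's typeface.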

  18. What are the differences between Bayesian classifiers and mutual-information classifiers?

    PubMed

    Hu, Bao-Gang

    2014-02-01

    In this paper, both Bayesian and mutual-information classifiers are examined for binary classification with or without a reject option. General decision rules are derived for Bayesian classifiers with distinctions between error types and reject types. A formal analysis is conducted to reveal the parameter redundancy of cost terms when abstaining classifications are enforced. This redundancy implies an intrinsic inconsistency in interpreting cost terms. If no values are given for the cost terms, we demonstrate the weakness of Bayesian classifiers in class-imbalanced classification. In contrast, mutual-information classifiers are able to provide an objective solution from the given data, which shows a reasonable balance among error types and reject types. Numerical examples using both types of classifiers, including extremely class-imbalanced cases, are given to confirm these differences. Finally, we briefly summarize the application advantages and disadvantages of Bayesian and mutual-information classifiers.
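    The two ingredients being contrasted can be sketched as follows: a Bayesian binary decision with a reject option, which picks the action of minimum expected cost, and the mutual information I(T;Y) between true classes T and decisions Y, the quantity on which mutual-information classifiers are built. This is a simplified illustration; the cost names, a single reject cost shared by both classes, and the toy numbers are assumptions, and the paper's exact (normalized) formulation is not reproduced.

```python
import math


def bayes_decision(p1, cost_fp=1.0, cost_fn=1.0, cost_reject=0.3):
    """Minimum-expected-cost action given the posterior p1 = P(class 1 | x)."""
    risks = {
        0: cost_fn * p1,          # deciding 0 is wrong when the truth is class 1
        1: cost_fp * (1.0 - p1),  # deciding 1 is wrong when the truth is class 0
        "reject": cost_reject,    # abstaining costs the same for either class
    }
    return min(risks, key=risks.get)


def mutual_information_bits(confusion):
    """I(T;Y) in bits; rows are true classes, columns are decisions (a reject column is allowed)."""
    total = sum(map(sum, confusion))
    p_true = [sum(row) / total for row in confusion]
    p_dec = [sum(col) / total for col in zip(*confusion)]
    mi = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count:
                p = count / total
                mi += p * math.log2(p / (p_true[i] * p_dec[j]))
    return mi


print([bayes_decision(p) for p in (0.1, 0.5, 0.9)])  # [0, 'reject', 1]
print(mutual_information_bits([[50, 0], [0, 50]]))   # 1.0
```

Changing `cost_fn` or `cost_reject` shifts where the rule rejects, which illustrates how the Bayesian decision depends on cost terms that must be fixed in advance.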

  19. A nonparametric classifier for unsegmented text

    NASA Astrophysics Data System (ADS)

    Nagy, George; Joshi, Ashutosh; Krishnamoorthy, Mukkai; Lin, Yu; Lopresti, Daniel P.; Mehta, Shashank; Seth, Sharad

    2003-12-01

    Symbolic Indirect Correlation (SIC) is a new classification method for unsegmented patterns. SIC requires two levels of comparisons. First, the feature sequences from an unknown query signal and a known multi-pattern reference signal are matched. Then, the order of the matched features is compared with the order of matches between every lexicon symbol-string and the reference string in the lexical domain. The query is classified according to the best matching lexicon string in the second comparison. Accuracy increases as classified feature-and-symbol strings are added to the reference string.

  20. A survey of decision tree classifier methodology

    NASA Technical Reports Server (NTRS)

    Safavian, S. Rasoul; Landgrebe, David

    1990-01-01

    Decision tree classifiers (DTCs) are used successfully in many diverse areas, such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution that is often easier to interpret. A survey of current methods for DTC design and the various existing issues is presented. After considering the potential advantages of DTCs over single-stage classifiers, the subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.
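    Feature selection at an internal node is commonly done by trying candidate thresholds and keeping the split that most reduces an impurity measure. The sketch below uses the Gini index on a single numeric feature; Gini is one of several criteria (entropy-based information gain is another) and is chosen here only for illustration, with hypothetical helper names.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """(weighted child impurity, threshold) of the best split x <= t on one feature."""
    best = (float("inf"), None)
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right child empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[0]:
            best = (score, t)
    return best


print(best_split([1, 2, 3, 4], ["a", "a", "b", "b"]))  # (0.0, 2)
```

In a full tree, the same search is repeated over every feature and then recursively within each child node.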

  1. Coding design for error correcting output codes based on perceptron

    NASA Astrophysics Data System (ADS)

    Zhou, Jin-Deng; Wang, Xiao-Dan; Zhou, Hong-Jian; Cui, Yong-Hua; Jing, Sun

    2012-05-01

    Error-correcting output codes (ECOC) are a common way to model multiclass classification problems, and data-driven encoding is attracting increasing attention. We propose a method for learning ECOC with the help of a single-layer perceptron neural network. To achieve this goal, the code elements of the ECOC are mapped to the weights of the network for a given decoding strategy, and an objective function with constrained weights is used as the cost function of the network. After training, we obtain a coding matrix containing many class subgroups. Experimental results on artificial data and University of California Irvine (UCI) data sets, using a logistic linear classifier and a support vector machine as binary learners, show that our scheme provides better classification performance with a shorter coding matrix than other state-of-the-art encoding strategies.
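    Decoding with an ECOC matrix can be sketched as follows: each class is a row (codeword), each column is one binary learner, and a sample is assigned to the class whose codeword has the smallest Hamming distance to the learners' outputs. The 3-class, 5-column matrix below is a hypothetical example, not one learned by the proposed perceptron method.

```python
# Hypothetical coding matrix: rows are classes, columns are binary learners.
CODE = {
    "A": (1, 1, 0, 0, 1),
    "B": (0, 1, 1, 0, 0),
    "C": (1, 0, 1, 1, 0),
}


def hamming(u, v):
    """Number of positions at which two equal-length codewords differ."""
    return sum(a != b for a, b in zip(u, v))


def ecoc_decode(bits, code=CODE):
    """Class whose codeword is nearest (in Hamming distance) to the learners' outputs."""
    return min(code, key=lambda cls: hamming(bits, code[cls]))


print(ecoc_decode((1, 1, 0, 0, 1)))  # A  (exact codeword)
print(ecoc_decode((1, 1, 0, 1, 1)))  # A  (one learner wrong)
```

Because every pair of codewords in this matrix differs in at least three positions, a single erroneous binary learner is still decoded correctly.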

  2. Using classifier fusion to improve the performance of multiclass classification problems

    NASA Astrophysics Data System (ADS)

    Lynch, Robert; Willett, Peter

    2013-05-01

    The problem of multiclass classification is often modeled by breaking it down into a collection of binary classifiers, as opposed to jointly modeling all classes with a single primary classifier. Various methods for decomposing the multiclass problem into a collection of binary classifiers can be found in the literature. Typical algorithms studied here include each versus all remaining (EVAR), each versus all individually (EVAI), and output correction coding (OCC). With each of these methods, a classifier-fusion-based decision rule is formulated that uses the various binary classifiers to determine the correct classification of an unknown data point. For example, with EVAR the binary classifier with maximum output is chosen; with EVAI, the correct class is chosen using a majority voting rule; and with OCC, a minimum Hamming distance comparison is used. In this paper, it is demonstrated how these methods perform when the Bayesian Data Reduction Algorithm (BDRA) is used as the primary classifier. BDRA is a discrete data classification method that quantizes and reduces the dimensionality of feature data for best classification performance. In this case, BDRA is used not only to train the appropriate binary classifier pairs but also to train on the discrete classifier outputs to formulate the correct classification decision for unknown data points. In this way, it is demonstrated how to predict which binary-classification-based method (i.e., EVAR, EVAI, or OCC) performs best with BDRA. Experimental results are shown with real data sets taken from the Knowledge Extraction based on Evolutionary Learning (KEEL) and University of California at Irvine (UCI) repositories of classifier databases. In general, and for the data sets considered, it is shown that the best classification method, based on performance with unlabeled test observations, can be predicted from performance on labeled training data. Specifically, the best
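    The EVAR and EVAI fusion rules described in this abstract can be written in a few lines. This sketch assumes the binary classifiers have already produced their outputs (BDRA itself is not reproduced); function names and scores are hypothetical, and ties in the EVAI vote are resolved by first-counted order, which a real system should handle explicitly.

```python
from collections import Counter


def evar_decide(scores):
    """EVAR: pick the class whose one-versus-rest classifier gives the largest output."""
    return max(scores, key=scores.get)


def evai_decide(pairwise_winners):
    """EVAI: majority vote over the winners of all one-versus-one classifiers."""
    return Counter(pairwise_winners).most_common(1)[0][0]


print(evar_decide({"a": 0.2, "b": 0.7, "c": 0.1}))  # b
print(evai_decide(["a", "a", "c"]))                # a  (a-vs-b, a-vs-c, b-vs-c)
```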

  3. AGDEX: A System for Classifying, Indexing, and Filing Agricultural Publications. Revised Edition.

    ERIC Educational Resources Information Center

    Miller, Howard L.; Woodin, Ralph J.

    This document provides an introduction to and instructions for the use of AGDEX, a comprehensive numeric filing system to classify and organize a wide variety of agricultural publications. The index is subdivided and color coded according to the following categories: (1) field crops; (2) horticulture; (3) forestry; (4) animal science; (5) soils;…

  4. FY05 LDRD Final Report Investigation of AAA+ protein machines that participate in DNA replication, recombination, and in response to DNA damage LDRD Project Tracking Code: 04-LW-049

    SciTech Connect

    Sawicka, D; de Carvalho-Kavanagh, M S; Barsky, D; Venclovas, C

    2006-12-04

    The AAA+ proteins are remarkable macromolecules that are able to self-assemble into nanoscale machines. These protein machines play critical roles in many cellular processes, including the processes that manage a cell's genetic material, but their mechanism at the molecular level has remained elusive. We applied computational molecular modeling, combined with advanced sequence analysis and available biochemical and genetic data, to structurally characterize eukaryotic AAA+ proteins and the protein machines they form. With these models we have examined intermolecular interactions in three dimensions (3D), including both interactions between the components of the AAA+ complexes and the interactions of these protein machines with their partners. These computational studies have provided new insights into the molecular structure and mechanism of action of AAA+ protein machines, thereby facilitating a deeper understanding of processes involved in DNA metabolism.

  5. Visual Classifier Training for Text Document Retrieval.

    PubMed

    Heimerl, F; Koch, S; Bosch, H; Ertl, T

    2012-12-01

    Performing exhaustive searches over a large number of text documents can be tedious, since it is very hard to formulate search queries or define filter criteria that adequately capture an analyst's information need. Classification through machine learning has the potential to improve search and filter tasks involving complex or very specific individual information needs. Unfortunately, analysts who are knowledgeable in their field are typically not machine learning specialists. Most classification methods, however, require a certain expertise regarding their parametrization to achieve good results. Supervised machine learning algorithms, in contrast, rely on labeled data, which can be provided by analysts. However, the labeling effort can be very high, which shifts the problem from composing complex queries or defining accurate filters to another laborious task, in addition to the need to judge the trained classifier's quality. We therefore compare three approaches for interactive classifier training in a user study. All of the approaches are potential candidates for integration into a larger retrieval system. They incorporate active learning to various degrees in order to reduce the labeling effort as well as to increase effectiveness. Two of them include interactive visualization that lets users explore the status of the classifier in the context of the labeled documents and judge the quality of the classifier in iterative feedback loops. We see our work as a step towards introducing user-controlled classification methods, in addition to text search and filtering, for increasing recall in analytics scenarios involving large corpora.
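    The abstract does not specify which active-learning strategy each approach uses; a common one is uncertainty sampling, in which the next document shown to the analyst is the unlabeled one about which the current classifier is least certain. A minimal sketch, assuming the classifier outputs a relevance probability per document (names and scores are hypothetical):

```python
def most_uncertain(probabilities, labeled):
    """Unlabeled document whose P(relevant) is closest to 0.5."""
    candidates = {doc: p for doc, p in probabilities.items() if doc not in labeled}
    if not candidates:
        raise ValueError("every document is already labeled")
    return min(candidates, key=lambda doc: abs(candidates[doc] - 0.5))


scores = {"d1": 0.95, "d2": 0.55, "d3": 0.10, "d4": 0.48}
print(most_uncertain(scores, labeled={"d4"}))  # d2
```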

  6. Shape and Function in Hmong Classifier Choices

    ERIC Educational Resources Information Center

    Sakuragi, Toshiyuki; Fuller, Judith W.

    2013-01-01

    This study examined classifiers in the Hmong language with a particular focus on gaining insights into the underlying cognitive process of categorization. Forty-three Hmong speakers participated in three experiments. In the first experiment, designed to verify the previously postulated configurational (saliently one-dimensional, saliently…

  7. Classifying and quantifying basins of attraction

    SciTech Connect

    Sprott, J. C.; Xiong, Anda

    2015-08-15

    A scheme is proposed to classify the basins for attractors of dynamical systems in arbitrary dimensions. There are four basic classes depending on their size and extent, and each class can be further quantified to facilitate comparisons. The calculation uses a Monte Carlo method and is applied to numerous common dissipative chaotic maps and flows in various dimensions.
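    The Monte Carlo ingredient can be illustrated on the Hénon map with its classical parameters (a = 1.4, b = 0.3): sample initial conditions uniformly in a square and count the fraction whose orbits stay bounded. This is a sketch under stated assumptions, not the authors' classification scheme; the map, box, escape bound, and helper names are illustrative choices, and "stays bounded" is used as a proxy for "lies in the basin" because escape to infinity is generally treated as the only competing outcome for these parameters.

```python
import random


def henon(x, y, a=1.4, b=0.3):
    """One step of the Henon map."""
    return 1.0 - a * x * x + y, b * x


def stays_bounded(x, y, n_iter=500, bound=1e3):
    """True if the orbit from (x, y) does not escape beyond `bound` within n_iter steps."""
    for _ in range(n_iter):
        x, y = henon(x, y)
        if abs(x) > bound or abs(y) > bound:
            return False
    return True


def basin_fraction(n_samples=2000, lo=-2.0, hi=2.0, seed=1):
    """Monte Carlo estimate of the fraction of the square [lo, hi]^2 in the attractor's basin."""
    rng = random.Random(seed)
    hits = sum(stays_bounded(rng.uniform(lo, hi), rng.uniform(lo, hi))
               for _ in range(n_samples))
    return hits / n_samples


print(basin_fraction())
```

The statistical error of such an estimate shrinks roughly as 1/sqrt(n_samples).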

  8. The Community; A Classified, Annotated Bibliography.

    ERIC Educational Resources Information Center

    Payne, Raymond, Comp.; Bailey, Wilfrid C., Comp.

    This is a classified retrospective bibliography of 839 items on the community (about 140 are annotated) from rural sociology and agricultural economics departments and sections, agricultural experiment stations, extension services, and related agencies. Items are categorized as follows: bibliography and reference lists; location and delineation of…

  9. Classifying the Context Clues in Children's Text

    ERIC Educational Resources Information Center

    Dowds, Susan J. Parault; Haverback, Heather Rogers; Parkinson, Meghan M.

    2016-01-01

    This study aimed to determine which types of context clues exist in children's texts and whether it is possible for experts to identify reliably those clues. Three experienced coders used Ames' clue set as a foundation for a system to classify context clues in children's text. Findings showed that the adjustments to Ames' system resulted in 15…

  10. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process..., AR 380-5 (Department of the Army Information Security Program) will be followed. (b) Classification... makers in accordance with AR 380-5. (d) When classified information is such an integral part of...

  11. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process..., AR 380-5 (Department of the Army Information Security Program) will be followed. (b) Classification... makers in accordance with AR 380-5. (d) When classified information is such an integral part of...

  12. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process..., AR 380-5 (Department of the Army Information Security Program) will be followed. (b) Classification... makers in accordance with AR 380-5. (d) When classified information is such an integral part of...

  13. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process..., AR 380-5 (Department of the Army Information Security Program) will be followed. (b) Classification... makers in accordance with AR 380-5. (d) When classified information is such an integral part of...

  14. 32 CFR 651.13 - Classified actions.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) National Environmental Policy Act and the Decision Process..., AR 380-5 (Department of the Army Information Security Program) will be followed. (b) Classification... makers in accordance with AR 380-5. (d) When classified information is such an integral part of...

  15. A Proposed System for Classifying Research Universities.

    ERIC Educational Resources Information Center

    Anderson, Robert C.

    A system of classifying research universities is proposed based on quantitative criteria. Data from several studies were used to develop a list of 57 leading U.S. research universities. The Carnegie Commission's 1973 and 1976 classification of "Research Universities I" and the Academy for Educational Development's listing are presented, along with…

  16. A DNA Vaccine Coding for the Brucella Outer Membrane Protein 31 Confers Protection against B. melitensis and B. ovis Infection by Eliciting a Specific Cytotoxic Response

    PubMed Central

    Cassataro, Juliana; Velikovsky, Carlos A.; de la Barrera, Silvia; Estein, Silvia M.; Bruno, Laura; Bowden, Raúl; Pasquevich, Karina A.; Fossati, Carlos A.; Giambartolomei, Guillermo H.

    2005-01-01

    The development of an effective subunit vaccine against brucellosis is a research area of intense interest. The outer membrane proteins (Omps) of Brucella spp. have been extensively characterized as potential immunogenic and protective antigens. This study was conducted to evaluate the immunogenicity and protective efficacy of the B. melitensis Omp31 gene cloned in the pCI plasmid (pCIOmp31). Immunization of BALB/c mice with pCIOmp31 conferred protection against B. ovis and B. melitensis infection. Mice vaccinated with pCIOmp31 developed a very weak humoral response, and in vitro stimulation of their splenocytes with recombinant Omp31 did not induce the secretion of gamma interferon. Splenocytes from Omp31-vaccinated animals displayed a specific cytotoxic-T-lymphocyte activity, which led to the in vitro lysis of Brucella-infected macrophages. pCIOmp31 immunization elicited mainly CD8+ T cells, which mediate cytotoxicity via perforins, but also CD4+ T cells, which mediate lysis via the Fas-FasL pathway. In vivo depletion of T-cell subsets showed that the pCIOmp31-induced protection against Brucella infection is mediated predominantly by CD8+ T cells, although CD4+ T cells also contribute. Our results demonstrate that the Omp31 DNA vaccine induces cytotoxic responses that have the potential to contribute to protection against Brucella infection. The protective response could be related to the induction of CD8+ T cells that eliminate Brucella-infected cells via the perforin pathway. PMID:16177328

  17. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage that the speech signal was corrupted by noise, cross-talk, and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. Digital transmission, on the other hand, is relatively immune to noise, cross-talk, and distortion, primarily because the digital signal can be faithfully regenerated at each repeater based purely on a binary decision. Hence, the end-to-end performance of a digital link becomes essentially independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach because of its higher immunity to noise. The need to carry digital speech also became extremely important from a service-provision point of view. Modern requirements have introduced the need for robust, flexible, and secure services that can carry a multitude of signal types (such as voice, data, and video) without a fundamental change in infrastructure. Such a requirement could not easily have been met without the advent of digital transmission systems, which in turn require speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized from the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the
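    As an example of waveform coding, the continuous μ-law companding curve (μ = 255, the value used in North American and Japanese telephony) gives small amplitudes finer effective quantization steps than a uniform quantizer with the same number of levels. The sketch below is illustrative only: standardized coders such as ITU-T G.711 use a segmented piecewise-linear approximation of this curve, and the helper names and 8-bit quantizer are assumptions.

```python
import math

MU = 255.0


def mu_law_compress(x):
    """Continuous mu-law curve mapping a sample in [-1, 1] to [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize(v, bits=8):
    """Uniform mid-tread quantizer on [-1, 1] with 2**(bits - 1) - 1 positive levels."""
    levels = 2 ** (bits - 1) - 1
    return round(v * levels) / levels


x = 0.01  # a quiet speech sample
err_uniform = abs(quantize(x) - x)
err_mu = abs(mu_law_expand(quantize(mu_law_compress(x))) - x)
print(f"uniform error {err_uniform:.5f}, mu-law error {err_mu:.5f}")
```

For this quiet sample, quantizing in the companded domain and then expanding gives a much smaller reconstruction error than quantizing the sample directly.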

  18. Discrete Ramanujan transform for distinguishing the protein coding regions from other regions.

    PubMed

    Hua, Wei; Wang, Jiasong; Zhao, Jian

    2014-01-01

    Based on the study of the Ramanujan sum and Ramanujan coefficients, this paper proposes the concepts of the discrete Ramanujan transform and spectrum. Using the Voss numerical representation, a symbolic DNA strand is mapped to a numerical DNA sequence, and the discrete Ramanujan spectrum of the numerical DNA sequence is derived. It is well known that the discrete Fourier power spectrum of protein-coding sequences exhibits 3-base periodicity, a feature widely exploited in DNA sequence analysis via the discrete Fourier transform, in which the signal-to-noise ratio at frequency N/3 (where N is the length of the sequence) is used as the criterion. The results presented in this paper show that the 3-base periodicity appears as a prominent spike at period 3 of the discrete Ramanujan spectrum only for protein-coding regions. A signal-to-noise ratio for the discrete Ramanujan spectrum is defined for numerical measurement. Therefore, the discrete Ramanujan spectrum and the signal-to-noise ratio of a DNA sequence can be used to distinguish protein-coding regions from noncoding regions. All the exon and intron sequences in the whole of chromosomes 1, 2, 3, and 4 of Caenorhabditis elegans were tested, and the histograms and tables from the computational results illustrate the reliability of the method. In addition, our theoretical analysis shows that the algorithm for calculating the discrete Ramanujan spectrum has lower computational complexity and higher computational accuracy. The computational experiments show that classifying DNA sequences with the discrete Ramanujan spectrum is a fast and effective method.
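    The Fourier criterion that the paper uses as its point of comparison can be sketched as follows: map the sequence to the four Voss indicator sequences, sum their power spectra, and divide the power at frequency index N/3 by the average power. This is a minimal illustration of the classical DFT approach, not the authors' Ramanujan-spectrum method; the helper names are hypothetical, and the sketch evaluates only the single DFT bin it needs.

```python
import cmath


def voss_indicators(seq):
    """Voss representation: one 0/1 indicator sequence per nucleotide."""
    seq = seq.upper()
    return {base: [1.0 if c == base else 0.0 for c in seq] for base in "ACGT"}


def period3_snr(seq):
    """Fourier SNR S[N/3] / mean(S) for a DNA sequence whose length is a multiple of 3."""
    n = len(seq)
    if n == 0 or n % 3:
        raise ValueError("sequence length must be a positive multiple of 3")
    indicators = voss_indicators(seq)
    k = n // 3
    twiddles = [cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)]
    # Total power at frequency index N/3, summed over the four indicator sequences.
    s_k = sum(abs(sum(u * w for u, w in zip(ind, twiddles))) ** 2
              for ind in indicators.values())
    # Parseval: the mean of S over all N frequencies equals sum_n sum_b u_b[n]^2,
    # i.e. the number of A/C/G/T positions, because the indicators are 0/1.
    mean_s = sum(sum(ind) for ind in indicators.values())
    if mean_s == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return s_k / mean_s


print(round(period3_snr("ATG" * 30), 6))  # 30.0
print(round(period3_snr("A" * 90), 6))    # 0.0
```

A perfectly period-3 sequence concentrates its power at N/3, whereas a homopolymer has none there; real coding regions fall in between.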

  19. Transcriptome-based functional classifiers for direct immunotoxicity.

    PubMed

    Shao, Jia; Berger, Laura F; Hendriksen, Peter J M; Peijnenburg, Ad A C M; van Loveren, Henk; Volger, Oscar L

    2014-03-01

    Current screening methods for direct immunotoxic chemicals are mainly based on general toxicity studies with rodents. The present study aimed to identify transcriptome-based functional classifiers that can eventually be exploited for the development of in vitro screening assays for direct immunotoxicity. To this end, a toxicogenomics approach was applied in which gene expression changes in human Jurkat lymphoblastic T cells were investigated in response to a wide range of compounds, including direct immunotoxicants, immunosuppressive drugs, and non-immunotoxic control chemicals. On the basis of DNA microarray data previously obtained by exposing Jurkat cells to 31 test compounds (Shao et al. in Toxicol Sci 135(2):328-346, 2013), we identified a set of 93 genes, of which 80 were significantly regulated (|numerical ratio| ≥ 1.62) by at least three compounds and the other 13 were significantly regulated by either a single compound or a single compound class. The 28 most differentially regulated genes were selected for qRT-PCR verification using a training set of 44 compounds, consisting of the above-mentioned 31 compounds (23 immunotoxic and 8 non-immunotoxic) and 13 additional immunotoxicants. Good correlation between the microarray and qRT-PCR results (Pearson's correlation, R ≥ 0.69) was found for 27 of the 28 genes. Redundancy analysis of these 27 potential classifiers led to a final set of 25 genes. To assess the performance of these genes, Jurkat cells were exposed to 20 additional compounds (external verification set), followed by qRT-PCR. The 25-gene classifier set performed well in the external verification: accuracy 85%, true positive rate (sensitivity) 88%, and true negative rate (specificity) 67%. Furthermore, on the basis of the gene ontology annotation of the 25 classifier genes, the immunotoxicants examined in this study could be categorized into distinct functional subclasses. In conclusion, we have identified and
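    The three performance figures follow directly from a 2×2 confusion matrix, with immunotoxic compounds as the positive class. A minimal sketch with hypothetical counts, not the study's:

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical verification set of 20 compounds.
print(binary_metrics(tp=9, fn=1, tn=6, fp=4))
```

Because the verification set is small, each misclassified negative moves specificity by a large step, which is worth keeping in mind when comparing such percentages.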

  20. DNA fingerprinting of Chinese melon provides evidentiary support of seed quality appraisal.

    PubMed

    Gao, Peng; Ma, Hongyan; Luan, Feishi; Song, Haibin

    2012-01-01

    Melon (Cucumis melo L.) is an important vegetable crop worldwide. At present, homonyms and synonyms occur in the melon seed markets of China. They can cause variety authenticity problems that affect melon breeding, production, marketing and other areas. Molecular markers, especially microsatellites or simple sequence repeats (SSRs), are playing increasingly important roles in cultivar identification. The aim of this study was to construct a DNA fingerprinting database of major melon cultivars, which could provide a basis for establishing a technical standard system for identifying the purity and authenticity of melon seeds. To develop a core set of SSR markers, 470 polymorphic SSRs were selected as candidate markers from 1219 SSRs using 20 representative melon varieties (lines). Eighteen SSR markers, evenly distributed across the genome and with the highest polymorphism information content (PIC), were identified as the core marker set for melon DNA fingerprinting analysis. Fingerprint codes were established for 471 melon varieties (lines). Fifty-one materials were classified into 17 groups because they shared the same fingerprint code. A survey of field traits showed that plants in the same group were synonyms, having the same or similar field characters. Furthermore, DNA fingerprinting quick response (QR) codes were constructed for the 471 melon varieties (lines). QR codes are fast to read and have a large storage capacity, so QR-coded melon DNA fingerprints are convenient to read and well suited to commercial applications.
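The following sketch covers two steps described above. It assumes each variety's fingerprint code is a tuple of allele sizes, one per core SSR marker. The first function computes a Botstein-style PIC score that could be used to rank candidate markers. The second groups varieties that share an identical code. The abstract does not state which PIC formula was used, and the variety names and allele sizes are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pic(allele_freqs: list[float]) -> float:
    """Polymorphism information content (Botstein-style) from allele frequencies at one marker."""
    heterozygosity = 1.0 - sum(p * p for p in allele_freqs)
    # Botstein correction: remove uninformative matings between identical heterozygotes.
    correction = sum(2 * (pi * pi) * (pj * pj) for pi, pj in combinations(allele_freqs, 2))
    return heterozygosity - correction


def group_by_fingerprint(fingerprints: dict[str, tuple[int, ...]]) -> list[list[str]]:
    """Return groups of varieties that share an identical multi-marker fingerprint code."""
    by_code: dict[tuple[int, ...], list[str]] = defaultdict(list)
    for variety, code in fingerprints.items():
        by_code[code].append(variety)
    # Only codes carried by two or more varieties flag candidate synonyms.
    return [sorted(names) for names in by_code.values() if len(names) > 1]


# Hypothetical three-marker codes (allele sizes in bp) for four varieties.
codes = {
    "Variety A": (152, 210, 98),
    "Variety B": (152, 210, 98),
    "Variety C": (156, 210, 98),
    "Variety D": (152, 214, 102),
}
print(group_by_fingerprint(codes))  # [['Variety A', 'Variety B']]
print(pic([0.5, 0.5]))              # 0.375
```

Identical codes only flag candidate synonyms. As in the study, a field-trait survey is still needed to confirm them.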

  1. Orthopedics coding and funding.

    PubMed

    Baron, S; Duclos, C; Thoreux, P

    2014-02-01

    The French tarification à l'activité (T2A) prospective payment system is a financial system in which a health-care institution's resources are based on its performed activity. Activity is described via the PMSI medical information system (programme de médicalisation du système d'information). The PMSI classifies hospital cases into clinical and economic categories known as diagnosis-related groups (DRGs), each with an associated price tag. Coding a hospital case involves describing it as realistically as possible so that it is assigned to the right DRG, which ensures appropriate payment. For this, it is essential to understand what determines the price of an inpatient stay. The main factors are the code for the surgical procedure, the patient's principal diagnosis (reason for admission), the codes for comorbidities (everything that adds to the management burden), and the management of the length of stay. The PMSI is used to analyze the institution's activity and dynamism: change from the previous year, relation to target, and comparison with competing institutions based on indicators such as the mean length of stay performance indicator (MLS PI). The T2A system improves overall care efficiency. Quality of care, however, is not currently taken into account in the payment made to the institution, because there are no indicators for it; work needs to be done on this topic.

  2. Disassembly and Sanitization of Classified Matter

    SciTech Connect

    Stockham, Dwight J.; Saad, Max P.

    2008-01-15

    The Disassembly Sanitization Operation (DSO) process was implemented to support weapon disassembly and disposition through recycling and waste minimization measures. The process was initiated by treaty agreements and reconfigurations within both the DOD and DOE Complexes. The DOE is faced with disassembling and disposing of a huge inventory of retired weapons, components, training equipment, spare parts, weapon maintenance equipment, and associated material. In addition, regulations have caused a dramatic increase in the information required to support the handling and disposition of these parts and materials. In the past, huge inventories of classified weapon components had to be kept in long-term storage at Sandia and at many other locations throughout the DOE Complex. These materials are placed in onsite storage units because of classification issues, and they may also contain radiological and/or hazardous components. Because no disposal options exist for this material, the only choice was long-term storage. Long-term storage is costly and somewhat problematic. It requires a secured storage area, monitoring, and auditing, and it presents the potential for loss or theft of the material. Overall, the DSO process has enabled 70 to 80% of the components sent through it to be recycled. These components are made of high-quality materials, and once they have been sanitized, their component metals are in very high demand for recycling. The DSO process for NGPF classified components established the credibility of this technique for addressing the long-term storage requirements of the classified weapons component inventory. The success of this application has generated interest from other Sandia organizations and other locations throughout the complex. Other organizations are requesting the help of the DSO team, and the DSO is responding to these requests by expanding its scope to include Work-for-Others projects.

  3. MCNP code

    SciTech Connect

    Cramer, S.N.

    1984-01-01

    The MCNP code is the major Monte Carlo coupled neutron-photon transport research tool at the Los Alamos National Laboratory. It represents the most extensive Monte Carlo development program in the United States that is available in the public domain. The present code is the direct descendant of the original Monte Carlo work of Fermi, von Neumann, and Ulam at Los Alamos in the 1940s. Development has continued uninterrupted since that time. The current version of MCNP (or its predecessors) has always included state-of-the-art methods in the Monte Carlo simulation of radiation transport, basic cross-section data, geometry capability, variance reduction, and estimation procedures. The authors of the present code have oriented its development toward general user application. The documentation, though extensive, is presented in a clear and simple manner with many examples, illustrations, and sample problems. In addition to providing the desired results, the output listings give a wealth of detailed information (some optional) concerning each stage of the calculation. The code system is continually updated to take advantage of advances in computer hardware and software, including interactive modes of operation, diagnostic interrupts and restarts, and a variety of graphical and video aids.
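MCNP is far more general than this, but the core Monte Carlo idea fits in a few lines. The sketch below is a hypothetical toy and is not taken from MCNP. It estimates the uncollided transmission of particles through a purely absorbing slab at normal incidence by sampling exponential free-flight distances. The result can then be compared with the analytic value exp(-sigma_t * thickness).

```python
import math
import random


def slab_transmission(sigma_t: float, thickness: float, n: int, seed: int = 1) -> float:
    """Monte Carlo estimate of the fraction of particles crossing a purely absorbing slab."""
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(n):
        # Inverse-CDF sampling of a free-flight distance with total cross section sigma_t.
        distance = -math.log(1.0 - rng.random()) / sigma_t
        if distance > thickness:
            transmitted += 1
    return transmitted / n


estimate = slab_transmission(sigma_t=1.0, thickness=2.0, n=100_000)
print(estimate, math.exp(-2.0))  # the two values should agree to about two decimal places
```

The statistical error falls as one over the square root of the number of histories. Variance-reduction methods, which the abstract lists among MCNP's capabilities, aim to lower that error for a given amount of computation.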

  4. QR Codes

    ERIC Educational Resources Information Center

    Lai, Hsin-Chih; Chang, Chun-Yen; Li, Wen-Shiane; Fan, Yu-Lin; Wu, Ying-Tien

    2013-01-01

    This study presents an m-learning method that incorporates Integrated Quick Response (QR) codes. This learning method not only achieves the objectives of outdoor education, but it also increases applications of Cognitive Theory of Multimedia Learning (CTML) (Mayer, 2001) in m-learning for practical use in a diverse range of outdoor locations. When…

  5. Semantic Features for Classifying Referring Search Terms

    SciTech Connect

    May, Chandler J.; Henry, Michael J.; McGrath, Liam R.; Bell, Eric B.; Marshall, Eric J.; Gregory, Michelle L.

    2012-05-11

    When an internet user clicks on a result in a search engine, a request is submitted to the destination web server. That request includes a referrer field containing the search terms given by the user. Using this information, website owners can analyze the search terms leading to their websites to better understand their visitors' needs. This work explores some of the features that can be used for classification-based analysis of such referring search terms. We present initial results for the example task of classifying HTTP requests by country of origin. A system that can accurately predict the country of origin from query text may be a valuable complement to IP lookup methods, which are susceptible to obfuscation by dereferrers or proxies. We suggest that the addition of semantic features improves classifier performance in this example application. We begin by looking at related work and presenting our approach. After describing initial experiments and results, we discuss paths forward for this work.
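The search terms can be recovered from the query string of the referrer URL. The sketch below uses Python's standard `urllib.parse`. It assumes the engine puts the terms in a parameter named `q`, which is a common convention, but engines differ. The function name and example URL are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def search_terms_from_referrer(referrer: str, param: str = "q") -> list[str]:
    """Return the lower-cased search terms carried in a referrer URL's query string."""
    query = parse_qs(urlparse(referrer).query)
    values = query.get(param)
    if not values:
        return []  # no search terms, or the engine uses a different parameter name
    # parse_qs has already decoded '+' and percent-escapes into plain text.
    return values[0].lower().split()


print(search_terms_from_referrer("https://search.example.com/search?q=cheap+flights+to+Lagos&hl=en"))
# ['cheap', 'flights', 'to', 'lagos']
```

Tokens extracted this way are the raw material to which lexical or semantic features would then be attached.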

  6. Detection of Fundus Lesions Using Classifier Selection

    NASA Astrophysics Data System (ADS)

    Nagayoshi, Hiroto; Hiramatsu, Yoshitaka; Sako, Hiroshi; Himaga, Mitsutoshi; Kato, Satoshi

    A system for detecting fundus lesions caused by diabetic retinopathy in fundus images is being developed. The system can screen the images in advance to reduce the inspection workload on doctors. One difficulty that must be addressed in completing this system is how to remove false positives, which tend to arise near blood vessels, without decreasing the detection rate of lesions in other areas. To overcome this difficulty, we developed classifier selection according to the position of a candidate lesion, and we introduced new features that can distinguish true lesions from false positives. A system incorporating classifier selection and these new features was tested in experiments using 55 fundus images containing lesions and 223 images without lesions. The results confirm the effectiveness of the proposed system, with a sensitivity of 98% and a specificity of 81%.

  7. Training a CAD classifier with correlated data

    NASA Astrophysics Data System (ADS)

    Dundar, Murat; Krishnapuram, Balaji; Wolf, Matthias; Lakare, Sarang; Bogoni, Luca; Bi, Jinbo; Rao, R. Bharat

    2007-03-01

    Most methods for classifier design assume that the training samples are drawn independently and identically from an unknown data generating distribution (i.i.d.), although this assumption is violated in several real life problems. Relaxing this i.i.d. assumption, we develop training algorithms for the more realistic situation where batches or sub-groups of training samples may have internal correlations, although the samples from different batches may be considered to be uncorrelated; we also consider the extension to cases with hierarchical--i.e. higher order--correlation structure between batches of training samples. After describing efficient algorithms that scale well to large datasets, we provide some theoretical analysis to establish their validity. Experimental results from real-life Computer Aided Detection (CAD) problems indicate that relaxing the i.i.d. assumption leads to statistically significant improvements in the accuracy of the learned classifier.

  8. Classifying bed inclination using pressure images.

    PubMed

    Baran Pouyan, M; Ostadabbas, S; Nourani, M; Pompeo, M

    2014-01-01

    Pressure ulcers are one of the most prevalent problems for bed-bound patients in hospitals and nursing homes. They are painful for patients and costly for healthcare systems. Accurate in-bed posture analysis can significantly help in preventing pressure ulcers. In particular, bed inclination (back angle) is a factor contributing to pressure ulcer development. In this paper, an efficient methodology is proposed to classify bed inclination. Our approach uses pressure values collected from a commercial pressure mat system. A number of image processing and machine learning techniques are then applied to estimate and classify the approximate bed inclination angle. The proposed algorithm was tested on 15 subjects of various sizes and weights. The experimental results indicate that our method predicts bed inclination in three classes with 80.3% average accuracy.

  9. Comparing cosmic web classifiers using information theory

    NASA Astrophysics Data System (ADS)

    Leclercq, Florent; Lavaux, Guilhem; Jasche, Jens; Wandelt, Benjamin

    2016-08-01

    We introduce a decision scheme for optimally choosing a classifier, which segments the cosmic web into different structure types (voids, sheets, filaments, and clusters). Our framework, based on information theory, accounts for the design aims of different classes of possible applications: (i) parameter inference, (ii) model selection, and (iii) prediction of new observations. As an illustration, we use cosmographic maps of web-types in the Sloan Digital Sky Survey to assess the relative performance of the classifiers T-WEB, DIVA and ORIGAMI for: (i) analyzing the morphology of the cosmic web, (ii) discriminating dark energy models, and (iii) predicting galaxy colors. Our study substantiates a data-supported connection between cosmic web analysis and information theory, and paves the path towards principled design of analysis procedures for the next generation of galaxy surveys. We have made the cosmic web maps, galaxy catalog, and analysis scripts used in this work publicly available.
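Mutual information is one basic information-theoretic quantity for comparing two web-type classifications. The sketch below computes it in bits from two voxel-wise label lists and assumes both classifiers were run on the same grid. It illustrates the quantity only. It does not reproduce the paper's decision scheme or its application-specific utilities, and the label lists are hypothetical.

```python
import math
from collections import Counter


def mutual_information(labels_a: list[str], labels_b: list[str]) -> float:
    """Mutual information in bits between two labelings of the same voxels."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same voxels")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi


web_types = ["void", "sheet", "filament", "cluster"] * 2
print(mutual_information(web_types, web_types))  # 2.0 (identical maps, four equiprobable types)
```

Identical maps share all of their entropy, and statistically independent maps give zero. Real comparisons fall between these two limits.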

  10. Classifying Land Cover Using Spectral Signature

    NASA Astrophysics Data System (ADS)

    Alawiye, F. S.

    2012-12-01

    Studying land cover has become increasingly important as countries try to overcome the destruction of wetlands and its impact on local climate through seasonal variation, radiation balance, and deteriorating environmental quality. In this investigation, we have been studying the spectral signatures of the Jamaica Bay wetland area using remotely sensed satellite data from LANDSAT TM and ASTER. We applied various remote sensing techniques to generate classified land cover maps. Our classifiers relied on input from both the remotely sensed data and in-situ spectral field data. Based on spectral separability and data collected in the field, supervised and unsupervised classifications were carried out. First results suggest good agreement between the mapped land cover units and those observed in the field.
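The abstract does not name the supervised classifier used. As one hypothetical example of classifying pixels by spectral signature, the sketch below implements a minimum-distance-to-means classifier. Each class is represented by the mean of its training signatures, and a pixel is assigned to the class whose mean is nearest in band space. The band values and class names are invented for illustration.

```python
import math


def class_means(training: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Mean spectral signature (per-band average) of each land-cover class."""
    return {
        cls: [sum(band) / len(band) for band in zip(*pixels)]
        for cls, pixels in training.items()
    }


def minimum_distance_classify(pixel: list[float], means: dict[str, list[float]]) -> str:
    """Assign a pixel to the class whose mean signature is closest (Euclidean distance)."""
    return min(means, key=lambda cls: math.dist(pixel, means[cls]))


# Hypothetical three-band training signatures.
training = {
    "water": [[10, 5, 2], [12, 6, 3]],
    "marsh": [[30, 40, 20], [32, 42, 22]],
}
means = class_means(training)
print(minimum_distance_classify([11, 6, 2], means))  # water
```

More elaborate supervised classifiers, such as maximum likelihood, also use each class's covariance rather than its mean alone.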

  11. Bayes classifiers for imbalanced traffic accidents datasets.

    PubMed

    Mujalli, Randa Oqab; López, Griselda; Garach, Laura

    2016-03-01

    Traffic accident data sets are usually imbalanced: far fewer instances fall in the killed or severe injuries class (minority) than in the slight injuries class (majority). This poses a challenging problem for classification algorithms. It may yield a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. The study used traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011). Three data balancing techniques were applied: under-sampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers (Averaged One-Dependence Estimators, Weightily Averaged One-Dependence Estimators, and Bayesian networks) were compared on the imbalanced and balanced data sets in order to identify factors that affect the severity of an accident. The results indicated that using Bayesian networks with the balanced data sets, especially those created by oversampling, improved the classification of traffic accidents by severity and reduced the misclassification of killed and severe injuries instances. Furthermore, the following variables were found to contribute to the occurrence of a fatality or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. To the authors' knowledge, this work is the first to analyze historical traffic accident records in Jordan and the first to apply balancing techniques to analyze the injury severity of traffic accidents.
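Random oversampling is the simplest way to create new minority-class instances. It duplicates randomly chosen minority rows until the classes are balanced. The abstract does not say which oversampling variant was used, and synthetic methods such as SMOTE are another common choice. The sketch below is therefore an illustration only, and its rows and labels are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows: list, labels: list[str], seed: int = 0) -> tuple[list, list[str]]:
    """Duplicate random minority-class rows until every class matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(members))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


rows = list(range(10))  # stand-ins for accident records
labels = ["slight"] * 8 + ["severe"] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversample only the training portion of the data. If duplicates leak into the evaluation split, the measured minority-class performance will be inflated.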

  12. Chromatin States Accurately Classify Cell Differentiation Stages

    PubMed Central

    Larson, Jessica L.; Yuan, Guo-Cheng

    2012-01-01

    Gene expression is controlled by the concerted interactions between transcription factors and chromatin regulators. While recent studies have identified global chromatin state changes across cell types, it remains unclear to what extent these changes are co-regulated during cell differentiation. Here we present a comprehensive computational analysis. We assembled a large public-domain dataset containing genome-wide occupancy information for 5 histone modifications in 27 human cell lines (24 normal and 3 cancer cell lines), followed by independent analysis at three different representations. We classified the differentiation stage of a cell type based on its genome-wide pattern of chromatin states and found that our method was able to identify normal cell lines with nearly 100% accuracy. We then applied our model to classify the cancer cell lines and found that each can be unequivocally classified as differentiated cells. The differences can be in part explained by the differential activities of three regulatory modules associated with embryonic stem cells. We also found that the “hotspot” genes, whose chromatin states change dynamically in accordance with the differentiation stage, are not randomly distributed across the genome but tend to be embedded in multi-gene chromatin domains. Specialized gene clusters, by contrast, tend to be embedded in stably occupied domains. PMID:22363642

  13. Optimization of short amino acid sequences classifier

    NASA Astrophysics Data System (ADS)

    Barcz, Aleksy; Szymański, Zbigniew

    This article describes processing methods used for classifying short amino acid sequences. The data processed are 9-symbol string representations of amino acid sequences, divided into 49 data sets. Each data set contains samples labeled as reacting or not reacting with a given enzyme. The goal of the classification is to determine, for a single enzyme, whether an amino acid sequence would react with it. Each data set is processed separately. Feature selection is performed to reduce the number of dimensions for each data set. The feature selection method consists of two phases. In the first phase, significant positions are selected using Classification and Regression Trees. Afterwards, the symbols appearing at the selected positions are substituted with numeric values of amino acid properties taken from the AAindex database. In the second phase, the new set of features is reduced using a correlation-based ranking formula and Gram-Schmidt orthogonalization. Finally, the preprocessed data are used to train LS-SVM classifiers. SPDE, an evolutionary algorithm, is used to obtain optimal hyperparameters for the LS-SVM classifier, such as the error penalty parameter C and kernel-specific hyperparameters. A simple score penalty adapts the SPDE algorithm to the task of selecting the classifiers with the best performance measure values.
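The substitution step can be sketched as follows. The Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) stands in for whichever AAindex properties the authors actually chose. The selected positions are hypothetical and are not the CART output of the paper.

```python
# Kyte-Doolittle hydropathy index (AAindex KYTJ820101), used here as one example property scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_positions(peptide: str, positions: list[int], scale: dict[str, float]) -> list[float]:
    """Replace the residues at the selected positions of a 9-mer with numeric property values."""
    if len(peptide) != 9:
        raise ValueError("expected a 9-residue sequence")
    return [scale[peptide[p]] for p in positions]


selected = [0, 4, 8]  # hypothetical positions chosen in the first (CART) phase
print(encode_positions("ACDEFGHIK", selected, KYTE_DOOLITTLE))  # [1.8, 2.8, -3.9]
```

Using several property scales at each selected position multiplies the feature count. The second-phase ranking and orthogonalization then prune that count again.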

  14. DNA Nanotechnology-- Architectures Designed with DNA

    NASA Astrophysics Data System (ADS)

    Han, Dongran

    As the genetic information storage vehicle, deoxyribonucleic acid (DNA) molecules are essential to all known living organisms and many viruses. It is remarkable that such a large amount of information about how life develops can be stored in these tiny molecules. Countless scientists, especially biologists, are trying to decipher the genetic information stored in these captivating molecules. Meanwhile, another group of researchers, nanotechnologists in particular, have discovered that the unique and concise structural features of DNA, together with its information coding ability, can be used for nano-construction. This idea culminated in the birth of the field of DNA nanotechnology, which is the main topic of this dissertation. The ability of rationally designed DNA strands to self-assemble into arbitrary nanostructures without external direction is the basis of this field. A series of novel design principles for DNA nanotechnology are presented here. They range from topological DNA nanostructures to complex and curved DNA nanostructures, and from pure DNA nanostructures to hybrid RNA/DNA nanostructures. DNA nanotechnology is one of the most important and pioneering fields in controlling the assembly of materials (both DNA and other materials) at the nanoscale, and it is developing rapidly. As more construction approaches are invented, exciting advances will emerge in ways that we may or may not predict.
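Self-assembly in DNA nanotechnology rests on Watson-Crick base pairing between antiparallel strands. The sketch below shows the basic check a designer applies to a pair of strands. It assumes both strands are written 5' to 3' and that only perfect, full-length complementarity counts. Real design tools also score partial matches and secondary structure, and the function names here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Reverse complement of a 5'->3' DNA strand (A/C/G/T only)."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def fully_complementary(a: str, b: str) -> bool:
    """True if two 5'->3' strands can pair base-for-base into a perfect antiparallel duplex."""
    return len(a) == len(b) and reverse_complement(a) == b.upper()


print(reverse_complement("GATTACA"))              # TGTAATC
print(fully_complementary("GATTACA", "TGTAATC"))  # True
```

In a multi-strand design, the same check applied to strand segments decides which domains are intended to bind to each other.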

  15. Robust Framework to Combine Diverse Classifiers Assigning Distributed Confidence to Individual Classifiers at Class Level

    PubMed Central

    Arshad, Sannia; Rho, Seungmin

    2014-01-01

    We present a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates models of the various classes while identifying and filtering noisy training data. This noise-free data is then used to learn models for other classifiers such as GMM and SVM. A weight learning method is then introduced that learns a weight for each class in each classifier in order to construct an ensemble. For this purpose, we applied a genetic algorithm to search for the optimal weight vector, that is, the one for which the classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on a variety of real-life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of the proposed ensemble method over its competitors, especially in the presence of class label noise and imbalanced classes. PMID:25295302
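Class-level weighting can be illustrated with a weighted vote in which each classifier carries a separate weight for each class. The sketch below assumes the weights are already known. In the paper they are found by a genetic algorithm, which is not shown here. The classifier names and weight values are hypothetical.

```python
def class_weighted_vote(predictions: dict[str, str],
                        weights: dict[str, dict[str, float]]) -> str:
    """Fuse labels: each classifier's vote counts with its weight for the class it predicts."""
    scores: dict[str, float] = {}
    for classifier, label in predictions.items():
        scores[label] = scores.get(label, 0.0) + weights[classifier].get(label, 0.0)
    return max(scores, key=scores.get)


# Hypothetical per-class weights: GMM is trusted strongly on class "A", less so on "B".
weights = {
    "gmm": {"A": 0.9, "B": 0.2},
    "svm": {"A": 0.5, "B": 0.3},
    "medoid": {"A": 0.5, "B": 0.4},
}
print(class_weighted_vote({"gmm": "A", "svm": "B", "medoid": "B"}, weights))  # A
```

In this example a single classifier that is reliable on class A outvotes a two-classifier majority. Class-level weights are designed to capture exactly this kind of per-class reliability, which a single weight per classifier would average away.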

  16. Learnability of min-max pattern classifiers

    NASA Astrophysics Data System (ADS)

    Yang, Ping-Fai; Maragos, Petros

    1991-11-01

    This paper introduces the class of thresholded min-max functions and studies their learning under the probably approximately correct (PAC) model introduced by Valiant. These functions can be used as pattern classifiers of both real-valued and binary-valued feature vectors. They are a lattice-theoretic generalization of Boolean functions and are also related to three-layer perceptrons and morphological signal operators. Several subclasses of the thresholded min-max functions are shown to be learnable under the PAC model.
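The sketch below assumes the simplest form of a min-max function, without constants: a maximum over terms, where each term is the minimum over a subset of features. A thresholded min-max classifier then outputs 1 when that value reaches a threshold. With binary inputs and a threshold of 1, it reduces to a monotone Boolean function in disjunctive normal form, which illustrates the lattice-theoretic generalization mentioned above. The function names and term structure are hypothetical.

```python
def min_max(x: list[float], terms: list[list[int]]) -> float:
    """Maximum over terms of the minimum of the features indexed by each term."""
    return max(min(x[i] for i in term) for term in terms)


def thresholded_min_max(x: list[float], terms: list[list[int]], theta: float) -> int:
    """Binary decision: 1 if the min-max value reaches the threshold theta, else 0."""
    return int(min_max(x, terms) >= theta)


terms = [[0, 1], [2]]  # hypothetical structure: (x0 AND x1) OR x2 on binary inputs
print(thresholded_min_max([1, 1, 0], terms, 1))          # 1
print(thresholded_min_max([1, 0, 0], terms, 1))          # 0
print(thresholded_min_max([0.7, 0.9, 0.2], terms, 0.5))  # 1
```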

  17. 70. PRIMARY MILL AND CLASSIFIER No. 2 FROM NORTHWEST. MILL ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    70. PRIMARY MILL AND CLASSIFIER No. 2 FROM NORTHWEST. MILL DISCHARGED INTO LAUNDER WHICH PIERCED THE SIDE OF THE CLASSIFIER PAN. WOOD LAUNDER WITHIN CLASSIFIER VISIBLE (FILLED WITH DEBRIS). HORIZONTAL WOOD PLANKING BEHIND MILL IS FEED BOX. MILL SOLUTION PIPING RUNS ALONG BASE OF WEST SIDE OF CLASSIFIER. - Bald Mountain Gold Mill, Nevada Gulch at head of False Bottom Creek, Lead, Lawrence County, SD

  18. A Systematic Comparison of Supervised Classifiers

    PubMed Central

    Amancio, Diego Raphael; Comin, Cesar Henrique; Casanova, Dalcimar; Travieso, Gonzalo; Bruno, Odemir Martinez; Rodrigues, Francisco Aparecido; da Fontoura Costa, Luciano

    2014-01-01

    Pattern recognition has been employed in a myriad of industrial, commercial and academic applications. Many techniques have been devised to tackle such a diversity of applications. Despite the long tradition of pattern recognition research, no technique yields the best classification in all scenarios. Therefore, as many techniques as possible should be considered in high-accuracy applications. Typical related works either focus on the performance of a given algorithm or compare various classification methods. On many occasions, however, researchers who are not experts in machine learning have to deal with practical classification tasks without in-depth knowledge of the underlying parameters. Indeed, the appropriate choice of classifiers and parameters in such practical circumstances is a long-standing problem and is one of the subjects of the current paper. We carried out a performance study of nine well-known classifiers implemented in the Weka framework and compared the influence of the parameter configurations on the accuracy. The default parameter configuration in Weka was found to provide near-optimal performance in most cases, with the exception of methods such as the support vector machine (SVM). In addition, the k-nearest neighbor method frequently yielded the best accuracy. Under certain conditions, it was possible to improve the performance of the SVM by more than 20% with respect to its default parameter configuration. PMID:24763312
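Because the k-nearest neighbor method often came out on top, a minimal version is sketched below. It uses Euclidean distance and a majority vote over the k closest training points. In Weka the corresponding classifier is IBk. The choice of k, the distance measure and the toy data are assumptions, not the configurations evaluated in the paper.

```python
import math
from collections import Counter


def knn_predict(train_x: list[tuple[float, ...]], train_y: list[str],
                query: tuple[float, ...], k: int = 3) -> str:
    """Majority label among the k training points closest to query (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_x, train_y, (0.2, 0.2)))  # a
print(knn_predict(train_x, train_y, (5.5, 5.5)))  # b
```

k-NN has essentially one main parameter (k) plus the distance measure, which may partly explain why its defaults hold up well compared with methods such as the SVM, whose kernel parameters matter more.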

  19. Objectively classifying Southern Hemisphere extratropical cyclones

    NASA Astrophysics Data System (ADS)

    Catto, Jennifer

    2016-04-01

    There has been a long tradition of attempting to separate extratropical cyclones into different classes depending on their cloud signatures, airflows, synoptic precursors, or upper-level flow features. Depending on these features, the cyclones may have different impacts, for example in their precipitation intensity. It is therefore important to understand how the distribution of different cyclone classes may change in the future. Many of the previous classifications have been performed manually. To evaluate climate models and understand how extratropical cyclones might change in the future, we need an automated method to classify cyclones. Extratropical cyclones have been identified in the Southern Hemisphere from the ERA-Interim reanalysis dataset with a commonly used identification and tracking algorithm based on 850 hPa relative vorticity. A clustering method applied to large-scale fields from ERA-Interim at the time of cyclone genesis (when the cyclone is first detected) has been used to objectively classify the identified cyclones. The results are compared to the manual classification of Sinclair and Revell (2000), and the four objectively identified classes shown in this presentation are found to match it well. The relative importance of diabatic heating in the clusters is investigated, as well as the differing precipitation characteristics. The success of the objective classification shows its utility in climate model evaluation and climate change studies.

  20. Cross-classified occupational exposure data.

    PubMed

    Jones, Rachael M; Burstyn, Igor

    2016-09-01

    We demonstrate the regression analysis of exposure determinants using cross-classified random effects in the context of lead exposures resulting from blasting surfaces in advance of painting. We had three specific objectives for the analysis of the lead data, and observed the following. (1) Within-worker variability in personal lead exposures was high, explaining 79% of the variability. (2) The lead concentration outside of half-mask respirators was 2.4-fold higher than inside supplied-air blasting helmets, suggesting that the exposure reduction by blasting helmets may be lower than expected from the Assigned Protection Factor. (3) Lead concentrations at fixed area locations in containment were not associated with personal lead exposures. In addition, we found that, on average, lead exposures among workers performing blasting and other activities were 40% lower than among workers performing only blasting. In the process of meeting these analysis objectives, we determined that the data were non-hierarchical: repeated exposure measurements were collected for a worker while the worker was a member of several groups, or cross-classified among groups. Because each worker is a member of multiple groups, the exposure data do not adhere to the traditionally assumed hierarchical structure. Forcing a hierarchical structure on these data led to similar within-group and between-group variability, but decreased precision in the estimate of the effect of work activity on lead exposure. We hope hygienists and exposure assessors will consider non-hierarchical models in the design and analysis of exposure assessments. PMID:27029937

  1. Mercury⊕: An evidential reasoning image classifier

    NASA Astrophysics Data System (ADS)

    Peddle, Derek R.

    1995-12-01

    MERCURY⊕ is a multisource evidential reasoning classification software system based on the Dempster-Shafer theory of evidence. This paper describes the design and implementation of the package, which aims to improve the classification and analysis of the multisource digital image data needed for advanced environmental and geoscience applications. In the remote-sensing context, the approach provides a more appropriate framework for classifying modern, multisource, and ancillary data sets. Such data sets may contain a large number of disparate variables with different statistical properties, scales of measurement, and levels of error, which conventional Bayesian approaches cannot handle. The software uses a nonparametric, supervised approach to classification. It provides a more objective and flexible interface to the evidential reasoning framework by computing support values from training data with a frequency-based method. The MERCURY⊕ software package has been implemented efficiently in the C programming language. It makes extensive use of dynamic memory allocation and of compound linked-list and hash-table data structures to optimize the storage and retrieval of evidence in a Knowledge Look-up Table. The software is complete with a full user interface and runs under the Unix, Ultrix, VAX/VMS, MS-DOS, and Apple Macintosh operating systems. An example of classifying alpine land cover and permafrost active layer depth in northern Canada is presented to illustrate the use and application of these ideas.
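In the Dempster-Shafer framework, evidence from different sources is fused with Dempster's rule of combination. The sketch below applies the rule to two mass functions over a hypothetical set of land-cover classes. It does not reproduce how MERCURY⊕ derives its mass (support) values from training frequencies, and the source names and mass values are invented.

```python
from itertools import product

Mass = dict[frozenset, float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two basic probability assignments with Dempster's rule."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


frame = frozenset({"forest", "tundra", "water"})  # hypothetical land-cover classes
spectral = {frozenset({"forest"}): 0.6, frame: 0.4}
terrain = {frozenset({"tundra"}): 0.5, frame: 0.5}
for hypothesis, mass in dempster_combine(spectral, terrain).items():
    print(sorted(hypothesis), round(mass, 3))
```

Mass left on the whole frame represents ignorance rather than a uniform probability over the classes. This is the property that makes the framework attractive for heterogeneous sources.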

  2. Classifying multispectral data by neural networks

    NASA Technical Reports Server (NTRS)

    Telfer, Brian A.; Szu, Harold H.; Kiang, Richard K.

    1993-01-01

    Several energy functions for synthesizing neural networks are tested on 2-D synthetic data and on Landsat-4 Thematic Mapper data. These new energy functions are designed specifically to minimize misclassification error. In some cases they yield significant improvements in classification accuracy over the standard least mean squares energy function. In addition to operating on networks with one output unit per class, a new energy function is tested for binary-encoded outputs, which result in smaller networks. The Thematic Mapper data (four bands were used) are classified on a single-pixel basis, to provide a starting benchmark against which further improvements will be measured. Improvements are underway to make use of both subpixel and superpixel (i.e., contextual or neighborhood) information in the processing. For single-pixel classification, the best neural network result is 78.7 percent, compared with 71.7 percent for a classical nearest neighbor classifier. The 78.7 percent result also improves on several earlier neural network results on this data.

  3. Statistical properties of DNA sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
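Detrended fluctuation analysis can be sketched in a few lines of NumPy. The version below is a minimal first-order DFA under several assumptions. First, the sequence is mapped to a "DNA walk" with purines (A, G) as +1 and pyrimidines (C, T) as -1. Second, the box sizes are arbitrary choices. Third, the function names are hypothetical. The mapping and parameters used in the reviewed work may differ.

```python
import numpy as np

# Purine/pyrimidine mapping for the DNA walk: A, G -> +1; C, T -> -1.
STEP = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def dna_walk_steps(seq: str) -> np.ndarray:
    """Map a nucleotide sequence (A/C/G/T only) to +1/-1 steps."""
    return np.array([STEP[base] for base in seq.upper()])


def dfa_fluctuation(steps: np.ndarray, box: int) -> float:
    """RMS fluctuation F(n) of the integrated profile after linear detrending in boxes of size n."""
    profile = np.cumsum(steps - steps.mean())
    x = np.arange(box)
    residuals = []
    for k in range(len(profile) // box):
        segment = profile[k * box:(k + 1) * box]
        slope, intercept = np.polyfit(x, segment, 1)  # local least-squares linear trend
        residuals.append(np.mean((segment - (slope * x + intercept)) ** 2))
    return float(np.sqrt(np.mean(residuals)))


def dfa_exponent(steps: np.ndarray, boxes: list[int]) -> float:
    """Scaling exponent alpha: slope of log F(n) versus log n."""
    fluctuations = [dfa_fluctuation(steps, n) for n in boxes]
    alpha, _ = np.polyfit(np.log(boxes), np.log(fluctuations), 1)
    return float(alpha)


# An uncorrelated random sequence should give alpha close to 0.5.
rng = np.random.default_rng(0)
random_seq = "".join(rng.choice(list("ACGT"), size=20_000))
print(round(dfa_exponent(dna_walk_steps(random_seq), [16, 32, 64, 128, 256]), 2))
```

An exponent noticeably above 0.5 indicates long-range power-law correlations of the kind the abstract reports for non-coding regions. Detrending inside each box keeps local compositional drifts (the "non-stationarity" problem) from masquerading as such correlations.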

  4. Intelligent query by humming system based on score level fusion of multiple classifiers

    NASA Astrophysics Data System (ADS)

    Pyo Nam, Gi; Thu Trang Luong, Thi; Ha Nam, Hyun; Ryoung Park, Kang; Park, Sung-Joo

    2011-12-01

    Recently, the need for content-based music retrieval that can return results even when the user does not know information such as the title or singer has increased. Query-by-humming (QBH) systems have been introduced to address this need, as they allow the user to simply hum snatches of the tune to find the right song. Although there have been many studies on QBH, few have combined multiple classifiers using various fusion methods. Here we propose a new QBH system based on the score level fusion of multiple classifiers. This research is novel in three respects: three local classifiers [quantized binary (QB) code-based linear scaling (LS), pitch-based dynamic time warping (DTW), and LS] are employed; local maximum and minimum point-based LS and pitch distribution feature-based LS are used as global classifiers; and the local and global classifiers are combined by score level fusion with the PRODUCT rule to achieve enhanced matching accuracy. Experimental results with the 2006 MIREX QBSH and 2009 MIR-QBSH corpus databases show that the proposed method performs better than single classifiers and other fusion methods.
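    A small sketch of score level fusion with the product rule. It assumes each classifier's output has already been converted to a similarity score in (0, 1], where higher means a better match. DTW, for example, yields distances, and the normalization step the authors use is not reproduced here. The song ids and scores are made up.

```python
import math


def product_rule_fusion(score_tables):
    """Fuse several classifiers' scores by multiplying them per candidate song.

    score_tables: one dict per classifier, mapping song id -> similarity score,
    assumed already normalized to (0, 1] with higher meaning a better match.
    Returns (song id, fused score) pairs ranked best first.
    """
    # Only songs scored by every classifier can be fused.
    common = set(score_tables[0])
    for table in score_tables[1:]:
        common &= set(table)
    fused = {song: math.prod(table[song] for table in score_tables) for song in common}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores from two classifiers (e.g., a DTW matcher and an LS matcher).
dtw_scores = {"song_a": 0.9, "song_b": 0.5, "song_c": 0.4}
ls_scores = {"song_a": 0.3, "song_b": 0.8, "song_c": 0.6}
ranking = product_rule_fusion([dtw_scores, ls_scores])
```

    Here `song_a` ranks first for the DTW matcher alone but drops to second after fusion, because the LS matcher scores it low. Under the product rule, a single weak vote pulls a candidate down.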

  5. Perfect teleportation and superdense coding with W states

    SciTech Connect

    Agrawal, Pankaj; Pati, Arun

    2006-12-15

    True tripartite entanglement of three-qubit states can be classified on the basis of stochastic local operations and classical communication (SLOCC). Such states fall into two categories: GHZ states and W states. It is known that GHZ states can be used for teleportation and superdense coding, but the prototype W state cannot. However, we show that there is a class of W states that can be used for perfect teleportation and superdense coding.
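    For reference, the two classes are conventionally represented by the prototype states below. These are standard definitions, not taken from this abstract. The specific class of W states that the authors show to be usable for perfect teleportation is not reproduced here.

```latex
|\mathrm{GHZ}\rangle = \frac{1}{\sqrt{2}}\left(|000\rangle + |111\rangle\right),
\qquad
|W\rangle = \frac{1}{\sqrt{3}}\left(|001\rangle + |010\rangle + |100\rangle\right)
```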

  6. [Uracil-DNA glycosylases].

    PubMed

    Pytel, Dariusz; Słupianek, Artur; Ksiazek, Dominika; Skórski, Tomasz; Błasiak, Janusz

    2008-01-01

    Uracil is one of the four nitrogenous bases normally found in RNA. Uracil can also be found in DNA as a result of enzymatic or non-enzymatic deamination of cytosine, as well as misincorporation of dUMP instead of dTMP during DNA replication. Uracil can be removed from DNA by DNA repair enzymes, with an apyrimidinic site as an intermediate. However, if uracil is not removed from DNA, a C:G pair in parental DNA can be changed into a T:A pair in the daughter DNA molecule. Therefore, uracil in DNA may lead to a mutation. Uracil in DNA, like thymine, forms its energetically most favorable hydrogen bonds with adenine; therefore, uracil does not change the coding properties of DNA. Uracil in DNA is recognized by uracil-DNA glycosylases (UDGs), which initiate DNA base excision repair. This leads to the removal of uracil from DNA and its replacement by thymine, or by cytosine when the uracil arose from cytosine deamination. Eukaryotes have at least four nuclear UDGs (UNG2, SMUG1, TDG and MBD4), while UNG1 operates in mitochondria. UNG2 is involved in DNA repair associated with DNA replication and interacts with the PCNA and RPA proteins. Uracil can also be an intermediate product in antigen-dependent antibody diversification in B lymphocytes. Enzymatic deamination of viral DNA by host cells can be a defense mechanism against viral infection, including HIV-1. The UNG2, MBD4 and TDG glycosylases may cooperate with mismatch repair proteins, and TDG can be involved in the nucleotide excision repair system.
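    A toy Python model of the mutagenic route described above: cytosine deaminates to uracil, the uracil is not repaired, and two rounds of replication follow. It assumes strands are plain strings and ignores strand orientation and repair entirely. It only illustrates why an unrepaired uracil turns a C:G pair into a T:A pair.

```python
# Base pairing used during replication; DNA polymerase places A opposite U, as opposite T.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate(strand, pos):
    """Cytosine at `pos` loses its amino group and becomes uracil."""
    if strand[pos] != "C":
        raise ValueError("only cytosine deaminates to uracil")
    return strand[:pos] + "U" + strand[pos + 1:]


def replicate(template):
    """New complementary strand, base by base (strand orientation ignored in this toy model)."""
    return "".join(PARTNER[base] for base in template)


parent = "GCA"
damaged = deaminate(parent, 1)        # unrepaired uracil: "GUA"
first_copy = replicate(damaged)       # A is inserted opposite U: "CAT"
second_copy = replicate(first_copy)   # T now sits where C used to be: "GTA"
```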

  7. Chilean Pitavia more closely related to Oceania and Old World Rutaceae than to Neotropical groups: evidence from two cpDNA non-coding regions, with a new subfamilial classification of the family

    PubMed Central

    Groppo, Milton; Kallunki, Jacquelyn A.; Pirani, José Rubens; Antonelli, Alexandre

    2012-01-01

    The position of the plant genus Pitavia within an infrafamilial phylogeny of Rutaceae (rue, or orange family) was investigated with the use of two non-coding regions from cpDNA, the trnL-trnF region and the rps16 intron. The only species of the genus, Pitavia punctata Molina, is restricted to the temperate forests of the Coastal Cordillera of Central-Southern Chile and threatened by loss of habitat. The genus traditionally has been treated as part of tribe Zanthoxyleae (subfamily Rutoideae), where it constitutes the monogeneric subtribe Pitaviinae. This subtribe and genus are characterized by fruits of 1 to 4 fleshy drupelets, unlike the dehiscent fruits typical of the subfamily. Fifty-five taxa of Rutaceae, representing 53 genera (nearly one-third of those in the family) and all subfamilies, tribes, and almost all subtribes of the family, were included. Parsimony and Bayesian inference were used to infer the phylogeny; six taxa of Meliaceae, Sapindaceae, and Simaroubaceae, all members of Sapindales, were also used as out-groups. Results from both analyses were congruent and showed Pitavia as sister to Flindersia and Lunasia, both genera with species scattered through Australia, the Philippines, the Moluccas, New Guinea and the Malayan region, and phylogenetically far from other Neotropical Rutaceae, such as the Galipeinae (Galipeeae, Rutoideae) and Pteleinae (Toddalieae, former Toddalioideae). Additionally, a new circumscription of the subfamilies of Rutaceae is presented and discussed. Only two subfamilies (both monophyletic) are recognized: Cneoroideae (including Dictyolomatoideae, Spathelioideae, Cneoraceae, and Ptaeroxylaceae) and Rutoideae (including not only traditional Rutoideae but also Aurantioideae, Flindersioideae, and Toddalioideae). As a consequence, Aurantioideae (Citrus and allies) is reduced to tribal rank as Aurantieae. PMID:23717188

  8. A cognitive approach to classifying perceived behaviors

    NASA Astrophysics Data System (ADS)

    Benjamin, Dale Paul; Lyons, Damian

    2010-04-01

    This paper describes our work on integrating distributed, concurrent control in a cognitive architecture, and using it to classify perceived behaviors. We are implementing the Robot Schemas (RS) language in Soar. RS is a CSP-type programming language for robotics that controls a hierarchy of concurrently executing schemas. The behavior of every RS schema is defined using port automata. This provides precision to the semantics and also a constructive means of reasoning about the behavior and meaning of schemas. Our implementation uses Soar operators to build, instantiate and connect port automata as needed. Our approach is to use comprehension through generation (similar to NLSoar) to search for ways to construct port automata that model perceived behaviors. The generality of RS permits us to model dynamic, concurrent behaviors. A virtual world (Ogre) is used to test the accuracy of these automata. Soar's chunking mechanism is used to generalize and save these automata. In this way, the robot learns to recognize new behaviors.

  9. Learning algorithms for stack filter classifiers

    SciTech Connect

    Porter, Reid B; Hush, Don; Zimmer, Beate G

    2009-01-01

    Stack Filters define a large class of increasing filters that are widely used in image and signal processing. The motivations for using an increasing filter instead of an unconstrained filter have been described as: (1) fast and efficient implementation, (2) the relationship to mathematical morphology, and (3) more precise estimation with finite sample data. This last motivation is related to methods developed in machine learning, and the relationship was explored in an earlier paper. In this paper we investigate this relationship by applying Stack Filters directly to classification problems. This provides a new perspective on how monotonicity constraints can help control estimation and approximation errors, and also suggests several new learning algorithms for Boolean function classifiers when they are applied to real-valued inputs.
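    To make "increasing filter" concrete, here is a minimal Python sketch of a stack filter on an integer-valued signal. The signal is split into binary threshold planes. Each plane is filtered with the same positive (monotone) Boolean function, and the filtered planes are stacked back by summation. The window width, the choice of majority as the Boolean function, and the edge handling are illustrative assumptions. The paper is about learning that Boolean function, which is not shown here.

```python
def threshold_decompose(signal, levels):
    """Binary planes: plane t (t = 1..levels) is 1 where signal >= t."""
    return [[1 if value >= t else 0 for value in signal] for t in range(1, levels + 1)]


def majority3(a, b, c):
    """A positive (monotone) Boolean function: majority vote of three bits."""
    return (a & b) | (a & c) | (b & c)


def filter_plane(bits, pbf):
    """Slide a width-3 window over one binary plane, replicating the edge samples."""
    padded = [bits[0]] + bits + [bits[-1]]
    return [pbf(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


def stack_filter(signal, pbf, levels):
    """Filter every threshold plane with the same PBF, then stack (sum) the planes."""
    planes = [filter_plane(plane, pbf) for plane in threshold_decompose(signal, levels)]
    return [sum(column) for column in zip(*planes)]


signal = [0, 5, 1, 3, 3, 7, 2]
smoothed = stack_filter(signal, majority3, levels=7)
```

    With the majority function, the stacked result equals a width-3 running median, here `[0, 1, 3, 3, 3, 3, 2]`. Monotonicity is what guarantees the filtered binary planes still stack, meaning each plane lies below the one for the next lower threshold.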

  10. Classifying antiarrhythmic actions: by facts or speculation.

    PubMed

    Vaughan Williams, E M

    1992-11-01

    Classification of antiarrhythmic actions is reviewed in the context of the results of the Cardiac Arrhythmia Suppression Trials, CAST 1 and 2. Six criticisms of the classification recently published (The Sicilian Gambit) are discussed in detail. The alternative classification, when stripped of speculative elements, is shown to be similar to the original classification. Claims that the classification failed to predict the efficacy of antiarrhythmic drugs for the selection of appropriate therapy have been tested by an example. The antiarrhythmic actions of cibenzoline were classified in 1980. A detailed review of confirmatory experiments and clinical trials during the past decade shows that predictions made at the time agree with subsequent results. Classification of the effects drugs actually have on functioning cardiac tissues provides a rational basis for finding the preferred treatment for a particular arrhythmia in accordance with the diagnosis.

  11. Classifying prion and prion-like phenomena.

    PubMed

    Harbi, Djamel; Harrison, Paul M

    2014-01-01

    The universe of prion and prion-like phenomena has expanded significantly in the past several years. Here, we overview the challenges in classifying this data informatically, given that terms such as "prion-like", "prion-related" or "prion-forming" do not have a stable meaning in the scientific literature. We examine the spectrum of proteins that have been described in the literature as forming prions, and discuss how "prion" can have a range of meaning, with a strict definition being for demonstration of infection with in vitro-derived recombinant prions. We suggest that although prion/prion-like phenomena can largely be apportioned into a small number of broad groups dependent on the type of transmissibility evidence for them, as new phenomena are discovered in the coming years, a detailed ontological approach might be necessary that allows for subtle definition of different "flavors" of prion / prion-like phenomena.

  12. A headband for classifying human postures.

    PubMed

    Aloqlah, Mohammed; Lahiji, Rosa R; Loparo, Kenneth A; Mehregany, Mehran

    2010-01-01

    A real-time method using only accelerometer data is developed for classifying basic human static postures, namely sitting, standing, and lying, as well as dynamic transitions between them. The algorithm uses a discrete wavelet transform (DWT) in combination with a fuzzy logic inference system (FIS). Data from a single three-axis accelerometer integrated into a wearable headband are transmitted wirelessly, collected, and analyzed in real time on a laptop computer to extract two sets of features for posture classification. The received acceleration signals are decomposed using the DWT to extract the dynamic features; changes in the smoothness of the signal that reflect a transition between postures are detected at finer DWT scales. The FIS then uses the previous posture transition and the DWT-extracted features to determine the static postures. PMID:21097190

  13. Classifying supernovae using only galaxy data

    SciTech Connect

    Foley, Ryan J.; Mandel, Kaisey

    2013-12-01

    We present a new method for probabilistically classifying supernovae (SNe) without using SN spectral or photometric data. Unlike all previous studies to classify SNe without spectra, this technique does not use any SN photometry. Instead, the method relies on host-galaxy data. We build upon the well-known correlations between SN classes and host-galaxy properties, specifically that core-collapse SNe rarely occur in red, luminous, or early-type galaxies. Using the nearly spectroscopically complete Lick Observatory Supernova Search sample of SNe, we determine SN fractions as a function of host-galaxy properties. Using these data as inputs, we construct a Bayesian method for determining the probability that an SN is of a particular class. This method improves a common classification figure of merit by a factor of >2, comparable to the best light-curve classification techniques. Of the galaxy properties examined, morphology provides the most discriminating information. We further validate this method using SN samples from the Sloan Digital Sky Survey and the Palomar Transient Factory. We demonstrate that this method has wide-ranging applications, including separating different subclasses of SNe and determining the probability that an SN is of a particular class before photometry or even spectra can be obtained. Since this method uses data completely independent of light-curve techniques, there is potential to further improve the overall purity and completeness of SN samples and to test systematic biases of the light-curve techniques. Further enhancements to the host-galaxy method, including additional host-galaxy properties, combination with light-curve methods, and hybrid methods, should further improve the quality of SN samples from past, current, and future transient surveys.
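    The Bayesian step can be sketched in a few lines of Python for a single categorical host property. The two-class setup (Type Ia vs. core-collapse) and all the numbers below are made up for illustration. They are NOT the fractions measured from the LOSS sample, and the authors' full method is not reproduced.

```python
def host_posterior(priors, likelihoods, host_property):
    """Bayes' rule: P(class | host property) is proportional to P(host property | class) * P(class)."""
    joint = {c: priors[c] * likelihoods[c].get(host_property, 0.0) for c in priors}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("host property has zero probability under every class")
    return {c: value / evidence for c, value in joint.items()}


# Made-up illustrative numbers, NOT values from the LOSS sample.
priors = {"Ia": 0.5, "CC": 0.5}
likelihoods = {
    "Ia": {"early-type": 0.3, "late-type": 0.7},
    "CC": {"early-type": 0.02, "late-type": 0.98},
}
post = host_posterior(priors, likelihoods, "early-type")
```

    Because core-collapse events are rare in early-type hosts, an early-type host pushes the posterior strongly toward Type Ia. With these toy inputs the result is about 0.94.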

  14. Classifying Chimpanzee Facial Expressions Using Muscle Action

    PubMed Central

    Parr, Lisa A.; Waller, Bridget M.; Vick, Sarah J.; Bard, Kim A.

    2010-01-01

    The Chimpanzee Facial Action Coding System (ChimpFACS) is an objective, standardized observational tool for measuring facial movement in chimpanzees based on the well-known human Facial Action Coding System (FACS; P. Ekman & W. V. Friesen, 1978). This tool enables direct structural comparisons of facial expressions between humans and chimpanzees in terms of their common underlying musculature. Here the authors provide data on the first application of the ChimpFACS to validate existing categories of chimpanzee facial expressions using discriminant functions analyses. The ChimpFACS validated most existing expression categories (6 of 9) and, where the predicted group memberships were poor, the authors discuss potential problems with ChimpFACS and/or existing categorizations. The authors also report the prototypical movement configurations associated with these 6 expression categories. For all expressions, unique combinations of muscle movements were identified, and these are illustrated as peak intensity prototypical expression configurations. Finally, the authors suggest a potential homology between these prototypical chimpanzee expressions and human expressions based on structural similarities. These results contribute to our understanding of the evolution of emotional communication by suggesting several structural homologies between the facial expressions of chimpanzees and humans and facilitating future research. PMID:17352572

  15. Development of an Algorithm to Classify Colonoscopy Indication from Coded Health Care Data

    PubMed Central

    Adams, Kenneth F.; Johnson, Eric A.; Chubak, Jessica; Kamineni, Aruna; Doubeni, Chyke A.; Buist, Diana S.M.; Williams, Andrew E.; Weinmann, Sheila; Doria-Rose, V. Paul; Rutter, Carolyn M.

    2015-01-01

    Introduction: Electronic health data are potentially valuable resources for evaluating colonoscopy screening utilization and effectiveness. The ability to distinguish screening colonoscopies from exams performed for other purposes is critical for research that examines factors related to screening uptake and adherence, and the impact of screening on patient outcomes, but distinguishing between these indications in secondary health data proves challenging. The objective of this study is to develop a new and more accurate algorithm for identification of screening colonoscopies using electronic health data. Methods: Data from a case-control study of colorectal cancer with adjudicated colonoscopy indication were used to develop logistic regression-based algorithms. The proposed algorithms predict the probability that a colonoscopy was indicated for screening, with variables selected for inclusion in the models using the Least Absolute Shrinkage and Selection Operator (LASSO). Results: The algorithms had excellent classification accuracy in internal validation. The primary, restricted model had AUC = 0.94, sensitivity = 0.91, and specificity = 0.82. The secondary, extended model had AUC = 0.96, sensitivity = 0.88, and specificity = 0.90. Discussion: The LASSO approach enabled estimation of parsimonious algorithms that identified screening colonoscopies with high accuracy in our study population. External validation is needed to replicate these results and to explore the performance of these algorithms in other settings. PMID:26290883
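    For readers unfamiliar with the three reported metrics, here is a short Python sketch of how they are computed from labels and predicted probabilities. The labels, probabilities and the 0.5 cut-off are hypothetical. The study's LASSO-selected logistic models are not reproduced. AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half.

```python
def confusion_counts(y_true, y_pred):
    """(true positives, false negatives, true negatives, false positives) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = screening exam) and model probabilities.
labels = [1, 1, 1, 0, 0]
probs = [0.9, 0.8, 0.3, 0.4, 0.1]
preds = [1 if p >= 0.5 else 0 for p in probs]
```

    Sensitivity and specificity depend on the chosen cut-off; AUC summarizes ranking quality over all cut-offs. That is why the two models can trade sensitivity against specificity while both have a high AUC.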

  16. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations

    PubMed Central

    Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating the parameters of a probabilistic model, whilst SVM is an optimization-based nonparametric method in this context. Recently, it has been found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and to facilitate soft decision making. In total, four groups of data are used for evaluation, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized as Gaussian/non-Gaussian distributed and balanced/unbalanced, and these characteristics are then used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions. PMID:26089862
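    A minimal Python sketch of the MLC half of the combination. Each class is modeled by a Gaussian whose parameters are estimated from the training data, and a sample is assigned to the class with the highest likelihood. For brevity it assumes a diagonal covariance (independent features), which is a simplification of a full-covariance MLC. The SVM coupling described in the paper is not reproduced.

```python
import math


def fit_mlc(X, y, min_var=1e-9):
    """Per-class mean and variance of each feature (diagonal covariance, a simplifying assumption)."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        variances = [max(sum((v - m) ** 2 for v in col) / len(col), min_var)
                     for col, m in zip(columns, means)]
        model[label] = (means, variances)
    return model


def log_likelihood(x, means, variances):
    """Gaussian log-density of x, summed over independent features."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, means, variances))


def predict_mlc(model, x):
    """Maximum-likelihood decision: the class whose density is highest at x."""
    return max(model, key=lambda label: log_likelihood(x, *model[label]))


# Two hypothetical, well-separated 2-D classes.
X = [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
y = [0, 0, 0, 1, 1, 1]
model = fit_mlc(X, y)
```

    The per-class log-likelihoods are what give MLC a probabilistic output. That property is what the combined scheme borrows to turn SVM decisions into soft ones.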

  18. Classifying gauge anomalies through symmetry-protected trivial orders and classifying gravitational anomalies through topological orders

    NASA Astrophysics Data System (ADS)

    Wen, Xiao-Gang

    2013-08-01

    In this paper, we systematically study gauge anomalies in bosonic and fermionic weak-coupling gauge theories with gauge group $G$ (which can be continuous or discrete) in $d$ space-time dimensions. We show a very close relation between gauge anomalies for gauge group $G$ and symmetry-protected trivial (SPT) orders (also known as symmetry-protected topological (SPT) orders) with symmetry group $G$ in one higher dimension. The SPT phases are classified by the group cohomology class $\mathcal{H}^{d+1}(G,\mathbb{R}/\mathbb{Z})$. Through a more careful consideration, we argue that the gauge anomalies are described by the elements of $\mathrm{Free}[\mathcal{H}^{d+1}(G,\mathbb{R}/\mathbb{Z})] \oplus \mathcal{H}_{\pi}^{d+1}(BG,\mathbb{R}/\mathbb{Z})$. The well-known Adler-Bell-Jackiw anomalies are classified by the free part of $\mathcal{H}^{d+1}(G,\mathbb{R}/\mathbb{Z})$ (denoted $\mathrm{Free}[\mathcal{H}^{d+1}(G,\mathbb{R}/\mathbb{Z})]$). We refer to the other kinds of gauge anomalies beyond Adler-Bell-Jackiw anomalies as non-ABJ gauge anomalies, which include Witten SU(2) global gauge anomalies. We introduce a notion of $\pi$-cohomology group, $\mathcal{H}_{\pi}^{d+1}(BG,\mathbb{R}/\mathbb{Z})$, for the classifying space $BG$. It is an Abelian group that includes $\mathrm{Tor}[\mathcal{H}^{d+1}(G,\mathbb{R}/\mathbb{Z})]$ and the topological cohomology group $H^{d+1}(BG,\mathbb{R}/\mathbb{Z})$ as subgroups. We argue that $\mathcal{H}_{\pi}^{d+1}(BG,\mathbb{R}/\mathbb{Z})$ classifies the bosonic non-ABJ gauge anomalies and partially classifies fermionic non-ABJ anomalies. Using the same approach that connects gauge anomalies to SPT phases, we can also show that gravitational anomalies are connected to topological orders (i.e., patterns of long-range entanglement) in one higher dimension.

  19. DNA structure and function.

    PubMed

    Travers, Andrew; Muskhelishvili, Georgi

    2015-06-01

    The proposal of a double-helical structure for DNA over 60 years ago provided an eminently satisfying explanation for the heritability of genetic information. But why is DNA, and not RNA, now the dominant biological information store? We argue that, in addition to its coding function, the ability of DNA, unlike RNA, to adopt a B-DNA structure confers advantages both for information accessibility and for packaging. The information encoded by DNA is both digital - the precise base specifying, for example, amino acid sequences - and analogue. The latter determines the sequence-dependent physicochemical properties of DNA, for example, its stiffness and susceptibility to strand separation. Most importantly, DNA chirality enables the formation of supercoiling under torsional stress. We review recent evidence suggesting that DNA supercoiling, particularly that generated by DNA translocases, is a major driver of gene regulation and patterns of chromosomal gene organization, and in its guise as a promoter of DNA packaging enables DNA to act as an energy store to facilitate the passage of translocating enzymes such as RNA polymerase.

  20. DNA barcoding for plants.

    PubMed

    de Vere, Natasha; Rich, Tim C G; Trinder, Sarah A; Long, Charlotte

    2015-01-01

    DNA barcoding uses specific regions of DNA to identify species. Initiatives are under way around the world to generate DNA barcodes for all groups of living organisms and to make these data publicly available, in order to help understand, conserve, and utilize the world's biodiversity. For land plants, the core DNA barcode markers are two coding regions of the chloroplast genome: parts of the genes rbcL and matK. To create high-quality databases, each plant that is DNA barcoded needs a herbarium voucher to accompany its rbcL and matK DNA sequences. The quality of the DNA sequences, the primers used, and the trace files should also be accessible to users of the data. Multiple individuals should be DNA barcoded for each species in order to check for errors and to allow for intraspecific variation. The world's herbaria provide a rich resource of already preserved and identified material. This material can be used for DNA barcoding alongside fresh samples collected from the wild. These protocols describe the whole DNA barcoding process: collecting plant material from the wild or from the herbarium, extracting and amplifying the DNA, and checking the quality of the data after sequencing.

  1. MLgsc: A Maximum-Likelihood General Sequence Classifier.

    PubMed

    Junier, Thomas; Hervé, Vincent; Wunderlin, Tina; Junier, Pilar

    2015-01-01

    We present a software package for classifying protein or nucleotide sequences into user-specified sets of reference sequences. The software trains a model using a multiple sequence alignment and a phylogenetic tree, both supplied by the user. The tree is used to guide model construction and as a decision tree to speed up the classification process. The software was evaluated on all the 16S rRNA gene sequences of the reference dataset found in the GreenGenes database; on this dataset, it achieved an error rate of around 1% at the genus level. Examples of applications based on the nitrogenase subunit NifH gene and on a protein-coding gene found in endospore-forming Firmicutes are also presented. The programs in the package have a simple, straightforward command-line interface for the Unix shell, and are free and open-source. The package has minimal dependencies and can thus be easily integrated into command-line-based classification pipelines.
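    The idea of using the tree as a decision tree can be sketched as follows. Each node gets a position-specific log-probability profile trained on the aligned sequences below it. A query that is already aligned is then passed down from the root, always to the child whose profile gives it the highest likelihood. This is a simplified, hypothetical stand-in written in Python: `make_node`, the pseudocount smoothing and the toy sequences are illustrative assumptions, not MLgsc's actual model, file formats or command-line tools.

```python
import math

ALPHABET = "ACGT-"  # nucleotides plus the alignment gap symbol


def build_profile(aligned, pseudocount=1.0):
    """Column-wise log-probabilities of each symbol, smoothed with additive pseudocounts."""
    profile = []
    for i in range(len(aligned[0])):
        counts = {symbol: pseudocount for symbol in ALPHABET}
        for seq in aligned:
            counts[seq[i]] += 1
        total = sum(counts.values())
        profile.append({symbol: math.log(c / total) for symbol, c in counts.items()})
    return profile


def make_node(name, seqs=(), children=()):
    """Hypothetical tree node; an internal node's model is trained on all its descendants' sequences."""
    if children:
        seqs = [s for child in children for s in child["seqs"]]
    seqs = list(seqs)
    return {"name": name, "seqs": seqs, "profile": build_profile(seqs), "children": list(children)}


def log_likelihood(profile, query):
    """Log-probability of an aligned query under a position-specific profile."""
    return sum(column[symbol] for column, symbol in zip(profile, query))


def classify(root, query):
    """Walk the tree from the root, always moving to the best-scoring child."""
    node, path = root, []
    while node["children"]:
        node = max(node["children"], key=lambda child: log_likelihood(child["profile"], query))
        path.append(node["name"])
    return path


# Toy reference alignment: two hypothetical genera.
genus_a = make_node("genus_A", ["ACGTAC", "ACGTAA"])
genus_b = make_node("genus_B", ["TTGCAC", "TTGCAA"])
root = make_node("root", children=[genus_a, genus_b])
```

    Descending the tree means the query is scored against only the children of the nodes on one root-to-leaf path, rather than against every reference clade. That is where the speed-up from a decision tree comes from.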

  2. A random forest classifier for lymph diseases.

    PubMed

    Azar, Ahmad Taher; Elshazly, Hanaa Ismail; Hassanien, Aboul Ella; Elkorany, Abeer Mohamed

    2014-02-01

    Machine learning-based classification techniques provide support for the decision-making process in many areas of health care, including diagnosis, prognosis, screening, etc. Feature selection (FS) is expected to improve classification performance, particularly in situations characterized by the high data dimensionality problem caused by relatively few training examples compared to a large number of measured features. In this paper, a random forest classifier (RFC) approach is proposed to diagnose lymph diseases. Focusing on feature selection, the first stage of the proposed system aims at constructing diverse feature selection algorithms such as genetic algorithm (GA), Principal Component Analysis (PCA), Relief-F, Fisher, Sequential Forward Floating Search (SFFS) and the Sequential Backward Floating Search (SBFS) for reducing the dimension of lymph diseases dataset. Switching from feature selection to model construction, in the second stage, the obtained feature subsets are fed into the RFC for efficient classification. It was observed that GA-RFC achieved the highest classification accuracy of 92.2%. The dimension of input feature space is reduced from eighteen to six features by using GA. PMID:24290902

  3. Mining, compressing and classifying with extensible motifs

    PubMed Central

    Apostolico, Alberto; Comin, Matteo; Parida, Laxmi

    2006-01-01

    Background: Motif patterns of maximal saturation emerged originally in contexts of pattern discovery in biomolecular sequences and have recently proven a valuable notion also in the design of data compression schemes. Informally, a motif is a string of intermittently solid and wild characters that recurs more or less frequently in an input sequence or family of sequences. Motif discovery techniques and tools tend to be computationally demanding; however, special classes of "rigid" motifs have been identified whose discovery is affordable in low polynomial time. Results: In the present work, "extensible" motifs are considered, in which each sequence of gaps comes endowed with some elasticity. The same pattern may thus be stretched to fit segments of the source that match all the solid characters but are otherwise of different lengths. A few applications of this notion are then described. In applications of data compression by textual substitution, extensible motifs are seen to bring savings on the size of the codebook, and hence to improve compression. In germane contexts, in which compressibility is used in its dual role as a basis for structural inference and classification, extensible motifs are seen to support unsupervised classification and phylogeny reconstruction. Conclusion: Off-line compression based on extensible motifs can be used advantageously to compress and classify biological sequences. PMID:16722593
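    The difference between rigid and extensible motifs can be illustrated with Python's `re` module. A motif is written as solid blocks separated by gaps of wild characters. Each gap has a length range; a rigid motif is the special case where every range has a single length. This is only a matching sketch under that representation, which is an assumption. The motif discovery and compression algorithms of the paper are not shown.

```python
import re


def compile_extensible_motif(solids, gaps):
    """Solid blocks separated by elastic gaps.

    solids: literal strings that must match exactly, e.g. ["AC", "GT"].
    gaps: one (min_len, max_len) pair per gap between consecutive solid blocks;
    a rigid motif is the special case min_len == max_len.
    """
    if len(gaps) != len(solids) - 1:
        raise ValueError("need exactly one gap range between each pair of solid blocks")
    parts = [re.escape(solids[0])]
    for (low, high), solid in zip(gaps, solids[1:]):
        parts.append(".{%d,%d}" % (low, high))  # wild characters, elastic length
        parts.append(re.escape(solid))
    # A lookahead lets finditer report occurrences that overlap each other.
    return re.compile("(?=(%s))" % "".join(parts))


def occurrences(motif, text):
    """(start, matched segment) for each start position where the motif occurs."""
    return [(m.start(), m.group(1)) for m in motif.finditer(text)]


text = "ACxGTyyACzzzGTACGT"
extensible = occurrences(compile_extensible_motif(["AC", "GT"], [(1, 3)]), text)
rigid = occurrences(compile_extensible_motif(["AC", "GT"], [(1, 1)]), text)
```

    The extensible motif matches both `ACxGT` and `ACzzzGT`, two segments of different lengths that share the same solid characters. The rigid version with a one-character gap matches only the first. For each start position, `re` reports a single segment (the greedy, longest-gap one).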

  4. Monocular precrash vehicle detection: features and classifiers.

    PubMed

    Sun, Zehang; Bebis, George; Miller, Ronald

    2006-07-01

    Robust and reliable vehicle detection from images acquired by a moving vehicle (i.e., on-road vehicle detection) is an important problem with applications to driver assistance systems and autonomous, self-guided vehicles. The focus of this work is on the issues of feature extraction and classification for rear-view vehicle detection. Specifically, by treating the problem of vehicle detection as a two-class classification problem, we have investigated several different feature extraction methods such as principal component analysis, wavelets, and Gabor filters. To evaluate the extracted features, we have experimented with two popular classifiers, neural networks and support vector machines (SVMs). Based on our evaluation results, we have developed an on-board real-time monocular vehicle detection system that is capable of acquiring grey-scale images, using Ford's proprietary low-light camera, achieving an average detection rate of 10 Hz. Our vehicle detection algorithm consists of two main steps: a multiscale driven hypothesis generation step and an appearance-based hypothesis verification step. During the hypothesis generation step, image locations where vehicles might be present are extracted. This step uses multiscale techniques not only to speed up detection, but also to improve system robustness. The appearance-based hypothesis verification step verifies the hypotheses using Gabor features and SVMs. The system has been tested in Ford's concept vehicle under different traffic conditions (e.g., structured highway, complex urban streets, and varying weather conditions), illustrating good performance. PMID:16830921

  6. Recombinant DNA means and method

    SciTech Connect

    Alford, B.L.; Mao, J.I.; Moir, D.T.; Taunton-Rigby, A.; Vovis, G.F.

    1987-05-19

    This patent describes a transformed living cell selected from the group consisting of fungi, yeast and bacteria, and containing genetic material derived from recombinant DNA material and coding for bovine rennin.

  7. Generating compact classifier systems using a simple artificial immune system.

    PubMed

    Leung, Kevin; Cheong, France; Cheong, Christopher

    2007-10-01

    Current artificial immune system (AIS) classifiers have two major problems: 1) their populations of B-cells can grow to huge proportions, and 2) optimizing one B-cell (part of the classifier) at a time does not necessarily guarantee that the B-cell pool (the whole classifier) will be optimized. In this paper, the design of a new AIS algorithm and classifier system called simple AIS is described. It is different from traditional AIS classifiers in that it takes only one B-cell, instead of a B-cell pool, to represent the classifier. This approach ensures global optimization of the whole system, and in addition, no population control mechanism is needed. The classifier was tested on seven benchmark data sets using different classification techniques and was found to be very competitive when compared to other classifiers.

  8. 69. VIEW FROM ABOVE OF PRIMARY MILL AND CLASSIFIER No. ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    69. VIEW FROM ABOVE OF PRIMARY MILL AND CLASSIFIER No. 2. PRIMARY CLASSIFIER No. 1 AT RIGHT EDGE OF VIEW. - Bald Mountain Gold Mill, Nevada Gulch at head of False Bottom Creek, Lead, Lawrence County, SD

  9. 41 CFR 105-62.102 - Authority to originally classify.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... originally classify. (a) Top secret, secret, and confidential. The authority to originally classify information as Top Secret, Secret, or Confidential may be exercised only by the Administrator and is...

  10. 41 CFR 105-62.102 - Authority to originally classify.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... originally classify. (a) Top secret, secret, and confidential. The authority to originally classify information as Top Secret, Secret, or Confidential may be exercised only by the Administrator and is...

  11. 49 CFR 1280.6 - Storage of classified documents.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 49 Transportation 9 2011-10-01 2011-10-01 false Storage of classified documents. 1280.6 Section 1280.6 Transportation Other Regulations Relating to Transportation (Continued) SURFACE TRANSPORTATION... SECURITY INFORMATION AND CLASSIFIED MATERIAL § 1280.6 Storage of classified documents. All...

  12. 49 CFR 1280.6 - Storage of classified documents.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 49 Transportation 9 2012-10-01 2012-10-01 false Storage of classified documents. 1280.6 Section 1280.6 Transportation Other Regulations Relating to Transportation (Continued) SURFACE TRANSPORTATION... SECURITY INFORMATION AND CLASSIFIED MATERIAL § 1280.6 Storage of classified documents. All...

  13. 49 CFR 1280.6 - Storage of classified documents.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 49 Transportation 9 2010-10-01 2010-10-01 false Storage of classified documents. 1280.6 Section 1280.6 Transportation Other Regulations Relating to Transportation (Continued) SURFACE TRANSPORTATION... SECURITY INFORMATION AND CLASSIFIED MATERIAL § 1280.6 Storage of classified documents. All...

  14. 49 CFR 1280.6 - Storage of classified documents.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 49 Transportation 9 2014-10-01 2014-10-01 false Storage of classified documents. 1280.6 Section 1280.6 Transportation Other Regulations Relating to Transportation (Continued) SURFACE TRANSPORTATION... SECURITY INFORMATION AND CLASSIFIED MATERIAL § 1280.6 Storage of classified documents. All...

  15. 48 CFR 3.908-8 - Classified information.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 48 Federal Acquisition Regulations System 1 2013-10-01 2013-10-01 false Classified information. 3... Employees 3.908-8 Classified information. 41 U.S.C. 4712 does not provide any right to disclose classified information not otherwise provided by law....

  16. 41 CFR 109-43.307-51 - Classified personal property.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 41 Public Contracts and Property Management 3 2010-07-01 2010-07-01 false Classified personal... AND DISPOSAL 43-UTILIZATION OF PERSONAL PROPERTY 43.3-Utilization of Excess § 109-43.307-51 Classified personal property. Classified personal property which is excess to DOE needs shall be stripped of...

  17. 6 CFR 7.12 - Violations of classified information requirements.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 6 Domestic Security 1 2011-01-01 2011-01-01 false Violations of classified information requirements. 7.12 Section 7.12 Domestic Security DEPARTMENT OF HOMELAND SECURITY, OFFICE OF THE SECRETARY CLASSIFIED NATIONAL SECURITY INFORMATION Administration § 7.12 Violations of classified...

  18. 6 CFR 7.12 - Violations of classified information requirements.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 6 Domestic Security 1 2012-01-01 2012-01-01 false Violations of classified information requirements. 7.12 Section 7.12 Domestic Security DEPARTMENT OF HOMELAND SECURITY, OFFICE OF THE SECRETARY CLASSIFIED NATIONAL SECURITY INFORMATION Administration § 7.12 Violations of classified...

  19. 5 CFR 1312.35 - Information classified by another agency.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 5 Administrative Personnel 3 2012-01-01 2012-01-01 false Information classified by another agency... Declassification Review § 1312.35 Information classified by another agency. When a request is received for information that was classified by another agency, the Associate Director (or Assistant Director)...

  20. 5 CFR 1312.35 - Information classified by another agency.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 5 Administrative Personnel 3 2013-01-01 2013-01-01 false Information classified by another agency... Declassification Review § 1312.35 Information classified by another agency. When a request is received for information that was classified by another agency, the Associate Director (or Assistant Director)...

  1. 21 CFR 1402.4 - Information classified by another agency.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 21 Food and Drugs 9 2012-04-01 2012-04-01 false Information classified by another agency. 1402.4... § 1402.4 Information classified by another agency. When a request is received for information that was classified by another agency, the Director of the Office of Planning, Budget, and Administration of...

  2. 21 CFR 1402.4 - Information classified by another agency.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 21 Food and Drugs 9 2013-04-01 2013-04-01 false Information classified by another agency. 1402.4... § 1402.4 Information classified by another agency. When a request is received for information that was classified by another agency, the Director of the Office of Planning, Budget, and Administration of...

  3. 5 CFR 1312.35 - Information classified by another agency.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 5 Administrative Personnel 3 2010-01-01 2010-01-01 false Information classified by another agency... Declassification Review § 1312.35 Information classified by another agency. When a request is received for information that was classified by another agency, the Associate Director (or Assistant Director)...

  4. 21 CFR 1402.4 - Information classified by another agency.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 21 Food and Drugs 9 2014-04-01 2014-04-01 false Information classified by another agency. 1402.4... § 1402.4 Information classified by another agency. When a request is received for information that was classified by another agency, the Director of the Office of Planning, Budget, and Administration of...

  5. 21 CFR 1402.4 - Information classified by another agency.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 21 Food and Drugs 9 2011-04-01 2011-04-01 false Information classified by another agency. 1402.4... § 1402.4 Information classified by another agency. When a request is received for information that was classified by another agency, the Director of the Office of Planning, Budget, and Administration of...

  6. 5 CFR 1312.35 - Information classified by another agency.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 5 Administrative Personnel 3 2014-01-01 2014-01-01 false Information classified by another agency... Declassification Review § 1312.35 Information classified by another agency. When a request is received for information that was classified by another agency, the Associate Director (or Assistant Director)...

  7. 21 CFR 1402.4 - Information classified by another agency.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 21 Food and Drugs 9 2010-04-01 2010-04-01 false Information classified by another agency. 1402.4... § 1402.4 Information classified by another agency. When a request is received for information that was classified by another agency, the Director of the Office of Planning, Budget, and Administration of...

  8. 5 CFR 1312.35 - Information classified by another agency.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 5 Administrative Personnel 3 2011-01-01 2011-01-01 false Information classified by another agency... Declassification Review § 1312.35 Information classified by another agency. When a request is received for information that was classified by another agency, the Associate Director (or Assistant Director)...

  9. Method of generating features optimal to a dataset and classifier

    DOEpatents

    Bruillard, Paul J.; Gosink, Luke J.; Jarman, Kenneth D.

    2016-10-18

    A method of generating features optimal to a particular dataset and classifier is disclosed. A dataset of messages is inputted and a classifier is selected. An algebra of features is encoded. Computable features that are capable of describing the dataset from the algebra of features are selected. Irredundant features that are optimal for the classifier and the dataset are selected.

  10. Recognition of pornographic web pages by classifying texts and images.

    PubMed

    Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve

    2007-06-01

    With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, Bayes' theorem is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those obtained by either of the individual classifiers, and our framework can be adapted to different categories of Web pages. PMID:17431300
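
    The discrete text classifier is described only as applying the naive Bayes rule. The following is a minimal sketch of a multinomial naive Bayes text classifier with Laplace (add-alpha) smoothing that returns class posteriors. Tokenization, the smoothing constant alpha, the generic class labels and the function names are assumptions, not the paper's implementation.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels, alpha=1.0):
    """docs: list of token lists; labels: parallel list of class labels."""
    classes = sorted(set(labels))
    vocab = {tok for doc in docs for tok in doc}
    priors, likelihoods = {}, {}
    for c in classes:
        class_docs = [d for d, lab in zip(docs, labels) if lab == c]
        priors[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for d in class_docs for tok in d)
        # Add-alpha smoothing so unseen (class, token) pairs keep non-zero probability.
        total = sum(counts.values()) + alpha * len(vocab)
        likelihoods[c] = {tok: math.log((counts[tok] + alpha) / total) for tok in vocab}
    return classes, priors, likelihoods

def posterior(model, doc):
    """Return P(class | doc); tokens never seen in training are ignored."""
    classes, priors, likelihoods = model
    scores = {
        c: priors[c] + sum(likelihoods[c][t] for t in doc if t in likelihoods[c])
        for c in classes
    }
    top = max(scores.values())
    # Subtract the maximum log score before exponentiating for numerical stability.
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp_scores.values())
    return {c: v / z for c, v in exp_scores.items()}
```

    Because the function returns normalized posteriors rather than a hard label, a Bayesian fusion step like the one described above can combine its output with an image classifier's probabilities.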

  11. Mental Representation and Cognitive Consequences of Chinese Individual Classifiers

    ERIC Educational Resources Information Center

    Gao, Ming Y.; Malt, Barbara C.

    2009-01-01

    Classifier languages are spoken by a large portion of the world's population, but psychologists have only recently begun to investigate the psychological reality of classifier categories and their potential for influencing non-linguistic thought. The current work evaluates both the mental representation of classifiers and potential cognitive…

  12. 6 CFR 7.12 - Violations of classified information requirements.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 6 Domestic Security 1 2010-01-01 2010-01-01 false Violations of classified information requirements. 7.12 Section 7.12 Domestic Security DEPARTMENT OF HOMELAND SECURITY, OFFICE OF THE SECRETARY CLASSIFIED NATIONAL SECURITY INFORMATION Administration § 7.12 Violations of classified...

  13. Benchmarking a reduced multivariate polynomial pattern classifier.

    PubMed

    Toh, Kar-Ann; Tran, Quoc-Long; Srinivasan, Dipti

    2004-06-01

    A novel method using a reduced multivariate polynomial model has been developed for biometric decision fusion where simplicity and ease of use could be a concern. However, much to our surprise, the reduced model was found to have good classification accuracy for several commonly used data sets from the Web. In this paper, we extend the single-output model to a multiple-output model to handle multiclass problems. The method is particularly suitable for problems with a small number of features and a large number of examples. The basic component of this polynomial model is the construction of new pattern features, which are sums of the original features, and the combination of these new and original features using power and product terms. A linear regularized least-squares predictor is then built using these constructed features. The number of constructed feature terms varies linearly with the order of the polynomial, instead of following a power law as in the case of full multivariate polynomials. The method is simple, as it amounts to only a few lines of Matlab code. We perform extensive experiments on this reduced model using 42 data sets. Our results compare remarkably well with the best reported results of several commonly used algorithms from the literature. Both the classification accuracy and the efficiency of this reduced model are reported.
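
    The abstract describes the feature construction only in words. The following is a minimal pure-Python sketch that assumes a term grouping we believe matches the authors' reduced multivariate (RM) formulation. The terms are a bias, powers x_j^k of each original feature, powers s^k of the feature sum s = x_1 + ... + x_l, and products x_j * s^(k-1). They are followed by ridge regression, w = (P^T P + b I)^(-1) P^T y. The exact grouping, the regularization constant `reg` and all function names are assumptions. With l features and order r, this grouping yields 1 + r + l(2r - 1) terms, which is linear in r as the abstract states.

```python
def rm_features(x, order):
    """Assumed reduced-polynomial expansion of one feature vector x."""
    s = sum(x)
    feats = [1.0]                                   # bias term
    for k in range(1, order + 1):
        feats.extend(xj ** k for xj in x)           # powers of each original feature
    for k in range(1, order + 1):
        feats.append(s ** k)                        # powers of the feature sum
    for k in range(2, order + 1):
        feats.extend(xj * s ** (k - 1) for xj in x) # products of originals with the sum
    return feats

def solve(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w

def fit_rm(X, y, order, reg=1e-3):
    """Regularized least squares on the expanded features (ridge regression)."""
    P = [rm_features(x, order) for x in X]
    k = len(P[0])
    gram = [[sum(p[i] * p[j] for p in P) + (reg if i == j else 0.0) for j in range(k)]
            for i in range(k)]
    rhs = [sum(p[i] * t for p, t in zip(P, y)) for i in range(k)]
    return solve(gram, rhs)

def predict_rm(w, x, order):
    return sum(wi * fi for wi, fi in zip(w, rm_features(x, order)))
```

    For the multiclass extension mentioned above, one plausible approach is to fit one such predictor per class against one-hot targets and take the arg max. The paper's exact multiple-output formulation may differ.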

  14. The Local Control Index: A Proposed Model for Classifying Types of Local Control As a Function of Statutory Provisions.

    ERIC Educational Resources Information Center

    Luna, Lonnie Lynn

    The purpose of this study was to derive an operational definition of local control and to devise a model, the Local Control Index, for classifying degrees of local control by using the education codes of eight states--Arizona, California, Illinois, Mississippi, New Mexico, New York, Oklahoma, and Texas. The Local Control Index consists of four…

  15. DNA-based watermarks using the DNA-Crypt algorithm

    PubMed Central

    Heider, Dominik; Barnekow, Angelika

    2007-01-01

    Background: The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program, leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. Results: The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. DNA-Crypt can be combined with binary encryption algorithms such as AES, RSA or Blowfish. It is able to correct mutations in the target DNA with several mutation correction codes, such as the Hamming code or the WDH code. Mutations, which may occur infrequently, can destroy the encrypted information; however, an integrated fuzzy controller uses a set of heuristics based on three input dimensions to recommend whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 gene in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. Conclusion: The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that, like all steganographic algorithms, DNA-Crypt produces few mismatches between the sequences. PMID:17535434
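
    The Hamming code mentioned above corrects any single flipped bit per codeword. The following is a minimal sketch of the classic Hamming(7,4) code, which places parity bits at positions 1, 2 and 4. It is a generic illustration of the technique, not DNA-Crypt's own implementation. How DNA-Crypt maps bits onto bases is not described here, so the sketch works on bits only, and the function names are hypothetical.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] as 7 bits [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity check over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity check over positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity check over positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the error, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

    Two or more errors in the same codeword cannot be corrected by this code, which is one reason a controller that weighs sequence length and mutation rate before choosing a code is useful.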

  16. Coding of Neuroinfectious Diseases.

    PubMed

    Barkley, Gregory L

    2015-12-01

    Accurate coding is an important function of neurologic practice. This contribution to Continuum is part of an ongoing series that presents helpful coding information along with examples related to the issue topic. Tips for diagnosis coding, Evaluation and Management coding, procedure coding, or a combination are presented, depending on which is most applicable to the subject area of the issue. PMID:26633789

  17. Model Children's Code.

    ERIC Educational Resources Information Center

    New Mexico Univ., Albuquerque. American Indian Law Center.

    The Model Children's Code was developed to provide a legally correct model code that American Indian tribes can use to enact children's codes that fulfill their legal, cultural and economic needs. Code sections cover the court system, jurisdiction, juvenile offender procedures, minor-in-need-of-care, and termination. Almost every Code section is…

  18. Classifying aging as a disease in the context of ICD-11.

    PubMed

    Zhavoronkov, Alex; Bhullar, Bhupinder

    2015-01-01

    Aging is a complex, continuous, multifactorial process leading to loss of function and crystallizing into the many age-related diseases. Here, we explore the arguments for classifying aging as a disease in the context of the upcoming World Health Organization's 11th International Statistical Classification of Diseases and Related Health Problems (ICD-11), expected to be finalized in 2018. We hypothesize that classifying aging as a disease with a "non-garbage" set of codes will result in new approaches and business models for addressing aging as a treatable condition, which will lead to both economic and healthcare benefits for all stakeholders. Actionable classification of aging as a disease may lead to more efficient allocation of resources by enabling funding bodies and other stakeholders to use quality-adjusted life years (QALYs) and healthy-years equivalent (HYE) as metrics when evaluating both research and clinical programs. We propose forming a Task Force to interface with the WHO in order to develop a multidisciplinary framework for classifying aging as a disease with multiple disease codes, facilitating therapeutic interventions and preventative strategies. PMID:26583032

  19. Classifying aging as a disease in the context of ICD-11

    PubMed Central

    Zhavoronkov, Alex; Bhullar, Bhupinder

    2015-01-01

    Aging is a complex, continuous, multifactorial process leading to loss of function and crystallizing into the many age-related diseases. Here, we explore the arguments for classifying aging as a disease in the context of the upcoming World Health Organization’s 11th International Statistical Classification of Diseases and Related Health Problems (ICD-11), expected to be finalized in 2018. We hypothesize that classifying aging as a disease with a “non-garbage” set of codes will result in new approaches and business models for addressing aging as a treatable condition, which will lead to both economic and healthcare benefits for all stakeholders. Actionable classification of aging as a disease may lead to more efficient allocation of resources by enabling funding bodies and other stakeholders to use quality-adjusted life years (QALYs) and healthy-years equivalent (HYE) as metrics when evaluating both research and clinical programs. We propose forming a Task Force to interface with the WHO in order to develop a multidisciplinary framework for classifying aging as a disease with multiple disease codes, facilitating therapeutic interventions and preventative strategies. PMID:26583032

  20. To Code or Not To Code?

    ERIC Educational Resources Information Center

    Parkinson, Brian; Sandhu, Parveen; Lacorte, Manel; Gourlay, Lesley

    1998-01-01

    This article considers arguments for and against the use of coding systems in classroom-based language research and touches on some relevant considerations from ethnographic and conversational analysis approaches. The four authors each explain and elaborate on their practical decision to code or not to code events or utterances at a specific point…

  1. CLASSIFYING X-RAY BINARIES: A PROBABILISTIC APPROACH

    SciTech Connect

    Gopalan, Giri; Bornn, Luke; Vrtilek, Saeqa Dil

    2015-08-10

    In X-ray binary star systems consisting of a compact object that accretes material from an orbiting secondary star, there is no straightforward means to decide whether the compact object is a black hole or a neutron star. To assist in this process, we develop a Bayesian statistical model that makes use of the fact that X-ray binary systems appear to cluster based on their compact object type when viewed from a three-dimensional coordinate system derived from X-ray spectral data where the first coordinate is the ratio of counts in the mid- to low-energy band (color 1), the second coordinate is the ratio of counts in the high- to low-energy band (color 2), and the third coordinate is the sum of counts in all three bands. We use this model to estimate the probabilities of an X-ray binary system containing a black hole, non-pulsing neutron star, or pulsing neutron star. In particular, we utilize a latent variable model in which the latent variables follow a Gaussian process prior distribution, and hence we are able to induce the spatial correlation which we believe exists between systems of the same type. The utility of this approach is demonstrated by the accurate prediction of system types using Rossi X-ray Timing Explorer All Sky Monitor data, but it is not flawless. In particular, non-pulsing neutron systems containing “bursters” that are close to the boundary demarcating systems containing black holes tend to be classified as black hole systems. As a byproduct of our analyses, we provide the astronomer with the public R code which can be used to predict the compact object type of XRBs given training data.
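
    The three coordinates used by the model are defined directly in terms of band counts. The following is a minimal sketch that computes them for one source. The argument names and the decision to reject non-positive low-band counts are assumptions. The sketch uses the raw ratios exactly as the abstract states them and does not reproduce the Gaussian-process classifier itself.

```python
def xrb_coordinates(low, mid, high):
    """Return (color1, color2, intensity) from low-, mid- and high-energy band counts."""
    if low <= 0:
        raise ValueError("low-energy counts must be positive to form the colour ratios")
    color1 = mid / low           # ratio of mid- to low-energy counts
    color2 = high / low          # ratio of high- to low-energy counts
    intensity = low + mid + high # sum of counts in all three bands
    return color1, color2, intensity
```

    Each system then becomes a point in this three-dimensional space, where the model looks for type-dependent clustering.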

  2. Structural diversity of supercoiled DNA

    NASA Astrophysics Data System (ADS)

    Irobalieva, Rossitza N.; Fogg, Jonathan M.; Catanese, Daniel J.; Sutthibutpong, Thana; Chen, Muyuan; Barker, Anna K.; Ludtke, Steven J.; Harris, Sarah A.; Schmid, Michael F.; Chiu, Wah; Zechiedrich, Lynn

    2015-10-01

    By regulating access to the genetic code, DNA supercoiling strongly affects DNA metabolism. Despite its importance, however, much about supercoiled DNA (positively supercoiled DNA, in particular) remains unknown. Here we use electron cryo-tomography together with biochemical analyses to investigate structures of individual purified DNA minicircle topoisomers with defined degrees of supercoiling. Our results reveal that each topoisomer, negative or positive, adopts a unique and surprisingly wide distribution of three-dimensional conformations. Moreover, we uncover striking differences in how the topoisomers handle torsional stress. As negative supercoiling increases, bases are increasingly exposed. Beyond a sharp supercoiling threshold, we also detect exposed bases in positively supercoiled DNA. Molecular dynamics simulations independently confirm the conformational heterogeneity and provide atomistic insight into the flexibility of supercoiled DNA. Our integrated approach reveals the three-dimensional structures of DNA that are essential for its function.

  3. Structural diversity of supercoiled DNA

    PubMed Central

    Irobalieva, Rossitza N.; Fogg, Jonathan M.; Catanese, Daniel J.; Sutthibutpong, Thana; Chen, Muyuan; Barker, Anna K.; Ludtke, Steven J.; Harris, Sarah A.; Schmid, Michael F.; Chiu, Wah; Zechiedrich, Lynn

    2015-01-01

    By regulating access to the genetic code, DNA supercoiling strongly affects DNA metabolism. Despite its importance, however, much about supercoiled DNA (positively supercoiled DNA, in particular) remains unknown. Here we use electron cryo-tomography together with biochemical analyses to investigate structures of individual purified DNA minicircle topoisomers with defined degrees of supercoiling. Our results reveal that each topoisomer, negative or positive, adopts a unique and surprisingly wide distribution of three-dimensional conformations. Moreover, we uncover striking differences in how the topoisomers handle torsional stress. As negative supercoiling increases, bases are increasingly exposed. Beyond a sharp supercoiling threshold, we also detect exposed bases in positively supercoiled DNA. Molecular dynamics simulations independently confirm the conformational heterogeneity and provide atomistic insight into the flexibility of supercoiled DNA. Our integrated approach reveals the three-dimensional structures of DNA that are essential for its function. PMID:26455586

  4. Towards modeling DNA sequences as automata

    NASA Astrophysics Data System (ADS)

    Burks, Christian; Farmer, Doyne

    1984-01-01

    We seek to describe a starting point for modeling the evolution and role of DNA sequences within the framework of cellular automata by discussing the current understanding of genetic information storage in DNA sequences. This includes alternately viewing the role of DNA in living organisms as a simple scheme and as a complex scheme; a brief review of strategies for identifying and classifying patterns in DNA sequences; and finally, notes towards establishing DNA-like automata models, including a discussion of the extent of experimentally determined DNA sequence data present in the database at Los Alamos.

  5. Bare Code Reader

    NASA Astrophysics Data System (ADS)

    Clair, Jean J.

    1980-05-01

    The bar code system will be used in every market and supermarket. The code, which is standardized in the US and Europe (the EAN code), gives information on price, storage and nature, and allows real-time management of the shop.
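
    The abstract refers to the EAN standard. As a small illustration of how an EAN-13 code guards against misreads, the following sketch implements the standard check-digit computation. The first 12 digits are weighted alternately 1 and 3 from the left, and the check digit brings the weighted sum up to a multiple of 10. The function names are our own.

```python
def ean13_check_digit(first12):
    """Compute the EAN-13 check digit from a string of the first 12 digits."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10

def is_valid_ean13(code):
    """True if code is 13 digits and its last digit matches the computed check digit."""
    return len(code) == 13 and code.isdigit() and ean13_check_digit(code[:12]) == int(code[12])
```

    For example, the digits 400638133393 have a weighted sum of 89, so the check digit is 1 and the full code reads 4006381333931.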

  6. Genetic algorithms and classifier systems: Foundations and future directions

    SciTech Connect

    Holland, J.H.

    1987-01-01

    Theoretical questions about classifier systems, with rare exceptions, apply equally to other adaptive nonlinear networks (ANNs) such as the connectionist models of cognitive psychology, the immune system, economic systems, ecologies, and genetic systems. This paper discusses pervasive properties of ANNs and the kinds of mathematics relevant to questions about these properties. It discusses relevant functional extensions of the basic classifier system and extensions of the extant mathematical theory. An appendix briefly reviews some of the key theorems about classifier systems. 6 refs.

  7. DNA methylation in plants.

    PubMed

    Vanyushin, B F

    2006-01-01

    DNA in plants is highly methylated, containing 5-methylcytosine (m5C) and N6-methyladenine (m6A); m5C is located mainly in symmetrical CG and CNG sequences but may also occur in other, non-symmetrical contexts. m6A, but not m5C, was found in plant mitochondrial DNA. DNA methylation in plants is species-, tissue-, organelle- and age-specific. It is controlled by phytohormones and changes on seed germination, on flowering and under the influence of various pathogens (viral, bacterial, fungal). DNA methylation controls plant growth and development, with particular involvement in the regulation of gene expression and DNA replication. DNA replication is accompanied by the appearance of under-methylated, newly formed DNA strands, including Okazaki fragments; the asymmetry of strand DNA methylation disappears by the end of the cell cycle. A model for regulation of DNA replication by methylation is suggested. Cytosine DNA methylation in plants is richer and more diverse than in animals. It is carried out by families of specific enzymes that belong to at least three classes of DNA methyltransferases. Open reading frames (ORFs) for adenine DNA methyltransferases are found in plant and animal genomes, and a first eukaryotic (plant) adenine DNA methyltransferase (wadmtase) is described; the enzyme seems to be involved in the regulation of mitochondrial replication. As in animals, DNA methylation in plants is closely associated with histone modifications, and it affects the binding of specific proteins to DNA and the formation of respective transcription complexes in chromatin. The same gene (DRM2) in Arabidopsis thaliana is methylated at both cytosine and adenine residues; thus, at least two different, and probably interdependent, systems of DNA modification are present in plants. Plants seem to have a restriction-modification (R-M) system. RNA-directed DNA methylation has been observed in plants; it involves de novo methylation of almost all cytosine residues in a region of siRNA-DNA
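
    The symmetrical CG and CNG contexts mentioned above can be located mechanically in a sequence. The following hypothetical helper labels each cytosine on one strand. It checks CG first, so the N in CNG is effectively any base other than G. The "unknown" label for cytosines too close to the end of the sequence is our own convention, and the function name is an assumption.

```python
def methylation_contexts(seq):
    """Label each cytosine on the given strand as 'CG', 'CNG', 'other' or 'unknown'."""
    seq = seq.upper()
    result = []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        if i + 1 >= len(seq):
            ctx = "unknown"            # no downstream base to inspect
        elif seq[i + 1] == "G":
            ctx = "CG"                 # symmetrical CG site
        elif i + 2 >= len(seq):
            ctx = "unknown"            # cannot tell CNG from a non-symmetrical context
        elif seq[i + 2] == "G":
            ctx = "CNG"                # symmetrical CNG site (N is not G here)
        else:
            ctx = "other"              # non-symmetrical context
        result.append((i, ctx))
    return result
```

    Because CG and CNG are palindromic on the two strands, each such site also has a cytosine at the matching position on the complementary strand. Scanning one strand is therefore enough to enumerate the symmetrical sites.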

  8. Mitochondrial DNA haplogroup phylogeny of the dog: Proposal for a cladistic nomenclature.

    PubMed

    Fregel, Rosa; Suárez, Nicolás M; Betancor, Eva; González, Ana M; Cabrera, Vicente M; Pestano, José

    2015-05-01

    Canis lupus familiaris mitochondrial DNA analysis has increased in recent years, not only for the purpose of deciphering dog domestication but also for forensic genetic studies or breed characterization. The resultant accumulation of data has increased the need for a normalized and phylogenetic-based nomenclature like those provided for human maternal lineages. Although a standardized classification has been proposed, haplotype names within clades have been assigned gradually without considering the evolutionary history of dog mtDNA. Moreover, this classification is based only on the D-loop region, proven to be insufficient for phylogenetic purposes due to its high number of recurrent mutations and the lack of relevant information present in the coding region. In this study, we design 1) a refined mtDNA cladistic nomenclature from a phylogenetic tree based on complete sequences, classifying dog maternal lineages into haplogroups defined by specific diagnostic mutations, and 2) a coding region SNP analysis that allows a more accurate classification into haplogroups when combined with D-loop sequencing, thus improving the phylogenetic information obtained in dog mitochondrial DNA studies. PMID:25869968

  10. 33 CFR 149.405 - How are fire extinguishers classified?

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... SECURITY (CONTINUED) DEEPWATER PORTS DEEPWATER PORTS: DESIGN, CONSTRUCTION, AND EQUIPMENT Firefighting and Fire Protection Equipment Firefighting Requirements § 149.405 How are fire extinguishers classified?...

  11. 33 CFR 149.405 - How are fire extinguishers classified?

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... SECURITY (CONTINUED) DEEPWATER PORTS DEEPWATER PORTS: DESIGN, CONSTRUCTION, AND EQUIPMENT Firefighting and Fire Protection Equipment Firefighting Requirements § 149.405 How are fire extinguishers classified?...

  12. 33 CFR 149.405 - How are fire extinguishers classified?

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... SECURITY (CONTINUED) DEEPWATER PORTS DEEPWATER PORTS: DESIGN, CONSTRUCTION, AND EQUIPMENT Firefighting and Fire Protection Equipment Firefighting Requirements § 149.405 How are fire extinguishers classified?...

  13. 33 CFR 149.405 - How are fire extinguishers classified?

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... SECURITY (CONTINUED) DEEPWATER PORTS DEEPWATER PORTS: DESIGN, CONSTRUCTION, AND EQUIPMENT Firefighting and Fire Protection Equipment Firefighting Requirements § 149.405 How are fire extinguishers classified?...

  14. Facial expression recognition with facial parts based sparse representation classifier

    NASA Astrophysics Data System (ADS)

    Zhi, Ruicong; Ruan, Qiuqi

    2009-10-01

    Facial expressions play an important role in human communication. Understanding facial expressions is a basic requirement for developing next-generation human-computer interaction systems. Research shows that intrinsic facial features lie in low-dimensional facial subspaces. This paper presents a facial-parts-based facial expression recognition system that uses a sparse representation classifier. The sparse representation classifier exploits sparse representation to select facial features and classify facial expressions. The sparse solution is obtained by solving an l1-norm minimization problem subject to a linear-combination equality constraint. Experimental results show that sparse representation is efficient for facial expression recognition and that the sparse representation classifier achieves much higher recognition accuracy than the other methods compared.
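    The sparse representation classifier described above can be sketched under the following assumptions. Training samples form the columns of a dictionary. The equality-constrained l1 problem is replaced by its Lagrangian (lasso) relaxation and solved with plain ISTA iterations. The query is then assigned to the class whose coefficients alone give the smallest reconstruction residual. This follows the generic SRC recipe rather than the authors' facial-parts pipeline. `lam`, the iteration count, and all function names are illustrative choices.

```python
import math
from typing import List, Sequence

Vector = List[float]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


def ista_lasso(cols: Sequence[Vector], y: Vector, lam: float, iters: int = 500) -> Vector:
    """Approximately minimize 0.5*||A x - y||^2 + lam*||x||_1 with ISTA.

    cols[j] is column j of the dictionary A (one training sample)."""
    # 1/L is a safe step size because ||A||_F^2 bounds the largest eigenvalue of A^T A.
    L = sum(a * a for col in cols for a in col) or 1.0
    x = [0.0] * len(cols)
    for _ in range(iters):
        r = [-yi for yi in y]  # residual A x - y
        for j, col in enumerate(cols):
            for i, a in enumerate(col):
                r[i] += a * x[j]
        # Gradient step on the smooth term, then soft-thresholding for the l1 term.
        x = [soft_threshold(x[j] - sum(a * ri for a, ri in zip(col, r)) / L, lam / L)
             for j, col in enumerate(cols)]
    return x


def src_classify(train: Sequence[Vector], labels: Sequence[str], y: Vector,
                 lam: float = 0.01) -> str:
    """Assign y to the class whose coefficients alone reconstruct it best."""
    cols = [normalize(t) for t in train]
    yn = normalize(y)
    x = ista_lasso(cols, yn, lam)
    best, best_res = labels[0], math.inf
    for c in sorted(set(labels)):
        recon = [0.0] * len(yn)
        for j, col in enumerate(cols):
            if labels[j] == c:
                for i, a in enumerate(col):
                    recon[i] += a * x[j]
        res = math.dist(yn, recon)
        if res < best_res:
            best, best_res = c, res
    return best
```

    The class-wise residual, rather than the largest single coefficient, is what makes the decision robust when several training samples of the same class share the weight.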

  15. A space-based radio frequency transient event classifier

    SciTech Connect

    Moore, K.R.; Blain, P.C.; Caffrey, M.P.; Franz, R.C.; Henneke, K.M.; Jones, R.G.

    1996-12-31

    The FORTE (Fast On-Orbit Recording of Transient Events) satellite will record RF transients in space. These transients will be classified onboard the spacecraft with an Event Classifier--specialized hardware that performs signal preprocessing and neural network classification. The authors describe the Event Classifier, future directions, and implications for telecommunications satellites. Telecommunication satellites are susceptible to damage from environmental factors such as deep dielectric charging and surface discharges. The event classifier technology the authors are developing is capable of sensing the surface discharges and could be useful for mitigating their effects. In addition, the techniques they are using for processing weak signals in noisy environments are relevant to telecommunications.

  16. Using Classifiers to Identify Binge Drinkers Based on Drinking Motives.

    PubMed

    Crutzen, Rik; Giabbanelli, Philippe

    2013-08-21

    A representative sample of 2,844 Dutch adult drinkers completed a questionnaire on drinking motives and drinking behavior in January 2011. Results were classified using regressions, decision trees, and support vector machines (SVMs). Using SVMs, the mean absolute error was minimal, whereas performance on identifying binge drinkers was high. Moreover, when comparing the structure of classifiers, there were differences in which drinking motives contribute to the performance of classifiers. Thus, classifiers are worth using in research on (addictive) behaviors, because they contribute to explaining behavior and can give insights different from those of more traditional data-analytic approaches. PMID:23964957

  17. One-hot vector hybrid associative classifier for medical data classification.

    PubMed

    Uriarte-Arcia, Abril Valeria; López-Yáñez, Itzamá; Yáñez-Márquez, Cornelio

    2014-01-01

    Pattern recognition and classification are two of the key topics in computer science. In this paper a novel method for the task of pattern classification is presented. The proposed method combines a hybrid associative classifier with translation (CHAT, from the Spanish Clasificador Híbrido Asociativo con Traslación), a coding technique for output patterns called one-hot vector, and majority voting during the classification step. The method is termed CHAT One-Hot Majority (CHAT-OHM). The performance of the method is validated by comparing the accuracy of CHAT-OHM with that of other well-known classification algorithms. During the experimental phase, the classifier was applied to four datasets related to the medical field. The results also show that the proposed method outperforms the original CHAT in classification accuracy.

  18. One-Hot Vector Hybrid Associative Classifier for Medical Data Classification

    PubMed Central

    Uriarte-Arcia, Abril Valeria; López-Yáñez, Itzamá; Yáñez-Márquez, Cornelio

    2014-01-01

    Pattern recognition and classification are two of the key topics in computer science. In this paper a novel method for the task of pattern classification is presented. The proposed method combines a hybrid associative classifier with translation (CHAT, from the Spanish Clasificador Híbrido Asociativo con Traslación), a coding technique for output patterns called one-hot vector, and majority voting during the classification step. The method is termed CHAT One-Hot Majority (CHAT-OHM). The performance of the method is validated by comparing the accuracy of CHAT-OHM with that of other well-known classification algorithms. During the experimental phase, the classifier was applied to four datasets related to the medical field. The results also show that the proposed method outperforms the original CHAT in classification accuracy. PMID:24752287
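    CHAT-OHM combines an associative memory with one-hot coding of output patterns and majority voting. The sketch below illustrates only those two ingredients, under a strong assumption about everything else. The memory is a simplified linear associator built from mean-translated inputs, with one one-hot output code per stored pattern, and the final label is a majority vote over the `k` strongest output components. It is a hypothetical stand-in, not the published CHAT or CHAT-OHM learning and recall rules. The class name and `k` are invented for illustration.

```python
from collections import Counter
from typing import List, Sequence

Vector = List[float]


class OneHotAssociativeSketch:
    """Simplified linear-associator memory with mean translation, one-hot output
    codes (one per stored pattern), and majority voting. Hypothetical stand-in,
    not the published CHAT-OHM algorithm."""

    def __init__(self, patterns: Sequence[Vector], labels: Sequence[str], k: int = 3):
        if not patterns or len(patterns) != len(labels):
            raise ValueError("need one label per pattern")
        self.labels = list(labels)
        self.k = k
        n = len(patterns)
        # Translation: shift every input so the training mean sits at the origin.
        self.mean = [sum(col) / n for col in zip(*patterns)]
        # Memory M = sum_mu e_mu (x_mu - mean)^T. Because e_mu is the one-hot code
        # of pattern mu, row mu of M is simply the translated pattern itself.
        self.memory = [[xi - mi for xi, mi in zip(x, self.mean)] for x in patterns]

    def recall(self, x: Vector) -> Vector:
        """Return M (x - mean): one score per stored pattern's one-hot output code."""
        xt = [xi - mi for xi, mi in zip(x, self.mean)]
        return [sum(m * v for m, v in zip(row, xt)) for row in self.memory]

    def classify(self, x: Vector) -> str:
        scores = self.recall(x)
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.k]
        # Majority vote among the class labels of the k strongest output components.
        return Counter(self.labels[i] for i in top).most_common(1)[0][0]
```

    With one-hot output codes, the recall vector has one component per stored pattern. This is what makes a vote over several components possible, rather than a single winner-take-all decision.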

  19. Categories of Code-Switching in Hispanic Communities: Untangling the Terminology. Sociolinguistic Working Paper Number 76.

    ERIC Educational Resources Information Center

    Baker, Opal Ruth

    Research on Spanish/English code switching is reviewed and the definitions and categories set up by the investigators are examined. Their methods of locating, limiting, and classifying true code switches, and the terms used and results obtained, are compared. It is found that in these studies, conversational (intra-discourse) code switching is…

  20. Accumulate repeat accumulate codes

    NASA Technical Reports Server (NTRS)

    Abbasfar, Aliazam; Divsalar, Dariush; Yao, Kung

    2004-01-01

    In this paper we propose an innovative channel coding scheme called 'Accumulate Repeat Accumulate codes' (ARA). This class of codes can be viewed as serial turbo-like codes or as a subclass of Low Density Parity Check (LDPC) codes, so belief propagation can be used for iterative decoding of ARA codes on a graph. The encoder for this class can be viewed as a precoded Repeat Accumulate (RA) code or as a precoded Irregular Repeat Accumulate (IRA) code, where an accumulator is simply chosen as the precoder. Thus ARA codes have a simple and very fast encoder structure when representing LDPC codes. Using density evolution for LDPC codes on several example ARA codes, we show that for a maximum variable node degree of 5, a minimum bit SNR as low as 0.08 dB from channel capacity can be achieved for rate 1/2 as the block size goes to infinity. Thus, for a fixed low maximum variable node degree, the ARA threshold outperforms not only the RA and IRA codes but also the best known LDPC codes with the same maximum node degree. Furthermore, by puncturing the accumulators, codes of any desired high rate close to 1 can be obtained with thresholds that stay uniformly close to the channel capacity thresholds. Iterative decoding simulation results are provided. ARA codes also have a projected graph, or protograph, representation that allows for high-speed decoder implementation.
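    The abstract describes the ARA encoder as a Repeat Accumulate encoder with an accumulator as its precoder. The sketch below is a toy encoder that follows that chain: accumulate, repeat, interleave, accumulate. It rests on three assumptions. The output is systematic (information bits followed by parity bits). The interleaver is a seeded random permutation. Puncturing is omitted, so the rate is 1/(1+q) rather than the rates discussed in the paper. Function names are hypothetical.

```python
import random
from typing import List, Sequence

Bits = List[int]


def accumulate(bits: Sequence[int]) -> Bits:
    """Accumulator 1/(1+D) over GF(2): out[i] = bits[0] ^ ... ^ bits[i]."""
    out: Bits = []
    acc = 0
    for b in bits:
        acc ^= b
        out.append(acc)
    return out


def repeat(bits: Sequence[int], q: int) -> Bits:
    """Repeat each bit q times."""
    return [b for b in bits for _ in range(q)]


def make_interleaver(n: int, seed: int = 0) -> List[int]:
    """Seeded random permutation (an assumption; real designs use structured interleavers)."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def interleave(bits: Sequence[int], perm: Sequence[int]) -> Bits:
    return [bits[p] for p in perm]


def ara_encode(info: Sequence[int], q: int = 3, seed: int = 0) -> Bits:
    """Toy systematic ARA-style encoder: accumulate (precoder) -> repeat -> interleave
    -> accumulate. Returns info bits followed by parity bits, without puncturing."""
    pre = accumulate(info)
    rep = repeat(pre, q)
    parity = accumulate(interleave(rep, make_interleaver(len(rep), seed)))
    return list(info) + parity
```

    Every stage is linear over GF(2), so the encoder is linear. The XOR of two codewords is the codeword of the XOR of their inputs, which is consistent with viewing ARA codes as a subclass of LDPC codes.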

  1. Multi-input distributed classifiers for synthetic genetic circuits.

    PubMed

    Kanakov, Oleg; Kotelnikov, Roman; Alsaedi, Ahmed; Tsimring, Lev; Huerta, Ramón; Zaikin, Alexey; Ivanchenko, Mikhail

    2015-01-01

    For the practical construction of complex synthetic genetic networks able to perform elaborate functions, it is important to have a pool of relatively simple modules with different functionality that can be combined. To complement the engineering of existing synthetic genetic devices as diverse as switches, oscillators, and logic gates, we propose and develop here a design for a synthetic multi-input classifier based on a recently introduced distributed classifier concept. A heterogeneous population of cells acts as a single classifier, whose output is obtained by summarizing the outputs of individual cells. The learning ability is achieved by pruning the population instead of tuning the parameters of an individual cell. The present paper focuses on evaluating two possible schemes of multi-input gene classifier circuits. We demonstrate their suitability for implementing a multi-input distributed classifier capable of separating data that are inseparable for single-input classifiers, and we characterize the performance of the classifiers with analytical and numerical results. The simpler scheme implements a linear classifier in a single cell and is targeted at separable classification problems with simple class borders. A hard learning strategy is used to train this distributed classifier by removing from the population any cell that answers incorrectly to at least one training example. The other scheme implements a circuit with a bell-shaped response in a single cell, allowing a potentially arbitrary shape of the classification border in the input space of a distributed classifier. Inseparable classification problems are addressed using a soft learning strategy, characterized by a probabilistic decision to keep or discard a cell at each training iteration. We expect that our classifier design will contribute to the development of robust and predictable synthetic biosensors, which have the potential to affect applications in many fields, including medicine and industry.
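    The hard learning strategy for the simpler, linear scheme can be sketched in a few lines. The sketch rests on three assumptions. Each "cell" is modeled abstractly as a random two-input linear threshold unit. The population output is the majority of surviving cells, one possible way of summarizing the individual outputs. No gene-circuit dynamics are simulated. The soft, probabilistic pruning used for the bell-shaped scheme is not shown, and all names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Cell = Tuple[float, float, float]  # hypothetical 2-input linear cell: (w1, w2, bias)
Example = Tuple[Tuple[float, float], int]


def cell_output(cell: Cell, x: Sequence[float]) -> int:
    """Binary response of one cell: 1 if its weighted input sum exceeds zero."""
    w1, w2, b = cell
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0


def random_population(n: int, seed: int = 0) -> List[Cell]:
    """Heterogeneous population: each cell gets random weights and bias."""
    rng = random.Random(seed)
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


def hard_prune(population: Sequence[Cell], training: Sequence[Example]) -> List[Cell]:
    """Hard learning: drop every cell that misclassifies at least one training example."""
    return [c for c in population if all(cell_output(c, x) == y for x, y in training)]


def population_output(population: Sequence[Cell], x: Sequence[float]) -> int:
    """Population decision: 1 if more than half of the surviving cells respond."""
    if not population:
        raise ValueError("no cells survived pruning")
    frac = sum(cell_output(c, x) for c in population) / len(population)
    return 1 if frac > 0.5 else 0
```

    Learning here never changes any cell's parameters. It only removes cells, which matches the abstract's point that training acts on the population rather than on an individual cell.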

  2. CRITICA: coding region identification tool invoking comparative analysis

    NASA Technical Reports Server (NTRS)

    Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

    1999-01-01

    Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
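    CRITICA's noncomparative component scores coding potential from hexanucleotide frequencies in coding frames versus other contexts (dicodon bias). The sketch below shows that kind of scoring under one assumption: a simple log-likelihood ratio, with pseudocounts, between in-frame hexanucleotides from known coding sequences and all hexanucleotides from background sequences. It is not CRITICA's actual statistic. The comparative alignment component and the iterative re-derivation of dicodon usage are omitted, and the class and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Iterator


def inframe_hexamers(seq: str, frame: int = 0) -> Iterator[str]:
    """Dicodons: hexanucleotides starting at codon boundaries of the given frame."""
    for i in range(frame, len(seq) - 5, 3):
        yield seq[i:i + 6]


def all_hexamers(seq: str) -> Iterator[str]:
    """Every overlapping hexanucleotide, regardless of frame (background model)."""
    for i in range(len(seq) - 5):
        yield seq[i:i + 6]


class DicodonScorer:
    """Log-likelihood ratio of in-frame dicodon usage vs. a background model."""

    def __init__(self, coding: Iterable[str], background: Iterable[str], pseudo: float = 1.0):
        self.coding = Counter(h for s in coding for h in inframe_hexamers(s))
        self.background = Counter(h for s in background for h in all_hexamers(s))
        self.n_coding = sum(self.coding.values())
        self.n_background = sum(self.background.values())
        self.pseudo = pseudo
        self.vocab = 4 ** 6  # number of possible hexanucleotides over ACGT

    def log_ratio(self, hexamer: str) -> float:
        # Pseudocount estimates so that unseen hexamers still get a finite score.
        p_coding = (self.coding[hexamer] + self.pseudo) / (
            self.n_coding + self.pseudo * self.vocab)
        p_background = (self.background[hexamer] + self.pseudo) / (
            self.n_background + self.pseudo * self.vocab)
        return math.log(p_coding / p_background)

    def score(self, seq: str, frame: int = 0) -> float:
        """Positive totals favour 'coding in this frame'; negative favour background."""
        return sum(self.log_ratio(h) for h in inframe_hexamers(seq, frame))
```

    Comparing `score(seq, frame)` across the three frames of a candidate region is one way such a table can point to the most likely reading frame.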

  3. Discussion on LDPC Codes and Uplink Coding

    NASA Technical Reports Server (NTRS)

    Andrews, Ken; Divsalar, Dariush; Dolinar, Sam; Moision, Bruce; Hamkins, Jon; Pollara, Fabrizio

    2007-01-01

    This slide presentation reviews the progress of the workgroup on Low-Density Parity-Check (LDPC) codes for space link coding. The workgroup is tasked with developing and recommending new error-correcting codes for near-Earth, Lunar, and deep space applications. Included in the presentation is a summary of the technical progress of the workgroup. Charts showing the LDPC decoder's sensitivity to symbol scaling errors are reviewed, as well as a chart comparing the performance of several frame synchronizer algorithms with that of some good codes, and LDPC decoder tests at ESTL. Also reviewed are a study on Coding, Modulation, and Link Protocol (CMLP) and the recommended codes. A design for the pseudo-randomizer with LDPC decoder and CRC is also reviewed. A chart summarizing the three proposed coding systems is also presented.

  4. 16 CFR 1610.4 - Requirements for classifying textiles.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 16 Commercial Practices 2 2014-01-01 2014-01-01 false Requirements for classifying textiles. 1610... REGULATIONS STANDARD FOR THE FLAMMABILITY OF CLOTHING TEXTILES The Standard § 1610.4 Requirements for classifying textiles. (a) Class 1, Normal Flammability. Class 1 textiles exhibit normal flammability and...

  5. 16 CFR 1610.4 - Requirements for classifying textiles.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 16 Commercial Practices 2 2012-01-01 2012-01-01 false Requirements for classifying textiles. 1610... REGULATIONS STANDARD FOR THE FLAMMABILITY OF CLOTHING TEXTILES The Standard § 1610.4 Requirements for classifying textiles. (a) Class 1, Normal Flammability. Class 1 textiles exhibit normal flammability and...

  6. 16 CFR 1610.4 - Requirements for classifying textiles.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 16 Commercial Practices 2 2013-01-01 2013-01-01 false Requirements for classifying textiles. 1610... REGULATIONS STANDARD FOR THE FLAMMABILITY OF CLOTHING TEXTILES The Standard § 1610.4 Requirements for classifying textiles. (a) Class 1, Normal Flammability. Class 1 textiles exhibit normal flammability and...

  7. 16 CFR 1610.4 - Requirements for classifying textiles.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 16 Commercial Practices 2 2011-01-01 2011-01-01 false Requirements for classifying textiles. 1610... REGULATIONS STANDARD FOR THE FLAMMABILITY OF CLOTHING TEXTILES The Standard § 1610.4 Requirements for classifying textiles. (a) Class 1, Normal Flammability. Class 1 textiles exhibit normal flammability and...

  8. 16 CFR 1610.4 - Requirements for classifying textiles.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 16 Commercial Practices 2 2010-01-01 2010-01-01 false Requirements for classifying textiles. 1610... REGULATIONS STANDARD FOR THE FLAMMABILITY OF CLOTHING TEXTILES The Standard § 1610.4 Requirements for classifying textiles. (a) Class 1, Normal Flammability. Class 1 textiles exhibit normal flammability and...

  9. 14 CFR 1203.400 - Specific classifying guidance.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 14 Aeronautics and Space 5 2011-01-01 2010-01-01 true Specific classifying guidance. 1203.400 Section 1203.400 Aeronautics and Space NATIONAL AERONAUTICS AND SPACE ADMINISTRATION INFORMATION SECURITY PROGRAM Guides for Original Classification § 1203.400 Specific classifying guidance. Technological...

  10. 14 CFR 1203.402 - Classifying material other than documentation.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 14 Aeronautics and Space 5 2011-01-01 2010-01-01 true Classifying material other than documentation. 1203.402 Section 1203.402 Aeronautics and Space NATIONAL AERONAUTICS AND SPACE ADMINISTRATION INFORMATION SECURITY PROGRAM Guides for Original Classification § 1203.402 Classifying material other...

  11. 14 CFR 1203.400 - Specific classifying guidance.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 14 Aeronautics and Space 5 2013-01-01 2013-01-01 false Specific classifying guidance. 1203.400 Section 1203.400 Aeronautics and Space NATIONAL AERONAUTICS AND SPACE ADMINISTRATION INFORMATION SECURITY PROGRAM Guides for Original Classification § 1203.400 Specific classifying guidance. Technological...

  12. 14 CFR 1203.402 - Classifying material other than documentation.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 14 Aeronautics and Space 5 2012-01-01 2012-01-01 false Classifying material other than documentation. 1203.402 Section 1203.402 Aeronautics and Space NATIONAL AERONAUTICS AND SPACE ADMINISTRATION INFORMATION SECURITY PROGRAM Guides for Original Classification § 1203.402 Classifying material other...

  13. 14 CFR 1203.402 - Classifying material other than documentation.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 14 Aeronautics and Space 5 2013-01-01 2013-01-01 false Classifying material other than documentation. 1203.402 Section 1203.402 Aeronautics and Space NATIONAL AERONAUTICS AND SPACE ADMINISTRATION INFORMATION SECURITY PROGRAM Guides for Original Classification § 1203.402 Classifying material other...

  14. Verb-raising and Numeral Classifiers in Japanese: Incompatible Bedfellows.

    ERIC Educational Resources Information Center

    Fukushima, Kazuhiko

    2003-01-01

    Examines verb raising in Japanese and looks at Koizumi's (2000) evidence for verb-raising based on data involving, among other things, numeral classifiers. Demonstrates that Koizumi's evidence based on numeral classifiers does not support his claim that verb-raising occurs in Japanese. (Author/VWL)

  15. Using ensemble classifier to identify membrane protein types.

    PubMed

    Shen, H-B; Chou, K-C

    2007-01-01

    Predicting membrane protein type is both an important and a challenging topic in current molecular and cellular biology, because knowledge of a membrane protein's type often provides useful clues for determining, or sheds light upon, the function of an uncharacterized membrane protein. With the explosion of newly found protein sequences in the post-genomic era, there is great demand for a computational method that can quickly and reliably identify the types of membrane proteins from their primary sequences. In this paper, a novel classifier, the so-called "ensemble classifier", is introduced. It is formed by fusing a set of nearest neighbor (NN) classifiers, each defined in a different pseudo amino acid composition space. The type of a query protein is determined by voting among these constituent classifiers. Self-consistency, jackknife, and independent dataset tests demonstrated that the ensemble classifier outperformed other existing classifiers widely used in the biological literature. It is anticipated that the ensemble classifier idea can also be used to improve prediction quality when classifying other attributes of proteins from their sequences.
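    The ensemble idea (nearest neighbor classifiers, each in its own pseudo amino acid composition space, fused by voting) can be sketched as below. The following choices are illustrative assumptions rather than the paper's settings:

    - A single physicochemical property, the Kyte-Doolittle hydropathy scale (standardized), stands in for the properties used in Chou's pseudo amino acid composition.
    - The spaces differ only in the number of sequence-order terms `lam`.
    - The weight `w` is arbitrary.
    - The nearest neighbor search uses Euclidean distance.

    Names such as `EnsembleNN` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

AMINO = "ACDEFGHIKLMNPQRSTVWY"
# Kyte-Doolittle hydropathy as the single property (assumption: Chou's PseAAC
# normally combines several standardized physicochemical properties).
KD: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
_MU = sum(KD.values()) / len(KD)
_SD = math.sqrt(sum((v - _MU) ** 2 for v in KD.values()) / len(KD))
PROP = {a: (v - _MU) / _SD for a, v in KD.items()}  # standardized property values


def pseaac(seq: str, lam: int, w: float = 0.05) -> List[float]:
    """Single-property type-1 PseAAC: 20 composition terms + lam sequence-order terms."""
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [seq.count(a) / n for a in AMINO]
    # theta_j: mean squared property difference between residues j positions apart.
    thetas = [sum((PROP[seq[i]] - PROP[seq[i + j]]) ** 2 for i in range(n - j)) / (n - j)
              for j in range(1, lam + 1)]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def nearest_label(space: Sequence[List[float]], labels: Sequence[str], q: List[float]) -> str:
    """1-NN with Euclidean distance (assumed metric)."""
    best = min(range(len(space)), key=lambda i: math.dist(space[i], q))
    return labels[best]


class EnsembleNN:
    """Majority vote over 1-NN classifiers, one per PseAAC space (one per lam)."""

    def __init__(self, seqs: Sequence[str], labels: Sequence[str],
                 lams: Tuple[int, ...] = (1, 2, 3)):
        self.labels = list(labels)
        self.lams = lams
        self.spaces = {lam: [pseaac(s, lam) for s in seqs] for lam in lams}

    def predict(self, seq: str) -> str:
        votes = Counter(nearest_label(self.spaces[lam], self.labels, pseaac(seq, lam))
                        for lam in self.lams)
        return votes.most_common(1)[0][0]
```

    An odd number of constituent classifiers, as with the three spaces here, avoids ties in a two-class vote. With more classes, `Counter.most_common` simply returns the first label that reached the top count.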

  16. "Scissors, Paper, Stone": Perceptual Foundations of Noun Classifier Systems.

    ERIC Educational Resources Information Center

    Erbaugh, Mary S.

    While all languages use shape to classify unfamiliar objects, some languages as diverse as Mandarin, Thai, Japanese, Mohawk, and American Sign Language lexicalize these and other types of description as noun classifiers. Classification does not develop from a fixed set of features in the object, but is discourse-sensitive and invoked when it would…

  17. 40 CFR 152.175 - Pesticides classified for restricted use.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 24 2011-07-01 2011-07-01 false Pesticides classified for restricted...) PESTICIDE PROGRAMS PESTICIDE REGISTRATION AND CLASSIFICATION PROCEDURES Classification of Pesticides § 152.175 Pesticides classified for restricted use. The following uses of pesticide products containing...

  18. 10 CFR 1045.34 - Designation of restricted data classifiers.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 10 Energy 4 2011-01-01 2011-01-01 false Designation of restricted data classifiers. 1045.34 Section 1045.34 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION Generation and Review of Documents Containing Restricted Data and Formerly Restricted Data § 1045.34 Designation of restricted data classifiers....

  19. 32 CFR 2400.32 - Transmittal of classified information.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... TECHNOLOGY POLICY REGULATIONS TO IMPLEMENT E.O. 12356; OFFICE OF SCIENCE AND TECHNOLOGY POLICY INFORMATION... classified information outside of the Office of Science and Technology Policy shall be in accordance with... 32 National Defense 6 2010-07-01 2010-07-01 false Transmittal of classified information....

  20. 6 CFR 7.23 - Emergency release of classified information.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 6 Domestic Security 1 2012-01-01 2012-01-01 false Emergency release of classified information. 7.23 Section 7.23 Domestic Security DEPARTMENT OF HOMELAND SECURITY, OFFICE OF THE SECRETARY CLASSIFIED...) The Secretary of Homeland Security has delegated to certain DHS employees the authority to...

  1. 6 CFR 7.23 - Emergency release of classified information.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 6 Domestic Security 1 2011-01-01 2011-01-01 false Emergency release of classified information. 7.23 Section 7.23 Domestic Security DEPARTMENT OF HOMELAND SECURITY, OFFICE OF THE SECRETARY CLASSIFIED...) The Secretary of Homeland Security has delegated to certain DHS employees the authority to...

  2. Self-recalibrating classifiers for intracortical brain-computer interfaces

    PubMed Central

    Bishop, William; Chestek, Cynthia C; Gilja, Vikash; Nuyujukian, Paul; Foster, Justin D; Ryu, Stephen I; Shenoy, Krishna V; Yu, Byron M

    2014-01-01

    Objective Intracortical brain-computer interface (BCI) decoders are typically retrained daily to maintain stable performance. Self-recalibrating decoders aim to remove the burden this may present in the clinic by training themselves autonomously during normal use but have only been developed for continuous control. Here we address the problem for discrete decoding (classifiers). Approach We recorded threshold crossings from 96-electrode arrays implanted in the motor cortex of two rhesus macaques performing center-out reaches in 7 directions over 41 and 36 separate days spanning 48 and 58 days in total for offline analysis. Main results We show that for the purposes of developing a self-recalibrating classifier, tuning parameters can be considered as fixed within days and that parameters on the same electrode move up and down together between days. Further, drift is constrained across time, which is reflected in the performance of a standard classifier which does not progressively worsen if it is not retrained daily, though overall performance is reduced by more than 10% compared to a daily retrained classifier. Two novel self-recalibrating classifiers produce a ~15% increase in classification accuracy over that achieved by the non-retrained classifier to nearly recover the performance of the daily retrained classifier. Significance We believe that the development of classifiers that require no daily retraining will accelerate the clinical translation of BCI systems. Future work should test these results in a closed loop setting. PMID:24503597

  3. Self-recalibrating classifiers for intracortical brain-computer interfaces

    NASA Astrophysics Data System (ADS)

    Bishop, William; Chestek, Cynthia C.; Gilja, Vikash; Nuyujukian, Paul; Foster, Justin D.; Ryu, Stephen I.; Shenoy, Krishna V.; Yu, Byron M.

    2014-04-01

    Objective. Intracortical brain-computer interface (BCI) decoders are typically retrained daily to maintain stable performance. Self-recalibrating decoders aim to remove the burden this may present in the clinic by training themselves autonomously during normal use but have only been developed for continuous control. Here we address the problem for discrete decoding (classifiers). Approach. We recorded threshold crossings from 96-electrode arrays implanted in the motor cortex of two rhesus macaques performing center-out reaches in 7 directions over 41 and 36 separate days spanning 48 and 58 days in total for offline analysis. Main results. We show that for the purposes of developing a self-recalibrating classifier, tuning parameters can be considered as fixed within days and that parameters on the same electrode move up and down together between days. Further, drift is constrained across time, which is reflected in the performance of a standard classifier which does not progressively worsen if it is not retrained daily, though overall performance is reduced by more than 10% compared to a daily retrained classifier. Two novel self-recalibrating classifiers produce a ~15% increase in classification accuracy over that achieved by the non-retrained classifier to nearly recover the performance of the daily retrained classifier. Significance. We believe that the development of classifiers that require no daily retraining will accelerate the clinical translation of BCI systems. Future work should test these results in a closed-loop setting.

  4. 25 CFR 304.3 - Classifying and marking of silver.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 25 Indians 2 2014-04-01 2014-04-01 false Classifying and marking of silver. 304.3 Section 304.3 Indians INDIAN ARTS AND CRAFTS BOARD, DEPARTMENT OF THE INTERIOR NAVAJO, PUEBLO, AND HOPI SILVER, USE OF GOVERNMENT MARK § 304.3 Classifying and marking of silver. For the present the Indian Arts and Crafts...

  5. 25 CFR 304.3 - Classifying and marking of silver.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 25 Indians 2 2010-04-01 2010-04-01 false Classifying and marking of silver. 304.3 Section 304.3 Indians INDIAN ARTS AND CRAFTS BOARD, DEPARTMENT OF THE INTERIOR NAVAJO, PUEBLO, AND HOPI SILVER, USE OF GOVERNMENT MARK § 304.3 Classifying and marking of silver. For the present the Indian Arts and Crafts...

  6. 25 CFR 304.3 - Classifying and marking of silver.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 25 Indians 2 2011-04-01 2011-04-01 false Classifying and marking of silver. 304.3 Section 304.3 Indians INDIAN ARTS AND CRAFTS BOARD, DEPARTMENT OF THE INTERIOR NAVAJO, PUEBLO, AND HOPI SILVER, USE OF GOVERNMENT MARK § 304.3 Classifying and marking of silver. For the present the Indian Arts and Crafts...

  7. 25 CFR 304.3 - Classifying and marking of silver.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 25 Indians 2 2012-04-01 2012-04-01 false Classifying and marking of silver. 304.3 Section 304.3 Indians INDIAN ARTS AND CRAFTS BOARD, DEPARTMENT OF THE INTERIOR NAVAJO, PUEBLO, AND HOPI SILVER, USE OF GOVERNMENT MARK § 304.3 Classifying and marking of silver. For the present the Indian Arts and Crafts...

  8. 25 CFR 304.3 - Classifying and marking of silver.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 25 Indians 2 2013-04-01 2013-04-01 false Classifying and marking of silver. 304.3 Section 304.3 Indians INDIAN ARTS AND CRAFTS BOARD, DEPARTMENT OF THE INTERIOR NAVAJO, PUEBLO, AND HOPI SILVER, USE OF GOVERNMENT MARK § 304.3 Classifying and marking of silver. For the present the Indian Arts and Crafts...

  9. Increasing Children's ASL Classifier Production: A Multicomponent Intervention

    ERIC Educational Resources Information Center

    Beal-Alvarez, Jennifer S.; Easterbrooks, Susan R.

    2013-01-01

    The Authors examined classifier production during narrative retells by 10 deaf and hard of hearing students in grades 2-4 at a day school for the deaf following a 6-week intervention of repeated viewings of stories in American Sign Language (ASL) paired with scripted teacher mediation. Classifier production, documented through a…

  10. 45 CFR 601.8 - Access to classified materials.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 45 Public Welfare 3 2013-10-01 2013-10-01 false Access to classified materials. 601.8 Section 601.8 Public Welfare Regulations Relating to Public Welfare (Continued) NATIONAL SCIENCE FOUNDATION CLASSIFICATION AND DECLASSIFICATION OF NATIONAL SECURITY INFORMATION § 601.8 Access to classified materials....

  11. 14 CFR 1203.400 - Specific classifying guidance.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 14 Aeronautics and Space 5 2010-01-01 2010-01-01 false Specific classifying guidance. 1203.400 Section 1203.400 Aeronautics and Space NATIONAL AERONAUTICS AND SPACE ADMINISTRATION INFORMATION SECURITY PROGRAM Guides for Original Classification § 1203.400 Specific classifying guidance. Technological...

  12. DETAIL VIEW OF THREE CONCENTRATION TABLES, LOADING RAMP, AND CLASSIFIER, ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    DETAIL VIEW OF THREE CONCENTRATION TABLES, LOADING RAMP, AND CLASSIFIER, LOOKING EST. THE RAKE THAT WAS ORIGINALLY INSIDE THE CLASSIFIER IS AT CENTER RIGHT ON TOP OF THE LOADING RAMP. - Gold Hill Mill, Warm Spring Canyon Road, Death Valley Junction, Inyo County, CA

  13. Hunt for Federal Funds Gives Classified Research a Lift

    ERIC Educational Resources Information Center

    Basken, Paul

    2012-01-01

    For some colleges and professors, classified research promises prestige and money. Powerhouses like the Massachusetts Institute of Technology and the Johns Hopkins University have for decades run large classified laboratories. But most other universities either do not allow such research or conduct it quietly, and in small doses. The…

  14. 43 CFR 2.41 - Declassification of classified documents.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 43 Public Lands: Interior 1 2010-10-01 2010-10-01 false Declassification of classified documents. 2.41 Section 2.41 Public Lands: Interior Office of the Secretary of the Interior RECORDS AND... classified documents. (a) Request for classification review. (1) Requests for a classification review of...

  15. 45 CFR 601.8 - Access to classified materials.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 45 Public Welfare 3 2010-10-01 2010-10-01 false Access to classified materials. 601.8 Section 601.8 Public Welfare Regulations Relating to Public Welfare (Continued) NATIONAL SCIENCE FOUNDATION CLASSIFICATION AND DECLASSIFICATION OF NATIONAL SECURITY INFORMATION § 601.8 Access to classified materials....

  16. Fisher classifier and its probability of error estimation

    NASA Technical Reports Server (NTRS)

    Chittineni, C. B.

    1979-01-01

    Computationally efficient expressions are derived for estimating the probability of error using the leave-one-out method. The optimal threshold for the classification of patterns projected onto Fisher's direction is derived. A simple generalization of the Fisher classifier to multiple classes is presented. Computational expressions are developed for estimating the probability of error of the multiclass Fisher classifier.
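    The sketch below shows a two-class Fisher classifier with a leave-one-out error estimate. It makes three simplifications:

    - The threshold is placed at the midpoint of the projected class means. The paper derives an optimal threshold, which is not reproduced here.
    - A tiny ridge term keeps the within-class scatter matrix invertible.
    - The leave-one-out estimate is computed by brute-force retraining, not by the computationally efficient expressions the abstract refers to.

    Function names are illustrative.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Model = Tuple[Vector, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vs: Sequence[Vector]) -> Vector:
    return [sum(col) / len(vs) for col in zip(*vs)]


def scatter(vs: Sequence[Vector], m: Vector) -> List[Vector]:
    """Within-class scatter matrix: sum of (v - m)(v - m)^T."""
    d = len(m)
    s = [[0.0] * d for _ in range(d)]
    for v in vs:
        diff = [a - b for a, b in zip(v, m)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_train(x1: Sequence[Vector], x2: Sequence[Vector], ridge: float = 1e-6) -> Model:
    """Fisher direction w = Sw^-1 (m1 - m2); threshold at the midpoint of projected means."""
    m1, m2 = mean(x1), mean(x2)
    s1, s2 = scatter(x1, m1), scatter(x2, m2)
    d = len(m1)
    sw = [[s1[i][j] + s2[i][j] + (ridge if i == j else 0.0) for j in range(d)]
          for i in range(d)]
    w = solve(sw, [a - b for a, b in zip(m1, m2)])
    return w, 0.5 * (dot(w, m1) + dot(w, m2))


def fisher_predict(model: Model, x: Vector) -> int:
    """Class 1 projects above the threshold because w . (m1 - m2) > 0."""
    w, threshold = model
    return 1 if dot(w, x) > threshold else 2


def loo_error(x1: List[Vector], x2: List[Vector]) -> float:
    """Brute-force leave-one-out: retrain without each sample, then test on it."""
    errors = 0
    for i in range(len(x1)):
        model = fisher_train(x1[:i] + x1[i + 1:], x2)
        errors += fisher_predict(model, x1[i]) != 1
    for i in range(len(x2)):
        model = fisher_train(x1, x2[:i] + x2[i + 1:])
        errors += fisher_predict(model, x2[i]) != 2
    return errors / (len(x1) + len(x2))
```

    The brute-force loop retrains once per sample. The closed-form leave-one-out expressions derived in the paper are intended to avoid exactly this cost.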

  17. 18 CFR 367.18 - Criteria for classifying leases.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 18 Conservation of Power and Water Resources 1 2013-04-01 2013-04-01 false Criteria for classifying leases. 367.18 Section 367.18 Conservation of Power and Water Resources FEDERAL ENERGY REGULATORY... ACT General Instructions § 367.18 Criteria for classifying leases. (a) If, at its inception, a...

  18. 18 CFR 367.18 - Criteria for classifying leases.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 18 Conservation of Power and Water Resources 1 2014-04-01 2014-04-01 false Criteria for classifying leases. 367.18 Section 367.18 Conservation of Power and Water Resources FEDERAL ENERGY REGULATORY... ACT General Instructions § 367.18 Criteria for classifying leases. (a) If, at its inception, a...

  19. 18 CFR 367.18 - Criteria for classifying leases.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 18 Conservation of Power and Water Resources 1 2010-04-01 2010-04-01 false Criteria for classifying leases. 367.18 Section 367.18 Conservation of Power and Water Resources FEDERAL ENERGY REGULATORY... ACT General Instructions § 367.18 Criteria for classifying leases. (a) If, at its inception, a...

  20. 18 CFR 367.18 - Criteria for classifying leases.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 18 Conservation of Power and Water Resources 1 2012-04-01 2012-04-01 false Criteria for classifying leases. 367.18 Section 367.18 Conservation of Power and Water Resources FEDERAL ENERGY REGULATORY... ACT General Instructions § 367.18 Criteria for classifying leases. (a) If, at its inception, a...

  1. 18 CFR 367.18 - Criteria for classifying leases.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 18 Conservation of Power and Water Resources 1 2011-04-01 2011-04-01 false Criteria for classifying leases. 367.18 Section 367.18 Conservation of Power and Water Resources FEDERAL ENERGY REGULATORY... ACT General Instructions § 367.18 Criteria for classifying leases. (a) If, at its inception, a...

  2. 32 CFR 2400.32 - Transmittal of classified information.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... TECHNOLOGY POLICY REGULATIONS TO IMPLEMENT E.O. 12356; OFFICE OF SCIENCE AND TECHNOLOGY POLICY INFORMATION... classified information outside of the Office of Science and Technology Policy shall be in accordance with... 32 National Defense 6 2011-07-01 2011-07-01 false Transmittal of classified information....

  3. 32 CFR 2400.30 - Reproduction of classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 32 National Defense 6 2013-07-01 2013-07-01 false Reproduction of classified information. 2400.30... SECURITY PROGRAM Safeguarding § 2400.30 Reproduction of classified information. Documents or portions of... the originator or higher authority. Any stated prohibition against reproduction shall be...

  4. 32 CFR 2400.30 - Reproduction of classified information.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 32 National Defense 6 2012-07-01 2012-07-01 false Reproduction of classified information. 2400.30... SECURITY PROGRAM Safeguarding § 2400.30 Reproduction of classified information. Documents or portions of... the originator or higher authority. Any stated prohibition against reproduction shall be...

  5. 32 CFR 2400.30 - Reproduction of classified information.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 32 National Defense 6 2011-07-01 2011-07-01 false Reproduction of classified information. 2400.30... SECURITY PROGRAM Safeguarding § 2400.30 Reproduction of classified information. Documents or portions of... the originator or higher authority. Any stated prohibition against reproduction shall be...

  6. 32 CFR 2400.30 - Reproduction of classified information.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 32 National Defense 6 2014-07-01 2014-07-01 false Reproduction of classified information. 2400.30... SECURITY PROGRAM Safeguarding § 2400.30 Reproduction of classified information. Documents or portions of... the originator or higher authority. Any stated prohibition against reproduction shall be...

  7. 32 CFR 2400.30 - Reproduction of classified information.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Reproduction of classified information. 2400.30... SECURITY PROGRAM Safeguarding § 2400.30 Reproduction of classified information. Documents or portions of... the originator or higher authority. Any stated prohibition against reproduction shall be...

  8. 45 CFR 601.8 - Access to classified materials.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 45 Public Welfare 3 2014-10-01 2014-10-01 false Access to classified materials. 601.8 Section 601.8 Public Welfare Regulations Relating to Public Welfare (Continued) NATIONAL SCIENCE FOUNDATION CLASSIFICATION AND DECLASSIFICATION OF NATIONAL SECURITY INFORMATION § 601.8 Access to classified materials....

  9. 45 CFR 601.8 - Access to classified materials.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 45 Public Welfare 3 2011-10-01 2011-10-01 false Access to classified materials. 601.8 Section 601.8 Public Welfare Regulations Relating to Public Welfare (Continued) NATIONAL SCIENCE FOUNDATION CLASSIFICATION AND DECLASSIFICATION OF NATIONAL SECURITY INFORMATION § 601.8 Access to classified materials....

  10. 45 CFR 601.8 - Access to classified materials.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 45 Public Welfare 3 2012-10-01 2012-10-01 false Access to classified materials. 601.8 Section 601.8 Public Welfare Regulations Relating to Public Welfare (Continued) NATIONAL SCIENCE FOUNDATION CLASSIFICATION AND DECLASSIFICATION OF NATIONAL SECURITY INFORMATION § 601.8 Access to classified materials....

  11. 14 CFR 1203.402 - Classifying material other than documentation.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 14 Aeronautics and Space 5 2010-01-01 2010-01-01 false Classifying material other than documentation. 1203.402 Section 1203.402 Aeronautics and Space NATIONAL AERONAUTICS AND SPACE ADMINISTRATION INFORMATION SECURITY PROGRAM Guides for Original Classification § 1203.402 Classifying material other...

  12. Manually operated coded switch

    DOEpatents

    Barnette, Jon H.

    1978-01-01

    The disclosure relates to a manually operated recodable coded switch in which a code may be inserted, tried and used to actuate a lever controlling an external device. After attempting a code, the switch's code wheels must be returned to their zero positions before another try is made.

  13. The EB factory project. I. A fast, neural-net-based, general purpose light curve classifier optimized for eclipsing binaries

    SciTech Connect

    Paegert, Martin; Stassun, Keivan G.; Burger, Dan M.

    2014-08-01

    We describe a new neural-net-based light curve classifier and provide it with documentation as a ready-to-use tool for the community. While optimized for identification and classification of eclipsing binary stars, the classifier is general-purpose and has been developed for speed in the context of upcoming massive surveys such as the Large Synoptic Survey Telescope. A challenge for classifiers in the context of neural-net training and massive data sets is to minimize the number of parameters required to describe each light curve. We show that a simple and fast geometric representation that encodes the overall light curve shape, together with a chi-square parameter that captures higher-order morphology information, results in efficient yet robust light curve classification, especially for eclipsing binaries. Testing the classifier on the ASAS light curve database, we achieve a retrieval rate of 98% and a false-positive rate of 2% for eclipsing binaries. We achieve similarly high retrieval rates for most other periodic variable-star classes, including RR Lyrae, Mira, and delta Scuti. However, the classifier currently has difficulty discriminating between different sub-classes of eclipsing binaries, and suffers a relatively low (∼60%) retrieval rate for multi-mode delta Cepheid stars. We find that it is imperative to train the classifier's neural network with exemplars that span the full range of light curve quality the classifier will be expected to handle; the classifier performs well on noisy light curves only when trained with noisy exemplars. The classifier source code, ancillary programs, a trained neural net, and a guide for use are provided.

  14. The EB Factory Project. I. A Fast, Neural-net-based, General Purpose Light Curve Classifier Optimized for Eclipsing Binaries

    NASA Astrophysics Data System (ADS)

    Paegert, Martin; Stassun, Keivan G.; Burger, Dan M.

    2014-08-01

    We describe a new neural-net-based light curve classifier and provide it with documentation as a ready-to-use tool for the community. While optimized for identification and classification of eclipsing binary stars, the classifier is general-purpose and has been developed for speed in the context of upcoming massive surveys such as the Large Synoptic Survey Telescope. A challenge for classifiers in the context of neural-net training and massive data sets is to minimize the number of parameters required to describe each light curve. We show that a simple and fast geometric representation that encodes the overall light curve shape, together with a chi-square parameter that captures higher-order morphology information, results in efficient yet robust light curve classification, especially for eclipsing binaries. Testing the classifier on the ASAS light curve database, we achieve a retrieval rate of 98% and a false-positive rate of 2% for eclipsing binaries. We achieve similarly high retrieval rates for most other periodic variable-star classes, including RR Lyrae, Mira, and delta Scuti. However, the classifier currently has difficulty discriminating between different sub-classes of eclipsing binaries, and suffers a relatively low (~60%) retrieval rate for multi-mode delta Cepheid stars. We find that it is imperative to train the classifier's neural network with exemplars that span the full range of light curve quality the classifier will be expected to handle; the classifier performs well on noisy light curves only when trained with noisy exemplars. The classifier source code, ancillary programs, a trained neural net, and a guide for use are provided.

  15. Parafermion stabilizer codes

    NASA Astrophysics Data System (ADS)

    Güngördü, Utkan; Nepal, Rabindra; Kovalev, Alexey A.

    2014-10-01

    We define and study parafermion stabilizer codes, which can be viewed as generalizations of Kitaev's one-dimensional (1D) model of unpaired Majorana fermions. Parafermion stabilizer codes can protect against low-weight errors acting on a small subset of parafermion modes, in analogy to qudit stabilizer codes. Examples of several of the smallest parafermion stabilizer codes are given. A locality-preserving embedding of qudit operators into parafermion operators is established that allows one to map known qudit stabilizer codes to parafermion codes. We also present a local 2D parafermion construction that combines topological protection of Kitaev's toric code with additional protection relying on parity conservation.

  16. Combining Contextual and Lexical Features to Classify UMLS Concepts

    PubMed Central

    Fan, Jung-Wei; Friedman, Carol

    2007-01-01

    Semantic classification is important for biomedical terminologies and the many applications that depend on them. Previously we developed two classifiers for 8 broad clinically relevant classes to reclassify and validate UMLS concepts. We found them to be complementary, and then combined them using a manual approach. In this paper, we extended the classifiers by adding an “other” class to categorize concepts not belonging to any of the 8 classes. In addition, we focused on automating the method for combining the two classifiers by training a meta-classifier that performs dynamic combination to exploit the strength of each classifier. The automated method performed as well as manual combination, achieving classification accuracy of about 0.81. PMID:18693832

  17. A study on intrusion detection model based on hybrid classifier

    NASA Astrophysics Data System (ADS)

    Liu, Kewen; Yang, Qingbo

    2013-03-01

    To improve classification accuracy in intrusion detection, this paper proposes a hybrid classifier composed of KPCA (kernel principal component analysis), a BPNN (back-propagation neural network) and a QGA (quantum genetic algorithm). In the hybrid classifier, KPCA reduces the dimensionality of the data, and the QGA then searches for the best parameters for the BPNN. The BPNN, initialized with the optimal weight matrix and thresholds found by the QGA, is used to train the classification model. KPCA preserves the main factors of the original dataset while greatly reducing computation, and the QGA overcomes the BPNN's tendency to get stuck in local minima. Experiments confirm the effectiveness of the hybrid classifier: compared with traditional methods, it achieves lower classification error.

  18. cncRNAs: Bi-functional RNAs with protein coding and non-coding functions

    PubMed Central

    Kumari, Pooja; Sampath, Karuna

    2015-01-01

    For many decades, the major function of mRNA was thought to be to provide protein-coding information embedded in the genome. The advent of high-throughput sequencing has led to the discovery of pervasive transcription of eukaryotic genomes and opened the world of RNA-mediated gene regulation. Many regulatory RNAs have been found to be incapable of protein coding and are hence termed non-coding RNAs (ncRNAs). However, studies in recent years have shown that several previously annotated non-coding RNAs have the potential to encode proteins, and conversely, some coding RNAs have regulatory functions independent of the protein they encode. Such bi-functional RNAs, with both protein coding and non-coding functions, which we term ‘cncRNAs’, have emerged as new players in cellular systems. Here, we describe the functions of some cncRNAs identified from bacteria to humans. Because the functions of many RNAs across genomes remain unclear, we propose that RNAs be classified as coding, non-coding or both only after careful analysis of their functions. PMID:26498036

  19. EMdeCODE: a novel algorithm capable of reading words of epigenetic code to predict enhancers and retroviral integration sites and to identify H3R2me1 as a distinctive mark of coding versus non-coding genes.

    PubMed

    Santoni, Federico Andrea

    2013-02-01

    The existence of some extra-genetic (epigenetic) codes has been postulated since the discovery of the primary genetic code. The evident effects of histone post-translational modifications or DNA methylation on the efficiency and regulation of DNA processes support this postulate. EMdeCODE is an original algorithm that approximates the genomic distribution of given DNA features (e.g. promoter, enhancer, viral integration) by identifying relevant ChIP-seq profiles of post-translational histone marks or DNA-binding proteins and combining them into a supermark. The EMdeCODE kernel is essentially a two-step procedure: (i) an expectation-maximization process calculates the mixture of epigenetic factors that maximizes the sensitivity (recall) of the association with the feature under study; (ii) the approximated density is then recursively trimmed with respect to a control dataset to increase the precision by reducing the number of false positives. EMdeCODE densities significantly improve the prediction of enhancer loci and retroviral integration sites with respect to previous methods. Importantly, the algorithm can also be used to extract distinctive factors between two arbitrary conditions. Indeed, EMdeCODE identifies unexpected epigenetic profiles specific to coding versus non-coding RNA, pointing towards a new role for H3R2me1 in coding regions.

  20. ARA type protograph codes

    NASA Technical Reports Server (NTRS)

    Divsalar, Dariush (Inventor); Abbasfar, Aliazam (Inventor); Jones, Christopher R. (Inventor); Dolinar, Samuel J. (Inventor); Thorpe, Jeremy C. (Inventor); Andrews, Kenneth S. (Inventor); Yao, Kung (Inventor)

    2008-01-01

    An apparatus and method for encoding low-density parity-check codes. The apparatus comprises a precoder which, together with a repeater, an interleaver and an accumulator, forms accumulate-repeat-accumulate (ARA) codes. Protographs representing various types of ARA codes, including AR3A, AR4A and ARJA codes, are described. These codes achieve high performance compared with current repeat-accumulate (RA) and irregular-repeat-accumulate (IRA) codes.
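    The chain of components can be sketched over GF(2). An accumulator computes a running XOR, the repeater copies each bit q times, and the interleaver permutes bit positions. This minimal Python sketch, with hypothetical function names, wires a precoding accumulator, a repeater, an interleaver and a final accumulator in sequence. It ignores protograph structure entirely, so it should not be read as the AR3A, AR4A or ARJA construction the record describes.

```python
import random
from typing import List, Sequence


def accumulate(bits: Sequence[int]) -> List[int]:
    """Running XOR (mod-2 prefix sum): out[i] = bits[0] ^ ... ^ bits[i]."""
    out, acc = [], 0
    for b in bits:
        acc ^= b
        out.append(acc)
    return out


def repeat(bits: Sequence[int], q: int) -> List[int]:
    """Repeat every bit q times, keeping the copies adjacent."""
    return [b for b in bits for _ in range(q)]


def interleave(bits: Sequence[int], perm: Sequence[int]) -> List[int]:
    """Output position i takes the bit at input position perm[i]."""
    return [bits[p] for p in perm]


def ara_encode(info: Sequence[int], q: int, perm: Sequence[int]) -> List[int]:
    """Simplified accumulate-repeat-accumulate chain (no protograph, no puncturing)."""
    if sorted(perm) != list(range(len(info) * q)):
        raise ValueError("perm must be a permutation of range(len(info) * q)")
    precoded = accumulate(info)  # precoder: itself an accumulator
    return accumulate(interleave(repeat(precoded, q), perm))


if __name__ == "__main__":
    rng = random.Random(0)
    k, q = 8, 3
    perm = rng.sample(range(k * q), k * q)  # a random interleaver
    msg = [rng.randint(0, 1) for _ in range(k)]
    print(ara_encode(msg, q, perm))
```

    Every stage is linear over GF(2), so the whole encoder is linear: encoding the XOR of two messages gives the XOR of their codewords.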

  1. QR Codes 101

    ERIC Educational Resources Information Center

    Crompton, Helen; LaFrance, Jason; van 't Hooft, Mark

    2012-01-01

    A QR (quick-response) code is a two-dimensional scannable code, similar in function to a traditional bar code that one might find on a product at the supermarket. The main difference between the two is that, while a traditional bar code can hold a maximum of only 20 digits, a QR code can hold up to 7,089 characters, so it can contain much more…

  2. Numerical classification of coding sequences

    NASA Technical Reports Server (NTRS)

    Collins, D. W.; Liu, C. C.; Jukes, T. H.

    1992-01-01

    DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
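    Both descriptors can be computed directly from a sequence. This Python sketch makes three assumptions: the input is a string over A, C, G and T in either case; codons are counted in the first reading frame only; and codons containing ambiguity codes such as N are skipped. `codon_distance` is a hypothetical L1 comparison added for illustration. It is not a measure taken from the record.

```python
from collections import Counter
from itertools import product
from typing import Dict


def base_count(seq: str) -> str:
    """Abbreviate a sequence by its base composition, e.g. 'A3C2G3T4'."""
    counts = Counter(seq.upper())
    return "".join(f"{base}{counts[base]}" for base in "ACGT")


def codon_table(seq: str) -> Dict[str, int]:
    """Count codons in reading frame 1; all 64 codons listed in AAA..TTT order."""
    seq = seq.upper()
    table = {"".join(c): 0 for c in product("ACGT", repeat=3)}
    for i in range(0, len(seq) - 2, 3):  # trailing partial codon is ignored
        codon = seq[i:i + 3]
        if codon in table:  # skip codons containing N or other ambiguity codes
            table[codon] += 1
    return table


def format_codon_table(table: Dict[str, int]) -> str:
    """Render a codon table as '(AAA)0(AAC)1...(TTT)0'."""
    return "".join(f"({codon}){n}" for codon, n in table.items())


def codon_distance(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Hypothetical similarity check: L1 distance between two codon tables."""
    return sum(abs(a[c] - b[c]) for c in a)
```

    For example, `base_count("ATGGCTGCTTAA")` gives `A3C2G3T4`, and its codon table holds one ATG, two GCT and one TAA.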

  3. Nonbinary Quantum Convolutional Codes Derived from Negacyclic Codes

    NASA Astrophysics Data System (ADS)

    Chen, Jianzhang; Li, Jianping; Yang, Fan; Huang, Yuanyuan

    2015-01-01

    In this paper, some families of nonbinary quantum convolutional codes are constructed by using negacyclic codes. These nonbinary quantum convolutional codes differ from the quantum convolutional codes available in the literature. Moreover, we construct a family of optimal quantum convolutional codes.

  4. A Comparison of Unsupervised Classifiers on BATSE Catalog Data

    NASA Astrophysics Data System (ADS)

    Hakkila, Jon; Roiger, Richard J.; Haglin, David J.; Giblin, Timothy W.; Paciesas, William S.

    2003-04-01

    We classify BATSE gamma-ray bursts using unsupervised clustering algorithms in order to compare classification with statistical clustering techniques. BATSE bursts detected with homogeneous trigger criteria and measured with a limited attribute set (duration, hardness, and fluence) are classified using four unsupervised algorithms (the concept hierarchy classifier ESX, the EM algorithm, the k-means algorithm, and a Kohonen neural network). The classifiers prefer three-class solutions to two-class and four-class solutions. When forced to find two classes, the classifiers do not find the traditional long and short classes; many short soft events are placed in a class with the short hard bursts. When three classes are found, the classifiers clearly identify the short bursts, but place far more members in an intermediate-duration soft class than have been found using statistical clustering techniques. It appears that the boundary between short faint and long bright bursts is more important to the classifiers than is the boundary between short hard and long soft bursts. We conclude that the boundary between short faint and long hard bursts is the result of data bias and poor attribute selection. We recommend that future gamma-ray burst classification avoid extrinsic parameters such as fluence and instead concentrate on intrinsic properties such as spectral, temporal, and (when available) luminosity characteristics. Future classification should also be wary of correlated attributes (such as fluence and duration), as these bias classification results.
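    k-means is one of the four algorithms named in the record. This is a minimal sketch of how it partitions bursts. It assumes each burst is a tuple of pre-scaled attributes, for example logarithms of duration, hardness and fluence. That scaling is an assumption, since the record does not state one. Lloyd's algorithm alternates nearest-centroid assignment with centroid updates. Initializing deterministically from the first k points is a simplification; practical runs usually add random restarts or k-means++ seeding.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(p: Sequence[float], centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p in Euclidean distance."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: deterministic initialization from the first k points.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids
```

    Because k-means favors compact, roughly spherical clusters in the chosen feature space, its partition depends heavily on which attributes are included and how they are scaled. This is consistent with the record's warning about extrinsic and correlated attributes.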

  5. Class-specific Error Bounds for Ensemble Classifiers

    SciTech Connect

    Prenger, R; Lemmond, T; Varshney, K; Chen, B; Hanley, W

    2009-10-06

    The generalization error, or probability of misclassification, of ensemble classifiers has been shown to be bounded above by a function of the mean correlation between the constituent (i.e., base) classifiers and their average strength. This bound suggests that increasing the strength and/or decreasing the correlation of an ensemble's base classifiers may yield improved performance under the assumption of equal error costs. However, this and other existing bounds do not directly address application spaces in which error costs are inherently unequal. For applications involving binary classification, Receiver Operating Characteristic (ROC) curves, performance curves that explicitly trade off false alarms and missed detections, are often utilized to support decision making. To address performance optimization in this context, we have developed a lower bound for the entire ROC curve that can be expressed in terms of the class-specific strength and correlation of the base classifiers. We present empirical analyses demonstrating the efficacy of these bounds in predicting relative classifier performance. In addition, we specify performance regions of the ROC curve that are naturally delineated by the class-specific strengths of the base classifiers and show that each of these regions can be associated with a unique set of guidelines for performance optimization of binary classifiers within unequal error cost regimes.
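    The bound alluded to in the first sentence is presumably Breiman's random-forest bound, PE* ≤ ρ̄(1 − s²)/s². Here ρ̄ is the mean correlation between base classifiers and s > 0 is their strength (expected margin). This short Python sketch, with a hypothetical function name, shows how the bound falls as strength rises or correlation drops. It covers only the equal-error-cost setting. It does not cover the class-specific ROC bound the record develops.

```python
def breiman_bound(mean_correlation: float, strength: float) -> float:
    """Upper bound rho_bar * (1 - s**2) / s**2 on ensemble generalization error.

    Only meaningful for strength s > 0; values above 1 are vacuous.
    """
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must lie in (0, 1]")
    return mean_correlation * (1.0 - strength ** 2) / strength ** 2


if __name__ == "__main__":
    # Lower correlation or higher strength both tighten the bound.
    for rho in (0.1, 0.3, 0.5):
        row = [breiman_bound(rho, s) for s in (0.4, 0.6, 0.8)]
        print(f"rho={rho}: " + ", ".join(f"{b:.3f}" for b in row))
```

    For weak, highly correlated ensembles the value can exceed 1, at which point the bound says nothing useful.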

  6. Lectin cDNA and transgenic plants derived therefrom

    SciTech Connect

    Raikhel, Natasha V.

    2000-10-03

    Transgenic plants containing cDNA encoding Gramineae lectin are described. The plants preferably contain cDNA coding for barley lectin and store the lectin in the leaves. The transgenic plants, particularly the leaves exhibit insecticidal and fungicidal properties.

  7. Revisiting the Physico-Chemical Hypothesis of Code Origin: An Analysis Based on Code-Sequence Coevolution in a Finite Population

    NASA Astrophysics Data System (ADS)

    Bandhu, Ashutosh Vishwa; Aggarwal, Neha; Sengupta, Supratim

    2013-12-01

    The origin of the genetic code marked a major transition from a plausible RNA world to the world of DNA and proteins and is an important milestone in our understanding of the origin of life. We examine the efficacy of the physico-chemical hypothesis of code origin by carrying out simulations of code-sequence coevolution in finite populations in stages, leading first to the emergence of ten-amino-acid code(s) and subsequently to 14-amino-acid code(s). We explore two different scenarios of primordial code evolution. In one scenario, competition occurs between populations of equilibrated code-sequence sets, while in the other, new codes compete with existing codes as they are gradually introduced into the population with a finite probability. In either case, we find that natural selection between competing codes distinguished by differences in the degree of physico-chemical optimization is unable to explain the structure of the standard genetic code. The code whose structure is most consistent with the standard genetic code is often not among the codes that have a high fixation probability. However, we find that the composition of the code population affects the code fixation probability. A physico-chemically optimized code gets fixed with a significantly higher probability if it competes against a set of randomly generated codes. Our results suggest that physico-chemical optimization may not be the sole driving force in ensuring the emergence of the standard genetic code.

  8. Automatically classifying question types for consumer health questions.

    PubMed

    Roberts, Kirk; Kilicoglu, Halil; Fiszman, Marcelo; Demner-Fushman, Dina

    2014-01-01

    We present a method for automatically classifying consumer health questions. Our thirteen question types are designed to aid in the automatic retrieval of medical answers from consumer health resources. To our knowledge, this is the first machine learning-based method specifically for classifying consumer health questions. We demonstrate how previous approaches to medical question classification are insufficient to achieve high accuracy on this task. Additionally, we describe, manually annotate, and automatically classify three important question elements that improve question classification over previous techniques. Our results and analysis illustrate the difficulty of the task and the future directions that are necessary to achieve high-performing consumer health question classification.

  9. A Spatial Classifier for Multispectral Data Using Contextual Information

    NASA Technical Reports Server (NTRS)

    Hung, Chih-Cheng; Fahsi, Ahmed; Coleman, Tommy

    1998-01-01

    Connectivity describes the spatial relationship among pixels. This paper studies a spatial classifier that employs the sigma probability concept of the Gaussian distribution together with one type of contextual information, the connectivity of the pixels. The classifier attempts to replicate the kind of spatial synthesis performed by a human analyst during visual interpretation, or to capture the spatial relationships inherent in an aerial photograph. Classification results for Landsat TM data obtained with this classifier, using different window sizes to capture the contextual information, are illustrated and compared.

  10. Regional variation in medical classification agreement: benchmarking the coding gap.

    PubMed

    Lorence, Daniel

    2003-10-01

    The growing use of classification and coding of patient data in medical information systems has resulted in increased dependence on the accuracy of coding practices. Information maintained on systems must be trusted by both providers and managers in order to serve as a viable tool for the delivery of healthcare in an evidence-based environment. A national survey of health information managers was employed here to assess observed levels of coder agreement with physician code selections used in classifying patient data. Findings from this survey suggest that, on a national level, the quality of coded data may suffer as a result of disagreement or inconsistent coding within healthcare provider organizations, in an era where physicians are increasingly called upon to enter and classify patient data via computerized medical records. Nineteen percent of respondents report that coder-physician classification disagreement occurred on more than 5% of all patient encounters. In some cases disagreement occurs in 20% or more instances of code selection. This phenomenon occurred to varying degrees across regions and market areas, suggesting a confounding influence when coded data is aggregated for comparative purposes. In an evidence-based healthcare environment, coded data often serves as a representation of clinical performance. Given the increasing complexity of medical information classification systems, reliance on such data may pose a risk for both practitioners and managers without consistent agreement on coding practices and procedures. PMID:14584620

  11. Asymmetric quantum convolutional codes

    NASA Astrophysics Data System (ADS)

    La Guardia, Giuliano G.

    2016-01-01

    In this paper, we construct the first families of asymmetric quantum convolutional codes (AQCCs). These new AQCCs are constructed by means of the CSS-type construction applied to suitable families of classical convolutional codes, which are also constructed here. The new codes have non-catastrophic generator matrices, and they have great asymmetry. Since our constructions are performed algebraically, i.e. we develop general algebraic methods and properties to perform the constructions, it is possible to derive several families of such codes and not only codes with specific parameters. Additionally, several different types of such codes are obtained.

  12. Statistical approaches to account for false-positive errors in environmental DNA samples.

    PubMed

    Lahoz-Monfort, José J; Guillera-Arroita, Gurutzeta; Tingley, Reid

    2016-05-01

    Environmental DNA (eDNA) sampling is prone to both false-positive and false-negative errors. We review statistical methods to account for such errors in the analysis of eDNA data and use simulations to compare the performance of different modelling approaches. Our simulations illustrate that even low false-positive rates can produce biased estimates of occupancy and detectability. We further show that removing or classifying single PCR detections in an ad hoc manner under the suspicion that such records represent false positives, as sometimes advocated in the eDNA literature, also results in biased estimation of occupancy, detectability and false-positive rates. We advocate alternative approaches to account for false-positive errors that rely on prior information, or the collection of ancillary detection data at a subset of sites using a sampling method that is not prone to false-positive errors. We illustrate the advantages of these approaches over ad hoc classifications of detections and provide practical advice and code for fitting these models in maximum likelihood and Bayesian frameworks. Given the severe bias induced by false-negative and false-positive errors, the methods presented here should be more routinely adopted in eDNA studies.
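    A common occupancy model that allows false positives treats each of K PCR replicates at a site as a detection with probability p11 if the species is present and p10 if it is absent. With occupancy probability ψ, the probability of y detections is a two-component binomial mixture: C(K, y)[ψ p11^y (1 − p11)^(K−y) + (1 − ψ) p10^y (1 − p10)^(K−y)]. The record does not say whether this exact form is among the models it compares, and the function names below are illustrative. Requiring p10 < p11 is a constraint commonly imposed to keep "present" and "absent" from swapping roles.

```python
import math
from typing import Sequence


def site_likelihood(y: int, k: int, psi: float, p11: float, p10: float) -> float:
    """P(y detections in k replicates) under occupancy with false positives."""
    if not 0 <= y <= k:
        raise ValueError("need 0 <= y <= k")
    if not 0.0 <= p10 < p11 <= 1.0:
        raise ValueError("need 0 <= p10 < p11 <= 1")
    comb = math.comb(k, y)
    occupied = p11 ** y * (1.0 - p11) ** (k - y)    # true detections
    unoccupied = p10 ** y * (1.0 - p10) ** (k - y)  # false positives only
    return comb * (psi * occupied + (1.0 - psi) * unoccupied)


def neg_log_likelihood(detections: Sequence[int], k: int,
                       psi: float, p11: float, p10: float) -> float:
    """Negative log-likelihood over independent sites with k replicates each."""
    return -sum(math.log(site_likelihood(y, k, psi, p11, p10)) for y in detections)
```

    In practice the negative log-likelihood would be minimized numerically over (ψ, p11, p10). As the record notes, estimating p10 reliably tends to lean on prior information or on ancillary detections from a method that is not prone to false positives.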

  13. Chemical Shift Assignments of Mouse HOXD13 DNA Binding Domain Bound to Duplex DNA

    PubMed Central

    Turner, Matthew; Zhang, Yonghong; Carlson, Hanqian L.; Stadler, H. Scott; Ames, James B.

    2014-01-01

    The homeobox gene (Hoxd13) codes for a transcription factor protein that binds to AT-rich DNA sequences and controls expression of proteins that control embryonic morphogenesis. We report NMR chemical shift assignments of mouse Hoxd13 DNA binding domain bound to an 11-residue DNA duplex (BMRB no. 25133). PMID:25491407

  14. The effect of abnormal cell proportion on specimen classifier performance

    NASA Technical Reports Server (NTRS)

    Castleman, K. R.; White, B. S.

    1981-01-01

    An analysis is presented of the results obtained from a cell classifier that is confronted with an abnormal/normal cell ratio different from the ratio assumed when the classifier was calibrated. False-negative and false-positive error rates are determined in advance for classifier operation, along with the sample size needed to validate the predicted distributions. The analysis shows that only the false-negative rate changes: reductions in the abnormal cell rate below the expected rate would yield totally unreliable data. A substantial overproduction of abnormal cells would be quickly noticeable, while production rates above, but close to, the expected rates would only require more extensive sampling. Classifier systems for 10% proportions of abnormal cells are concluded to be possible, but difficulties arise at much lower rates.

  15. 46 CFR 503.59 - Safeguarding classified information.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... periodic inspections to determine if the procedural safeguards prescribed in this subpart are in effect at... access to classified information, or other sanctions in accordance with applicable law and...

  16. DETAIL VIEW OF CLASSIFIER, TAILINGS LAUNDER TROUGH, LINESHAFTS, AND CONCENTRATION ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    DETAIL VIEW OF CLASSIFIER, TAILINGS LAUNDER TROUGH, LINESHAFTS, AND CONCENTRATION TABLES WITH SIX FOOT SCALE, LOOKING SOUTHWEST. - Gold Hill Mill, Warm Spring Canyon Road, Death Valley Junction, Inyo County, CA

  17. Classified Component Disposal at the Nevada National Security Site

    SciTech Connect

    Poling, J.; Arnold, P.; Saad, M.; DiSanza, F.; Cabble, K.

    2012-11-05

    The Nevada National Security Site (NNSS) has added the capability needed for the safe, secure disposal of non-nuclear classified components that have been declared excess to national security requirements. The NNSS has worked with U.S. Department of Energy, National Nuclear Security Administration senior leadership to gain formal approval for permanent burial of classified matter at the NNSS in the Area 5 Radioactive Waste Management Complex owned by the U.S. Department of Energy. Additionally, by working with state regulators, the NNSS added the capability to dispose of non-radioactive hazardous and non-hazardous classified components. The NNSS successfully piloted the new disposal pathway with the receipt of classified materials from the Kansas City Plant in March 2012.

  18. 6 CFR 7.23 - Emergency release of classified information.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... individuals who receive it to only those persons with a specific need-to-know; (3) Transmit the classified... other means deemed necessary in exigent circumstances; (4) Provide instructions about what...

  19. 28 CFR 17.41 - Access to classified information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... raised solely on the basis of the sexual orientation of the employee or mental health counseling. (d) An... sexual orientation in granting access to classified information. However, the Department may...

  20. 28 CFR 17.41 - Access to classified information.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... raised solely on the basis of the sexual orientation of the employee or mental health counseling. (d) An... sexual orientation in granting access to classified information. However, the Department may...

  1. 28 CFR 17.41 - Access to classified information.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... raised solely on the basis of the sexual orientation of the employee or mental health counseling. (d) An... sexual orientation in granting access to classified information. However, the Department may...

  2. Remote Sensing Data Binary Classification Using Boosting with Simple Classifiers

    NASA Astrophysics Data System (ADS)

    Nowakowski, Artur

    2015-10-01

    Boosting is a classification method that has proven useful in non-satellite image processing but is still new to satellite remote sensing. It is a meta-algorithm that builds a strong classifier from many weak ones iteratively. We adapt the AdaBoost.M1 boosting algorithm to a new land cover classification scenario based on very simple threshold classifiers employing spectral and contextual information. Thresholds for the classifiers are calculated automatically, adapting to data statistics. The proposed method is employed for the exemplary problem of artificial area identification. Classification of IKONOS multispectral data achieves a short computational time and an overall accuracy of 94.4%, compared to 94.0% obtained using AdaBoost.M1 with trees and 93.8% achieved using Random Forest. The influence of adjusting the final threshold of the strong classifier on classification results is reported.
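
    As a rough illustration, here is a minimal sketch of AdaBoost.M1 for the two-class case, with single-feature threshold classifiers (decision stumps) as the weak learners. The paper's spectral and contextual features and its adaptive threshold calculation are not reproduced. The exhaustive threshold search, the +1/-1 label convention and all names are assumptions made for brevity. The `threshold` argument of `predict` plays the role of the final threshold of the strong classifier mentioned above.

```python
import math


def stump_predict(stump, x):
    """Threshold classifier: polarity if feature j exceeds t, else -polarity."""
    j, t, polarity = stump
    return polarity if x[j] > t else -polarity


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                stump = (j, t, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost_m1(X, y, rounds=10):
    """Binary AdaBoost.M1; labels must be +1/-1. Returns [(vote_weight, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        if err >= 0.5:
            break  # M1 stops when the weak learner is no better than chance
        beta = max(err, 1e-10) / (1 - err)  # clamp so a perfect stump gets a finite vote
        ensemble.append((math.log(1 / beta), stump))
        if err == 0:
            break  # a perfect stump would simply be chosen again
        # shrink the weights of correctly classified samples by beta, then renormalise
        w = [wi * (beta if stump_predict(stump, xi) == yi else 1.0) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x, threshold=0.0):
    """Weighted vote of the stumps, compared against an adjustable final threshold."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > threshold else -1
```

    Raising `threshold` above 0 trades detections of the positive class for fewer false alarms. This is the kind of adjustment whose effect the paper reports.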

  3. A hybrid classifier for handwritten mathematical expression recognition

    NASA Astrophysics Data System (ADS)

    Awal, Ahmad-Montaser; Mouchère, Harold; Viard-Gaudin, Christian

    2010-01-01

    In this paper we propose a hybrid symbol classifier within a global framework for online handwritten mathematical expression recognition. The proposed architecture aims at handling mathematical expression recognition as a simultaneous optimization of symbol segmentation, symbol recognition, and 2D structure recognition under the restriction of a mathematical expression grammar. To deal with the junk problem encountered when a segmentation graph approach is used, we consider a two-level classifier: a symbol classifier cooperates with a second classifier specialized to accept or reject a segmentation hypothesis. The proposed system is trained with a set of synthetic online handwritten mathematical expressions. When tested on a set of real complex expressions, the system achieves promising results at both symbol and expression interpretation levels.

  4. Hybrid Hierarchical Classifiers for Categorization of Medical Documents.

    ERIC Educational Resources Information Center

    Ruiz, Miguel E.; Srinivasan, Padmini

    2003-01-01

    Explores the use of linear models and a combination of neural networks and linear classifiers to create a hybrid hierarchical mixture of experts (HME) model. Results confirm that using the hierarchical structure of the classification vocabulary improves categorization performance. (AEF)

  5. 29. DETAIL OF CLASSIFIER, LOOKING NORTH NORTHWEST. THIS MACHINE WAS ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    29. DETAIL OF CLASSIFIER, LOOKING NORTH NORTHWEST. THIS MACHINE WAS USED TO SEPARATE SLIMES FROM SANDS TO PREPARE THE WET ORE PULP FOR CYANIDE PROCESSING. - Skidoo Mine, Park Route 38 (Skidoo Road), Death Valley Junction, Inyo County, CA

  6. 5 CFR 1312.5 - Authority to classify.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... International Affairs. (iv) Associate Director for Natural Resources, Energy and Science. (2) Secret and below... delegated to persons who only reproduce, extract, or summarize classified information, or who only...

  7. 5 CFR 1312.5 - Authority to classify.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... International Affairs. (iv) Associate Director for Natural Resources, Energy and Science. (2) Secret and below... delegated to persons who only reproduce, extract, or summarize classified information, or who only...

  8. Robust Combining of Disparate Classifiers Through Order Statistics

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Ghosh, Joydeep

    2001-01-01

    Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in performance of individual classifiers. Based on a mathematical model of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum and, in general, the i-th order statistic are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real world data and standard public domain data sets corroborate these findings.
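
    The simple combiners named above can be sketched as follows. This minimal example applies the median, the maximum, a chosen i-th order statistic, or a trimmed mean to each class's ordered classifier outputs. The trimmed mean is one reading of the "trim" combiner and is an assumption. The paper's spread combiner and its error analysis are not reproduced, and the function and rule names are hypothetical.

```python
from statistics import median


def combine(outputs, rule="median", i=None, trim=1):
    """Combine per-classifier class scores and return the winning class index.

    outputs: one score vector per classifier, each with one score per class.
    """
    n_classes = len(outputs[0])
    combined = []
    for c in range(n_classes):
        ordered = sorted(o[c] for o in outputs)  # ordered outputs for class c
        if rule == "median":
            s = median(ordered)
        elif rule == "max":
            s = ordered[-1]
        elif rule == "order":
            s = ordered[i]  # i-th order statistic, 0-based
        elif rule == "trim":
            kept = ordered[trim:len(ordered) - trim]  # drop `trim` extremes at each end
            s = sum(kept) / len(kept)
        elif rule == "mean":
            s = sum(ordered) / len(ordered)  # plain averaging, for comparison
        else:
            raise ValueError(f"unknown rule {rule!r}")
        combined.append(s)
    return max(range(n_classes), key=lambda c: combined[c])
```

    Take the scores `[[0.6, 0.4], [0.55, 0.45], [0.05, 0.95]]`, where the third classifier is badly off. The median and trimmed rules pick class 0, while the mean and max rules are pulled to class 1. This robustness to uneven classifier performance is what motivates order statistic combiners.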

  9. Towards structural classification of long non-coding RNAs.

    PubMed

    Sanbonmatsu, Karissa Y

    2016-01-01

    While long non-coding RNAs play key roles in disease and development, few structural studies have been performed to date for this emerging class of RNAs. Previous structural studies are reviewed, and a pipeline is presented to determine secondary structures of long non-coding RNAs. As with riboswitches, experimentally determined secondary structures of long non-coding RNAs for one species may be used to improve sequence/structure alignments for other species. As riboswitches have been classified according to their secondary structure, a similar scheme could be used to classify long non-coding RNAs. This article is part of a Special Issue titled: Clues to long noncoding RNA taxonomy, edited by Dr. Tetsuro Hirose and Dr. Shinichi Nakagawa.

  10. DETAIL VIEW OF CLASSIFIER, TAILINGS LAUNDER TROUGH, LINE SHAFTS, AND ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    DETAIL VIEW OF CLASSIFIER, TAILINGS LAUNDER TROUGH, LINE SHAFTS, AND CONCENTRATION TABLES, LOOKING SOUTHWEST. SLURRY EXITING THE BALL MILL WAS COLLECTED IN AN AMALGAMATION BOX (MISSING) FROM THE END OF THE MILL, AND INTRODUCED INTO THE CLASSIFIER. THE TAILINGS LAUNDER IS ON THE GROUND AT LOWER RIGHT. THE LINE SHAFTING ABOVE PROVIDED POWER TO THE CONCENTRATION TABLES BELOW AT CENTER RIGHT. - Gold Hill Mill, Warm Spring Canyon Road, Death Valley Junction, Inyo County, CA

  11. Dealing with contaminated datasets: An approach to classifier training

    NASA Astrophysics Data System (ADS)

    Homenda, Wladyslaw; Jastrzebska, Agnieszka; Rybnik, Mariusz

    2016-06-01

    The paper presents a novel approach to classification reinforced with a rejection mechanism. The method is based on a two-tier set of classifiers: the first layer classifies elements, and the second layer separates native elements from foreign ones in each distinguished class. The key novelty presented here is a training scheme for the rejection mechanism that follows the philosophy "one-against-all-other-classes". The proposed method was tested in an empirical study of handwritten digit recognition.
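
    Here is a minimal sketch of the two-tier idea. It assumes a nearest-centroid first layer and, as the second layer, one acceptance radius per class. Following the "one-against-all-other-classes" philosophy, each radius is chosen to accept the class's own training elements while rejecting elements of all other classes, which stand in for unseen foreign elements. The abstract does not specify the actual classifiers, and every name here is hypothetical.

```python
import math


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def fit_two_tier(X, y):
    """Tier 1: class centroids. Tier 2: per-class acceptance radius."""
    classes = sorted(set(y))
    cents = {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in classes}
    radii = {}
    for c in classes:
        native = [math.dist(x, cents[c]) for x, lab in zip(X, y) if lab == c]
        others = [math.dist(x, cents[c]) for x, lab in zip(X, y) if lab != c]
        best_r, best_score = None, -1
        for r in sorted(native):
            # one-against-all-other-classes: natives inside r, other classes outside
            score = sum(d <= r for d in native) + sum(d > r for d in others)
            if score > best_score:
                best_r, best_score = r, score
        radii[c] = best_r
    return cents, radii


def predict_two_tier(model, x):
    """Return a class label, or None when tier 2 rejects x as foreign."""
    cents, radii = model
    c = min(cents, key=lambda k: math.dist(x, cents[k]))  # tier 1 decision
    return c if math.dist(x, cents[c]) <= radii[c] else None  # tier 2 check
```

    A point far from every class, such as `[5, 5]` for two clusters near the origin and near `(10, 10)`, is still assigned a class by tier 1. Tier 2 then rejects it, returning `None`.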

  12. Support vector machines classifiers of physical activities in preschoolers

    PubMed Central

    Zhao, Wei; Adolph, Anne L; Puyau, Maurice R; Vohra, Firoz A; Butte, Nancy F; Zakeri, Issa F

    2013-01-01

    The goal of this study is to develop, test, and compare multinomial logistic regression (MLR) and support vector machines (SVM) for classifying physical activity data of preschool-aged children acquired from an accelerometer. In this study, 69 children aged 3–5 years were asked to participate in a supervised protocol of physical activities while wearing a triaxial accelerometer. Accelerometer counts, steps, and position were obtained from the device. We applied K-means clustering to determine the number of natural groupings present in the data. We used MLR and SVM to classify the six activity types. Using direct observation as the criterion method, the 10-fold cross-validation (CV) error rate was used to compare MLR and SVM classifiers, with and without sleep. Altogether, 58 classification models based on combinations of the accelerometer output variables were developed. In general, the SVM classifiers have a smaller 10-fold CV error rate than their MLR counterparts. Including sleep, an SVM classifier provided the best performance with a 10-fold CV error rate of 24.70%. Without sleep, an SVM classifier based on triaxial accelerometer counts, vector magnitude, steps, position, and 1- and 2-min lag and lead values achieved a 10-fold CV error rate of 20.16% and an overall classification error rate of 15.56%. SVM outperforms the classical classifier MLR in categorizing physical activities in preschool-aged children. Using accelerometer data, SVM can be used to correctly classify physical activities typical of preschool-aged children with an acceptable classification error rate. PMID:24303099
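
    The 10-fold cross-validation error rate used above to compare classifiers can be computed as in this minimal sketch. Several simplifications are assumptions, and all names are hypothetical:
    - a nearest-centroid rule stands in for MLR or SVM, neither of which is reproduced;
    - samples are assigned to folds by interleaving rather than at random;
    - misclassifications are pooled over all folds.

```python
def fit_centroids(X, y):
    """Stand-in classifier: mean feature vector per class."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {lab: [sum(col) / len(pts) for col in zip(*pts)] for lab, pts in groups.items()}


def predict_centroid(model, x):
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))


def k_fold_error_rate(X, y, fit, predict, k=10):
    """Fraction of samples misclassified when each fold is held out once."""
    n = len(X)
    folds = [list(range(start, n, k)) for start in range(k)]  # interleaved folds
    errors = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_X = [X[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        model = fit(train_X, train_y)
        errors += sum(predict(model, X[i]) != y[i] for i in test_idx)
    return errors / n
```

    Any pair of `fit`/`predict` functions with the same shape can be passed in. This is how two classifier families would be compared on identical folds.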

  13. One pass learning for generalized classifier neural network.

    PubMed

    Ozyildirim, Buse Melis; Avci, Mutlu

    2016-01-01

    The generalized classifier neural network, introduced as a kind of radial basis function neural network, uses a smoothing parameter value optimized by gradient descent to provide efficient classification. However, this optimization takes quite a long time, which is a drawback. In this work, one pass learning for the generalized classifier neural network is proposed to overcome this disadvantage. The proposed method uses the standard deviation of each class to calculate the corresponding smoothing parameter. Since different datasets may have different standard deviations and data distributions, the proposed method handles these differences by defining two functions for smoothing parameter calculation, and thresholding determines which function is used. One of these functions is defined for datasets having different ranges of values; it provides balanced smoothing parameters for these datasets through a logarithmic function and by shifting the operation range toward the lower boundary. The other function calculates the smoothing parameter value for classes whose standard deviation is smaller than the threshold value. The proposed method is tested on 14 datasets, and the performance of the one pass learning generalized classifier neural network is compared with that of the probabilistic neural network, radial basis function neural network, extreme learning machines, and standard and logarithmic learning generalized classifier neural networks in a MATLAB environment. The one pass learning generalized classifier neural network provides more than a thousand times faster classification than the standard and logarithmic generalized classifier neural networks. Owing to its classification accuracy and speed, the one pass generalized classifier neural network can be considered an efficient alternative to the probabilistic neural network. Test results show that the proposed method overcomes the computational drawback of the generalized classifier neural network and may increase classification performance.
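
    The sketch below shows what replacing the gradient-descent search with a one-pass rule means in practice. It is a minimal Parzen-window classifier (in the style of a probabilistic neural network) in which each class's smoothing parameter is set directly from that class's standard deviation. It is a simplification, not the paper's method. The two threshold-selected functions, including the logarithmic one, are not reproduced, and averaging the per-feature standard deviations into a single sigma is an assumption.

```python
import math
from statistics import pstdev


def fit_one_pass(X, y):
    """Store each class's samples with a smoothing parameter computed in one pass."""
    model = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        # assumption: sigma = mean of the per-feature standard deviations of the class
        sigma = sum(pstdev(col) for col in zip(*pts)) / len(pts[0])
        model[label] = (pts, max(sigma, 1e-6))  # guard against a zero spread
    return model


def classify(model, x):
    """Pick the class with the largest Gaussian-kernel density estimate at x."""
    def density(pts, sigma):
        d = len(x)
        kernel_sum = sum(math.exp(-math.dist(x, p) ** 2 / (2 * sigma**2)) for p in pts)
        # dividing by sigma**d makes class densities comparable; the shared
        # (2*pi)**(d/2) factor is omitted because it does not change the argmax
        return kernel_sum / (len(pts) * sigma**d)
    return max(model, key=lambda label: density(*model[label]))
```

    Training is a single pass over each class to compute its spread. This is the source of the speed advantage that the abstract attributes to avoiding iterative optimization.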

  14. DECISION TREE CLASSIFIERS FOR STAR/GALAXY SEPARATION

    SciTech Connect

    Vasconcellos, E. C.; Ruiz, R. S. R.; De Carvalho, R. R.; Capelato, H. V.; Gal, R. R.; LaBarbera, F. L.; Frago Campos Velho, H.; Trevisan, M.

    2011-06-15

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 ≤ r ≤ 21 (85.2%) and r ≥ 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 ≤ r ≤ 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination (≈2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 ≤ r ≤ 21.
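
    The completeness and contamination figures quoted above can be computed from a labelled test set as in this minimal sketch. It assumes the usual definitions:
    - completeness is the fraction of true galaxies classified as galaxies;
    - contamination is the fraction of objects classified as galaxies that are not galaxies.
    Objects are restricted to an r-magnitude interval by simple filtering. The names are hypothetical, and the paper's completeness function may be binned differently.

```python
def completeness_contamination(true, pred, target="galaxy"):
    """Return (completeness, contamination) for one target class."""
    hits = sum(t == target and p == target for t, p in zip(true, pred))
    n_true = sum(t == target for t in true)
    n_pred = sum(p == target for p in pred)
    completeness = hits / n_true if n_true else float("nan")
    contamination = (n_pred - hits) / n_pred if n_pred else float("nan")
    return completeness, contamination


def in_magnitude_range(true, pred, r_mag, lo, hi, target="galaxy"):
    """Same metrics restricted to objects with lo <= r <= hi."""
    keep = [i for i, r in enumerate(r_mag) if lo <= r <= hi]
    return completeness_contamination([true[i] for i in keep], [pred[i] for i in keep], target)
```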

  15. Gene and genon concept: coding versus regulation

    PubMed Central

    2007-01-01

    We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. Rather, it is assembled by RNA processing, including differential splicing, from various

  16. Genetic coding and gene expression - new Quadruplet genetic coding model

    NASA Astrophysics Data System (ADS)

    Shankar Singh, Rama

    2012-07-01

    Successful demonstration of the Human Genome Project has opened the door not only to developing personalized medicine and cures for genetic diseases, but it may also help answer the complex and difficult question of the origin of life. It may also make the 21st century a century of the biological sciences. Based on the central dogma of biology, genetic codons in conjunction with tRNA play a key role in translating RNA bases into the sequence of amino acids that forms a synthesized protein. This is the most critical step in synthesizing the right protein needed for personalized medicine and curing genetic diseases. So far, only triplet codons involving three bases of RNA, transcribed from DNA bases, have been used. Since this approach has several inconsistencies and limitations, even the promise of personalized medicine has not been realized. The new Quadruplet genetic coding model proposed and developed here involves all four RNA bases, which in conjunction with tRNA will synthesize the right protein. The transcription and translation processes used will be the same, but the Quadruplet codons will help overcome most of the inconsistencies and limitations of the triplet codes. Details of this new Quadruplet genetic coding model and its subsequent potential applications, including relevance to the origin of life, will be presented.

  17. QR Code Mania!

    ERIC Educational Resources Information Center

    Shumack, Kellie A.; Reilly, Erin; Chamberlain, Nik

    2013-01-01

    space, has error-correction capacity, and can be read from any direction. These codes are used in manufacturing, shipping, and marketing, as well as in education. QR codes can be created to produce…

  18. Steganalysis in high dimensions: fusing classifiers built on random subspaces

    NASA Astrophysics Data System (ADS)

    Kodovský, Jan; Fridrich, Jessica

    2011-02-01

    By working with high-dimensional representations of covers, modern steganographic methods are capable of preserving a large number of complex dependencies among individual cover elements and thus avoid detection by the current best steganalyzers. Inevitably, steganalysis needs to start using high-dimensional feature sets as well. This brings two key problems: construction of good high-dimensional features, and machine learning that scales well with respect to dimensionality. Depending on the classifier, high dimensionality may lead to problems with the lack of training data, infeasibly high complexity of training, degradation of generalization abilities, lack of robustness to cover source, and saturation of performance below its potential. To address these problems, collectively known as the curse of dimensionality, we propose ensemble classifiers as an alternative to the much more complex support vector machines. Based on the character of the media being analyzed, the steganalyst first puts together a high-dimensional set of diverse "prefeatures" selected to capture dependencies among individual cover elements. Then, a family of weak classifiers is built on random subspaces of the prefeature space. The final classifier is constructed by fusing the decisions of individual classifiers. The advantage of this approach is its universality, low complexity, simplicity, and improved performance when compared to classifiers trained on the entire prefeature set. Experiments with the steganographic algorithms nsF5 and HUGO demonstrate the usefulness of this approach over the current state of the art.
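
    The random-subspace ensemble described above can be sketched as follows. This minimal version draws a random subset of feature dimensions for each base learner and fits a nearest-centroid rule on that subspace. It then fuses the decisions by majority vote. The nearest-centroid rule is chosen for brevity and is not claimed to be the base learner used in the paper. Parameter names such as `n_learners` and `subspace_dim` are hypothetical.

```python
import random


def fit_subspace_ensemble(X, y, n_learners=25, subspace_dim=2, seed=0):
    """Train one nearest-centroid base learner per random feature subspace."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_learners):
        dims = rng.sample(range(n_features), subspace_dim)  # random subspace
        centroids = {}
        for label in set(y):
            pts = [[x[j] for j in dims] for x, lab in zip(X, y) if lab == label]
            centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]
        ensemble.append((dims, centroids))
    return ensemble


def predict_majority(ensemble, x):
    """Fuse the base learners' decisions by majority vote."""
    votes = {}
    for dims, centroids in ensemble:
        z = [x[j] for j in dims]
        label = min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(z, centroids[lab])))
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

    Each base learner only ever sees `subspace_dim` features. Training cost therefore grows with the subspace size rather than with the full dimensionality, which is the property the abstract emphasizes.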

  19. Verification of classified fissile material using unclassified attributes

    SciTech Connect

    Nicholas, N.J.; Fearey, B.L.; Puckett, J.M.; Tape, J.W.

    1998-12-31

    This paper reports on the most recent efforts of US technical experts to explore verification by IAEA of unclassified attributes of classified excess fissile material. Two propositions are discussed: (1) that multiple unclassified attributes could be declared by the host nation and then verified (and reverified) by the IAEA in order to provide confidence in that declaration of a classified (or unclassified) inventory while protecting classified or sensitive information; and (2) that attributes could be measured, remeasured, or monitored to provide continuity of knowledge in a nonintrusive and unclassified manner. They believe attributes should relate to characteristics of excess weapons materials and should be verifiable and authenticatable with methods usable by IAEA inspectors. Further, attributes (along with the methods to measure them) must not reveal any classified information. The approach that the authors have taken is as follows: (1) assume certain attributes of classified excess material, (2) identify passive signatures, (3) determine range of applicable measurement physics, (4) develop a set of criteria to assess and select measurement technologies, (5) select existing instrumentation for proof-of-principle measurements and demonstration, and (6) develop and design information barriers to protect classified information. While the attribute verification concepts and measurements discussed in this paper appear promising, neither the attribute verification approach nor the measurement technologies have been fully developed, tested, and evaluated.

  20. LESS: a model-based classifier for sparse subspaces.

    PubMed

    Veenman, Cor J; Tax, David M J

    2005-09-01

    In this paper, we specifically focus on high-dimensional data sets for which the number of dimensions is an order of magnitude higher than the number of objects. From a classifier design standpoint, such small sample size problems pose some interesting challenges. The first challenge is to find, among all hyperplanes that separate the classes, a separating hyperplane that generalizes well to future data. A second important task is to determine which features are required to distinguish the classes. To attack these problems, we propose the LESS (Lowest Error in a Sparse Subspace) classifier, which efficiently finds linear discriminants in a sparse subspace. In contrast with most classifiers for high-dimensional data sets, the LESS classifier incorporates a (simple) data model. Further, by means of a regularization parameter, the classifier establishes a suitable trade-off between subspace sparseness and classification accuracy. In the experiments, we show how LESS performs on several high-dimensional data sets and compare its performance to related state-of-the-art classifiers such as linear ridge regression with the LASSO and the Support Vector Machine. It turns out that LESS performs competitively while using fewer dimensions.

  1. A database coding system for vascular procedures.

    PubMed

    Harris, K A; DeRose, G; Jamieson, W

    1991-01-01

    A coding system was developed to overcome the difficulties encountered in data registry and retrieval from a national audit. In vascular surgery, operations are frequently combined, and neither the OHIP fee schedule of codes (Ontario, Canada) nor the ICD-9 system provides sufficient detail for most vascular surgeons to retrieve information for long-term follow-up. However, some wish to record minimal data on their operative procedures. A numeric classification system was developed. A five-digit number is used: the first two digits classify the operative procedure and anatomic details, two decimal digits code the classification of operation (e.g., aortic aneurysm, tube graft, aortoiliac, or aortobifemoral), and the final digit may be used as a modifier. "Holes" in the numeric system allow new operations to be added as they develop. Codes are stored in a database with the following fields: 1) codes; 2) description of operation; 3) translation. The translation field may be modified to permit translation of any existing databases into the system. This database has been distributed with a data registry program free of charge to vascular surgeons in Canada to allow nationwide registry of vascular surgery patients. A numeric code eliminates spelling and abbreviation errors, and can be sufficiently broad-based to allow all surgeons to participate in a nationwide audit.
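
    The code layout described above maps naturally onto a small record type. This sketch assumes the five digits are written as `NN.NNM`: two digits for procedure and anatomy, then two decimal digits for the classification of operation and a final modifier digit. The abstract gives no concrete codes, so the textual layout, the class names and the example code `12.345` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcedureCode:
    procedure: int       # first two digits: operative procedure and anatomic details
    classification: int  # two decimal digits: classification of operation
    modifier: int        # final digit: optional modifier

    @classmethod
    def parse(cls, text):
        whole, _, frac = text.partition(".")
        if len(whole) != 2 or len(frac) != 3 or not (whole + frac).isdecimal():
            raise ValueError(f"expected a code shaped like '12.345', got {text!r}")
        return cls(int(whole), int(frac[:2]), int(frac[2]))


@dataclass
class CodeRecord:
    """One database row with the three fields listed above (hypothetical names)."""
    code: str
    description: str
    translation: str
```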

  2. Discriminatory power of three DNA-based typing techniques for Pseudomonas aeruginosa.

    PubMed Central

    Grundmann, H; Schneider, C; Hartung, D; Daschner, F D; Pitt, T L

    1995-01-01

    We assessed the capacity of three DNA typing techniques to discriminate between 81 geographically, temporally, and epidemiologically unrelated strains of Pseudomonas aeruginosa. The methods, representing powerful tools for hospital molecular epidemiology, included hybridization of restricted chromosomal DNA with toxA and genes coding for rRNA (rDNA) used as probes and macrorestriction analysis of SpeI-digested DNA by pulsed-field gel electrophoresis. The probe typing techniques were able to classify all strains into a limited number of types, and the discriminatory powers were 97.7 and 95.6% for toxA and rDNA typing, respectively. Strains that were indistinguishable on the basis of both toxA and rDNA types defined 12 probe type homology groups. Of these, one contained five strains, three contained three strains each, and eight groups were represented by two strains each. Strains in 10 of the homology groups had the same O serotype. SpeI macrorestriction patterns discriminated between all strains with at least four band differences, which corresponded to a similarity level of 85%. Fifteen pairs of strains were similar at a level of > 75% and differed by only four to seven bands. Of these pairs, 11 belonged to the same probe type homology group, indicating their clonal relatedness. We conclude that macrorestriction analysis of P. aeruginosa with SpeI provides the best means of discrimination between epidemiologically unrelated strains. However, DNA probe typing with either toxA or rDNA reveals information on the strain population structure and evolutionary relationships. PMID:7751352
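
    The abstract does not state how discriminatory power was computed. A common choice in bacterial typing is the Hunter-Gaston discriminatory index, a form of Simpson's index of diversity:

    D = 1 - sum_j n_j(n_j - 1) / (N(N - 1))

    Here N is the number of strains and n_j is the number of strains assigned to type j. The sketch below assumes that index; the names are hypothetical.

```python
from collections import Counter


def discriminatory_index(types):
    """Hunter-Gaston index: probability that two randomly chosen strains differ in type."""
    n = len(types)
    same_type_pairs = sum(c * (c - 1) for c in Counter(types).values())
    return 1 - same_type_pairs / (n * (n - 1))


if __name__ == "__main__":
    # Combined toxA + rDNA grouping from the abstract: 12 homology groups
    # (one of 5 strains, three of 3, eight of 2); the remaining strains are unique.
    groups = [5, 3, 3, 3] + [2] * 8
    labels = [f"group{g}" for g, size in enumerate(groups) for _ in range(size)]
    labels += [f"unique{i}" for i in range(81 - len(labels))]
    print(round(discriminatory_index(labels), 4))  # 0.9917
```

    Under that assumption, the grouping gives about 0.992 for the two probes combined. This figure is derived here from the counts in the abstract and is not reported by the authors.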

  3. EMF wire code research

    SciTech Connect

    Jones, T.

    1993-11-01

    This paper examines the results of previous wire code research to determine the relationship among childhood cancer, wire codes, and electromagnetic fields. The paper suggests that, in the original Savitz study, the selection procedure created biases toward a false positive association between high wire codes and childhood cancer.

  4. Analysis of the Hox epigenetic code.

    PubMed

    Ezziane, Zoheir

    2012-04-10

    Archetypes of histone modifications associated with diverse chromosomal states that regulate access to DNA have given rise to the hypothesis of the histone code (or epigenetic code). However, it is still not evident how these post-translational modifications of histone tails lead to changes in chromatin structure. Histone modifications can activate and/or inactivate several genes and can be transmitted to next-generation cells through epigenetic memory. The challenging issue is to identify or "decrypt" the code used to transmit these modifications to descendant cells. Here, an attempt is made to describe how histone modifications operate as part of a histone code that stipulates patterns of gene expression. This paper places particular emphasis on the correlation between histone modifications and patterns of Hox gene expression in Caenorhabditis elegans. This work serves as an example to illustrate the power of the epigenetic machinery and its use in drug design and discovery. PMID:22553504

  5. Geant4-DNA simulations using complex DNA geometries generated by the DnaFabric tool

    NASA Astrophysics Data System (ADS)

    Meylan, S.; Vimont, U.; Incerti, S.; Clairand, I.; Villagrasa, C.

    2016-07-01

    Several DNA representations are used to study radiation-induced complex DNA damage, depending on the approach and the required level of granularity. Among all approaches, the mechanistic one requires the most resolved DNA models, which can go down to atomistic DNA descriptions. The complexity of such DNA models makes them hard to modify and adapt to different biological conditions. The DnaFabric project was started to provide a tool to generate, visualise and modify such complex DNA models. In the current version of DnaFabric, the models can be exported to the Geant4 code to be used as targets in Monte Carlo simulations. In this work, the project was used to generate two DNA fibre models corresponding to two DNA compaction levels, representing heterochromatin and euchromatin. The fibres were imported into a Geant4 application where computations were performed to estimate the influence of DNA compaction on the amount of calculated DNA damage. The relative difference of the DNA damage computed in the two fibres for the same number of projectiles was found to be constant and equal to 1.3 for the considered primary particles (protons from 300 keV to 50 MeV). However, if only the tracks hitting the DNA target are taken into account, the relative difference is larger at low energies and decreases to zero at around 10 MeV. The computations were performed with models that contain up to 18,000 DNA nucleotide pairs. Nevertheless, DnaFabric will be extended to manipulate multi-scale models that go from the molecular to the cellular levels.

  6. Population coding of affect across stimuli, modalities and individuals

    PubMed Central

    Chikazoe, Junichi; Lee, Daniel H.; Kriegeskorte, Nikolaus; Anderson, Adam K.

    2014-01-01

    It remains unclear how the brain represents external objective sensory events alongside our internal subjective impressions of them (affect). Representational mapping of population level activity evoked by complex scenes and basic tastes uncovered a neural code supporting a continuous axis of pleasant-to-unpleasant valence. This valence code was distinct from low-level physical and high-level object properties. While ventral temporal and anterior insular cortices supported valence codes specific to vision and taste, both the medial and lateral orbitofrontal cortices (OFC) maintained a valence code independent of sensory origin. Further, only the OFC code could classify experienced affect across participants. The entire valence spectrum is represented as a collective pattern in regional neural activity as sensory-specific and abstract codes, whereby the subjective quality of affect can be objectively quantified across stimuli, modalities, and people. PMID:24952643

  7. A Decision Tree Based Classifier to Analyze Human Ovarian Cancer cDNA Microarray Datasets.

    PubMed

    Tsai, Meng-Hsiun; Wang, Hsin-Chieh; Lee, Guan-Wei; Lin, Yi-Chen; Chiu, Sheng-Hsiung

    2016-01-01

    Ovarian cancer is the deadliest gynaecological disease because of its high mortality rate and the absence of symptoms at early stages. Patients are often diagnosed only at a terminal stage, which forfeits a good opportunity for treatment. The most common current method for detecting ovarian cancer is a blood test for the serum tumor marker CA-125. However, the specificity and sensitivity of CA-125 are insufficient for early detection. Finding an efficient method that precisely detects tumor markers for ovarian cancer has therefore become an urgent issue. This study aims to find the target genes of ovarian cancer using different information-science algorithms. Feature selection and a decision tree were applied to analyze 9600 ovarian cancer-related genes. After the target genes were screened, the candidate genes were analyzed with Ingenuity Pathway Analysis (IPA) software to create a genetic pathway model and to understand their interactions in the different pathological stages of ovarian cancer. Finally, this research found 9 oncogenes associated with ovarian cancer, some of which had not been discovered in previous studies. This system is intended to assist medical staff in diagnosis and treatment at early stages of cancer and to improve patient survival. PMID:26531754
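
    The abstract names feature selection followed by a decision tree but gives no details. The sketch below is a minimal, stdlib-only illustration built on assumptions of our own. Genes are first ranked by the absolute difference of their class means; this is a simple filter and not necessarily the authors' method. A depth-limited tree is then grown by minimizing Gini impurity, CART-style. The expression values and labels are made up.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def select_features(X, y, k):
    """Filter step (assumed): rank genes by |mean(tumor) - mean(normal)|, keep top k.
    Assumes binary labels 1/0 with both classes present."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        scores.append((abs(sum(pos) / len(pos) - sum(neg) / len(neg)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def best_split(X, y, features):
    """Return (weighted Gini, feature, threshold) of the best binary split, or None."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def build_tree(X, y, features, depth=0, max_depth=3):
    """Grow a depth-limited tree; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y, features)
    if split is None:
        return majority
    _, j, t = split
    go_left = [row[j] <= t for row in X]
    left = build_tree([r for r, g in zip(X, go_left) if g],
                      [lab for lab, g in zip(y, go_left) if g],
                      features, depth + 1, max_depth)
    right = build_tree([r for r, g in zip(X, go_left) if not g],
                       [lab for lab, g in zip(y, go_left) if not g],
                       features, depth + 1, max_depth)
    return (j, t, left, right)

def predict(tree, row):
    """Follow thresholds from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

# Toy expression matrix: rows are samples, columns are 4 hypothetical genes.
X = [[0.1, 5.0, 9.0, 1.0], [0.2, 4.8, 8.5, 1.1], [0.1, 5.1, 8.8, 0.9],
     [0.2, 5.0, 2.0, 1.0], [0.1, 4.9, 1.5, 1.2], [0.2, 5.2, 2.2, 1.0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = tumor, 0 = normal (made-up labels)
genes = select_features(X, y, k=1)   # [2]
tree = build_tree(X, y, genes)       # (2, 2.2, 0, 1)
```

    Selecting features before growing the tree keeps the split search small. This matters when the candidate set is thousands of genes, as in the 9600-gene analysis described above.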

  8. Software Certification - Coding, Code, and Coders

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus; Holzmann, Gerard J.

    2011-01-01

    We describe a certification approach for software development that has been adopted at our organization. JPL develops robotic spacecraft for the exploration of the solar system. The flight software that controls these spacecraft is considered to be mission critical. We argue that the goal of a software certification process cannot be the development of "perfect" software, i.e., software that can be formally proven to be correct under all imaginable and unimaginable circumstances. More realistically, the goal is to guarantee a software development process that is conducted by knowledgeable engineers, who follow generally accepted procedures to control known risks, while meeting agreed-upon standards of workmanship. We target three specific issues that must be addressed in such a certification procedure: the coding process, the code that is developed, and the skills of the coders. The coding process is driven by standards (e.g., a coding standard) and tools. The code is mechanically checked against the standard with the help of state-of-the-art static source code analyzers. The coders, finally, are certified in on-site training courses that include formal exams.

  9. Disparity in coding concordance: do physicians and coders agree?

    PubMed

    Lorence, Daniel P; Ibrahim, Ibrahim Awad

    2003-01-01

    Increasing demands for large-scale comparative analysis of health care costs have led to a similar demand for consistently classified data. Evidence-based medicine demands evidence that can be trusted. This study sought to assess managers' observed levels of agreement with physician code selections when classifying patient data. Using a non-sampled research design of both mailed and telephone surveys, we employed a nationwide cross-section of over 16,000 accredited US medical record managers. As a main outcome measure, we evaluated reported levels of agreement between physician and information manager code selections made when classifying patient data. Results indicate that about 19 percent of respondents reported coder-physician classification disagreement in more than 5 percent of all patient encounters. In some cases, disagreement occurred in 20 percent or more instances of code selection. This phenomenon shows significant variation across key demographic and market indicators. With the growing practice of measuring coded data quality as an outcome of health care financial performance, along with the adoption of electronic classification and patient record systems, the accuracy of coded data is likely to remain uncertain in the absence of more consistent classification and coding practices. PMID:12908653

  10. Remote-Handled Transuranic Content Codes

    SciTech Connect

    Washington TRU Solutions

    2001-08-01

    The Remote-Handled Transuranic (RH-TRU) Content Codes (RH-TRUCON) document represents the development of a uniform content code system for RH-TRU waste to be transported in the 72-B cask. It will be used to convert existing waste form numbers, content codes, and site-specific identification codes into a system that is uniform across the U.S. Department of Energy (DOE) sites. The existing waste codes at the sites can be grouped under uniform content codes without any loss of waste characterization information. The RH-TRUCON document provides an all-encompassing description for each content code and compiles this information for all DOE sites. Compliance with waste generation, processing, and certification procedures at the sites (outlined in this document for each content code) ensures that prohibited waste forms are not present in the waste. The content code gives an overall description of the RH-TRU waste material in terms of processes and packaging, as well as the generation location. This helps to provide cradle-to-grave traceability of the waste material so that the various actions required to assess its qualification as payload for the 72-B cask can be performed. The content codes also impose restrictions and requirements on the manner in which a payload can be assembled. The RH-TRU Waste Authorized Methods for Payload Control (RH-TRAMPAC), Appendix 1.3.7 of the 72-B Cask Safety Analysis Report (SAR), describes the current governing procedures applicable for the qualification of waste as payload for the 72-B cask. The logic for this classification is presented in the 72-B Cask SAR. Together, these documents (RH-TRUCON, RH-TRAMPAC, and relevant sections of the 72-B Cask SAR) present the foundation and justification for classifying RH-TRU waste into content codes. Only content codes described in this document can be considered for transport in the 72-B cask. Revisions to this document will be made as additional waste qualifies for transport. Each content code uniquely

  11. SAR terrain classifier and mapper of biophysical attributes

    NASA Technical Reports Server (NTRS)

    Ulaby, Fawwaz T.; Dobson, M. Craig; Pierce, Leland; Sarabandi, Kamal

    1993-01-01

    In preparation for the launch of SIR-C/X-SAR and design studies for future orbital SAR, a program has made considerable progress in the development of an SAR terrain classifier and algorithms for quantification of biophysical attributes. The goal of this program is to produce a generalized software package for terrain classification and estimation of biophysical attributes and to make this package available to the larger scientific community. The basic elements of the SAR (Synthetic Aperture Radar) terrain classifier are outlined. An SAR image is calibrated with respect to known system and processor gains and external targets (if available). A Level 1 classifier operates on the data to differentiate urban features, surfaces, and tall and short vegetation. Level 2 classifiers further subdivide these classes on the basis of structure. Finally, biophysical and geophysical inversions are applied to each class to estimate attributes of interest. The process used to develop the classifiers and inversions is shown. Radar scattering models developed from theory and from empirical data obtained by truck-mounted polarimeters and the JPL AirSAR are validated. The validated models are used in sensitivity studies to understand the roles of various scattering sources (i.e., surface, trunk, branches, etc.) in determining net backscatter. Model simulations of the backscattering coefficient σ⁰ as functions of the wave parameters (wavelength λ, polarization and angle of incidence) and the geophysical and biophysical attributes are used to develop robust classifiers. The classifiers are validated using available AirSAR data sets. Specific estimators are developed for each class on the basis of the scattering models and empirical data sets. The candidate algorithms are tested with the AirSAR data sets. The attributes of interest include total above-ground biomass, woody biomass, soil moisture and soil roughness.

  12. Representative Vector Machines: A Unified Framework for Classical Classifiers.

    PubMed

    Gui, Jie; Liu, Tongliang; Tao, Dacheng; Sun, Zhenan; Tan, Tieniu

    2016-08-01

    Classifier design is a fundamental problem in pattern recognition. A variety of pattern classification methods, such as the nearest neighbor (NN) classifier, support vector machine (SVM), and sparse representation-based classification (SRC), have been proposed in the literature. These typical and widely used classifiers were originally developed from different theoretical or application motivations, and they are conventionally treated as independent and specific solutions for pattern classification. This paper proposes a novel pattern classification framework, namely, representative vector machines (RVMs for short). The basic idea of RVMs is to assign the class label of a test example according to its nearest representative vector. The contributions of RVMs are twofold. On the one hand, the proposed RVMs establish a unified framework of classical classifiers, because NN, SVM, and SRC can be interpreted as special cases of RVMs with different definitions of representative vectors. Thus, the underlying relationship among a number of classical classifiers is revealed, for a better understanding of pattern classification. On the other hand, the RVM framework inspires novel and advanced classifiers. For example, a robust pattern classification method called the discriminant vector machine (DVM) is motivated by RVMs. Given a test example, DVM first finds its k-NNs and then performs classification based on the robust M-estimator and manifold regularization. Extensive experimental evaluations on a variety of visual recognition tasks, such as face recognition (Yale and face recognition grand challenge databases), object categorization (Caltech-101 dataset), and action recognition (Action Similarity LAbeliNg), demonstrate the advantages of DVM over other classifiers.
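
    The core RVM rule, "assign the label of the nearest representative vector", can be illustrated with a few lines of stdlib Python. This is a minimal sketch and not the authors' implementation. It uses Euclidean distance and two choices of representatives. Every training sample as its own representative recovers the NN special case named in the abstract. One mean vector per class is an illustrative alternative of our own (nearest centroid), not one of the paper's stated cases. The SVM, SRC and DVM variants are not shown.

```python
import math

def nearest_representative(x, representatives):
    """Assign x the label of its nearest representative vector (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for vec, label in representatives:
        d = math.dist(x, vec)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def nn_representatives(X, y):
    """NN special case: every training sample is its own representative."""
    return list(zip(X, y))

def centroid_representatives(X, y):
    """Illustrative alternative (assumed, not from the paper): one mean vector per class."""
    reps = []
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        mean = tuple(sum(col) / len(members) for col in zip(*members))
        reps.append((mean, label))
    return reps

X = [(0, 0), (0, 1), (0, 2), (3, 0), (10, 0)]
y = ["a", "a", "a", "b", "b"]
query = (2, 0)
print(nearest_representative(query, nn_representatives(X, y)))        # b
print(nearest_representative(query, centroid_representatives(X, y)))  # a
```

    The same decision rule gives different answers depending only on how the representatives are defined. Here the nearby sample (3, 0) wins under NN, while the outlier (10, 0) pulls class b's centroid away. This is the sense in which the framework treats classical classifiers as special cases.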

  13. Coding for Electronic Mail

    NASA Technical Reports Server (NTRS)

    Rice, R. F.; Lee, J. J.

    1986-01-01

    Scheme for coding facsimile messages promises to reduce data transmission requirements to one-tenth current level. Coding scheme paves way for true electronic mail in which handwritten, typed, or printed messages or diagrams are sent virtually instantaneously, between buildings or between continents. Scheme, called Universal System for Efficient Electronic Mail (USEEM), uses unsupervised character recognition and adaptive noiseless coding of text. Image quality of resulting delivered messages improved over messages transmitted by conventional coding. Coding scheme compatible with direct-entry electronic mail as well as facsimile reproduction. Text transmitted in this scheme automatically translated to word-processor form.
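
    The abstract does not say which adaptive noiseless coder USEEM uses. Assuming it is of the Rice (Golomb-power-of-two) type associated with R. F. Rice's work, a minimal sketch of the adaptive idea is shown below. Each block of non-negative integers is coded with every candidate parameter k. The shortest option is kept, and its k is returned so that a decoder can invert it. The preprocessing a real coder would apply to map data to non-negative residuals is omitted. The block size and the limit max_k are arbitrary choices.

```python
def rice_encode(n, k):
    """Rice code of a non-negative integer n with parameter k, as a '0'/'1' string."""
    bits = "1" * (n >> k) + "0"          # quotient in unary, terminated by a 0
    if k:
        bits += format(n & ((1 << k) - 1), f"0{k}b")  # k low-order bits verbatim
    return bits

def rice_decode(bits, k, count):
    """Decode `count` values that were Rice-coded with parameter k."""
    values, i = [], 0
    for _ in range(count):
        q = 0
        while bits[i] == "1":
            q += 1
            i += 1
        i += 1                           # skip the terminating 0
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        values.append((q << k) | r)
    return values

def encode_block(block, max_k=15):
    """Adaptive step: choose the k that gives the shortest code for this block."""
    best_k = min(range(max_k + 1),
                 key=lambda k: sum((n >> k) + 1 + k for n in block))
    return best_k, "".join(rice_encode(n, best_k) for n in block)

k, bits = encode_block([3, 5, 2, 7])
print(k, bits)                   # 2 01110010101011
print(rice_decode(bits, k, 4))   # [3, 5, 2, 7]
```

    For this block, k = 2 needs 14 bits, compared with 21 bits for k = 0. Choosing k per block is what lets such a coder track changing statistics without a stored code table.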

  14. Francis Crick, DNA, and the Central Dogma

    ERIC Educational Resources Information Center

    Olby, Robert

    1970-01-01

    This essay describes how Francis Crick, ex-physicist, entered the field of biology and discovered the structure of DNA. Emphasis is upon the double helix, the sequence hypothesis, the central dogma, and the genetic code. (VW)

  15. XSOR codes users manual

    SciTech Connect

    Jow, Hong-Nian; Murfin, W.B.; Johnson, J.D.

    1993-11-01

    This report describes the source term estimation codes, XSORs. The codes are written for three pressurized water reactors (Surry, Sequoyah, and Zion) and two boiling water reactors (Peach Bottom and Grand Gulf). The ensemble of codes has been named "XSOR". The purpose of the XSOR codes is to estimate the source terms that would be released to the atmosphere in severe accidents. A source term includes the release fractions of several radionuclide groups, the timing and duration of releases, the rates of energy release, and the elevation of releases. The codes have been developed by Sandia National Laboratories for the US Nuclear Regulatory Commission (NRC) in support of the NUREG-1150 program. The XSOR codes are fast-running parametric codes and are used as surrogates for detailed mechanistic codes. The XSOR codes also provide the capability to explore phenomena, and their uncertainties, that are not currently modeled by the mechanistic codes. An XSOR code may use the uncertainty distributions of its input parameters to estimate the uncertainty of the source terms.
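
    The last sentence above describes Monte Carlo propagation of input uncertainty through a fast parametric model. The sketch below illustrates that general technique only. The three-factor release model and every distribution in it are hypothetical placeholders, not XSOR's equations or parameter values. Inputs are sampled, the model is evaluated for each sample, and the output spread is summarized by its 5th, 50th and 95th percentiles.

```python
import random
import statistics

def release_fraction(core_release, containment_leak, retention):
    """HYPOTHETICAL parametric model: fraction of a radionuclide group's core
    inventory reaching the atmosphere, as a product of three factors."""
    return core_release * containment_leak * (1.0 - retention)

def propagate(n_samples=10_000, seed=1):
    """Sample uncertain inputs, evaluate the model, and summarize the output spread."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        core = rng.uniform(0.2, 0.8)                # placeholder distributions,
        leak = rng.triangular(0.001, 0.1, 0.01)     # not values from XSOR
        retention = rng.uniform(0.5, 0.99)
        samples.append(release_fraction(core, leak, retention))
    cuts = statistics.quantiles(samples, n=20)      # 19 cut points in 5% steps
    return {"p05": cuts[0], "median": statistics.median(samples), "p95": cuts[-1]}

summary = propagate()
print(summary)
```

    Because each evaluation of a parametric surrogate is cheap, thousands of samples are affordable. That is the practical reason fast-running codes suit this kind of uncertainty study.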

  16. DLLExternalCode

    SciTech Connect

    Greg Flach, Frank Smith

    2014-05-14

    DLLExternalCode is a general dynamic-link library (DLL) interface for linking GoldSim (www.goldsim.com) with external codes. The overall concept is to use GoldSim as top-level modeling software with interfaces to external codes for specific calculations. The DLLExternalCode DLL that performs the linking function is designed to take a list of code inputs from GoldSim, create an input file for the external application, run the external code, and return a list of outputs, read from files created by the external application, back to GoldSim. Instructions for creating the input file, running the external code, and reading the output are contained in an instructions file that is read and interpreted by the DLL.
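
    The linking cycle described above has three steps: write inputs to a file, run the external code, and read outputs back from its files. The sketch below shows that cycle in Python with the standard subprocess module. It is not the DLL's interface. The GoldSim side and the instructions file are not modeled, and the file names input.txt and output.txt are hypothetical. The stand-in "external code" is a one-line Python script so that the example runs anywhere.

```python
import pathlib
import subprocess
import sys
import tempfile

def run_external(inputs, command, workdir):
    """Write the inputs to a file, run the external code, return its outputs."""
    workdir = pathlib.Path(workdir)
    (workdir / "input.txt").write_text("\n".join(str(float(v)) for v in inputs) + "\n")
    subprocess.run(command, cwd=workdir, check=True)   # raises if the code fails
    return [float(tok) for tok in (workdir / "output.txt").read_text().split()]

# Stand-in "external code": reads input.txt, writes the sum and product to output.txt.
EXTERNAL = (
    "import math, pathlib\n"
    "xs = [float(s) for s in pathlib.Path('input.txt').read_text().split()]\n"
    "pathlib.Path('output.txt').write_text(f'{sum(xs)}\\n{math.prod(xs)}\\n')\n"
)

with tempfile.TemporaryDirectory() as tmp:
    outputs = run_external([2.0, 3.0, 4.0], [sys.executable, "-c", EXTERNAL], tmp)
print(outputs)  # [9.0, 24.0]
```

    Exchanging data through files keeps the external application unmodified. The cost is file I/O and a process launch on every call.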

  18. Parafermion stabilizer codes

    NASA Astrophysics Data System (ADS)

    Gungordu, Utkan; Nepal, Rabindra; Kovalev, Alexey

    2015-03-01

    We define and study parafermion stabilizer codes [Phys. Rev. A 90, 042326 (2014)], which can be viewed as generalizations of Kitaev's one-dimensional model of unpaired Majorana fermions. Parafermion stabilizer codes can protect against low-weight errors acting on a small subset of parafermion modes, in analogy to qudit stabilizer codes. Several of the smallest parafermion stabilizer codes are given as examples. Our results show that parafermions can achieve a better encoding rate than Majorana fermions. A locality-preserving embedding of qudit operators into parafermion operators is established, which allows one to map known qudit stabilizer codes to parafermion codes. We also present a local 2D parafermion construction that combines the topological protection of Kitaev's toric code with additional protection relying on parity conservation. This work was supported in part by the NSF under Grants No. PHY-1415600 and No. NSF-EPSCoR 1004094.

  19. Do plant cell walls have a code?

    PubMed

    Tavares, Eveline Q P; Buckeridge, Marcos S

    2015-12-01

    A code is a set of rules that establishes correspondence between two worlds: signs (consisting of encrypted information) and meaning (of the decrypted message). A third element, the adaptor, connects both worlds, assigning meaning to a code. We propose that a Glycomic Code exists in plant cell walls, where signs are represented by monosaccharides and phenylpropanoids and meaning is cell wall architecture with its highly complex association of polymers. Cell wall biosynthetic mechanisms, structure, architecture and properties are addressed from a Code Biology perspective, focusing on how they resist cell wall deconstruction. Cell wall hydrolysis is treated mainly as a mechanism for decrypting the Glycomic Code. Evidence for encoded information in the fine structure of cell wall polymers is highlighted, and the implications of the existence of the Glycomic Code are discussed. Aspects of fine structure are responsible for polysaccharide packing and polymer-polymer interactions, affecting the final cell wall architecture. The question of whether polymer assembly within a wall displays properties similar to those of other biological macromolecules (e.g., proteins, DNA, histones) is addressed; that is, do they display a code?

  20. Time and space optimization of document content classifiers

    NASA Astrophysics Data System (ADS)

    Yin, Dawei; Baird, Henry S.; An, Chang

    2010-01-01

    Scaling up document-image classifiers to handle an unlimited variety of document and image types poses serious challenges to conventional trainable classifier technologies. Highly versatile classifiers demand representative training sets, which can be dauntingly large: in investigating document content extraction systems, we have demonstrated the advantages of employing as many as a billion training samples in approximate k-nearest neighbor (kNN) classifiers sped up using hashed K-d trees. We report here on an algorithm, which we call online bin-decimation, for coping with training sets that are too big to fit in main memory. We show empirically that it is superior to offline pre-decimation, which simply discards a large fraction of the training samples at random before constructing the classifier. The key idea of bin-decimation is to approximately enforce an upper bound on the number of training samples stored in each K-d hash bin; an adaptive statistical technique allows this to be accomplished online and in linear time, while reading the training data exactly once. An experiment on 86.7M training samples reveals a 23-times speedup with less than 0.1% loss of accuracy (compared to pre-decimation); or, for another value of the upper bound, a 60-times speedup with less than 5% loss of accuracy. We also compare it to four other related algorithms.
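
    The abstract does not describe its adaptive statistical technique, so the sketch below substitutes a simpler, clearly different mechanism to illustrate the per-bin cap. Samples are read once as a stream and hashed to a bin, and each bin keeps at most `cap` samples via reservoir sampling. The cap is therefore exact rather than approximate. The grid-cell `bin_key` is a hypothetical stand-in for a K-d hash bin.

```python
import random
from collections import defaultdict

def bin_key(x, cell):
    """Hypothetical stand-in for a K-d hash bin: the grid cell containing x."""
    return tuple(int(v // cell) for v in x)

def bin_decimate(stream, cap, cell=1.0, seed=0):
    """Read (x, label) pairs once; keep at most `cap` per bin (reservoir sampling)."""
    rng = random.Random(seed)
    kept = defaultdict(list)   # bin -> retained samples
    seen = defaultdict(int)    # bin -> samples read so far
    for x, label in stream:
        key = bin_key(x, cell)
        seen[key] += 1
        if len(kept[key]) < cap:
            kept[key].append((x, label))
        else:
            j = rng.randrange(seen[key])   # uniform over all samples seen in this bin
            if j < cap:
                kept[key][j] = (x, label)
    return kept

# A dense bin (1000 samples) and a sparse bin (3 samples), read as a stream.
stream = (((0.5, 0.5), "dense") if i < 1000 else ((5.5, 5.5), "sparse")
          for i in range(1003))
kept = bin_decimate(stream, cap=10)
print({k: len(v) for k, v in kept.items()})  # {(0, 0): 10, (5, 5): 3}
```

    Unlike random pre-decimation, which thins every region of feature space equally, a per-bin cap removes samples only where they are redundant. In this example, sparse bins are kept whole.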