dna coding region: Topics by Science.gov

Sample records for dna coding region

Statistical properties of DNA sequences

NASA Technical Reports Server (NTRS)

Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

1995-01-01

We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
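To make the detrended fluctuation analysis mentioned in this abstract concrete, here is a minimal sketch in Python. It assumes a purine/pyrimidine "DNA walk" mapping, linear detrending in non-overlapping boxes, and an arbitrary set of box sizes; the function names and these choices are illustrative and are not taken from the record itself.

```python
import numpy as np

def dna_walk(seq):
    """Integrate a +1/-1 step series (assumed mapping: purine A/G = +1, other = -1)."""
    steps = np.array([1.0 if b in "AG" else -1.0 for b in seq.upper()])
    return np.cumsum(steps - steps.mean())

def dfa_fluctuation(profile, box):
    """RMS residual of the walk after removing a linear trend inside each box."""
    x = np.arange(box)
    residuals = []
    for i in range(len(profile) // box):
        seg = profile[i * box:(i + 1) * box]
        slope, intercept = np.polyfit(x, seg, 1)
        residuals.append(np.mean((seg - (slope * x + intercept)) ** 2))
    return np.sqrt(np.mean(residuals))

def dfa_exponent(seq, boxes=(8, 16, 32, 64, 128)):
    """Slope of log F(n) versus log n; box sizes are an illustrative assumption."""
    profile = dna_walk(seq)
    f = [dfa_fluctuation(profile, b) for b in boxes]
    alpha, _ = np.polyfit(np.log(boxes), np.log(f), 1)
    return alpha
```

An exponent near 0.5 is what an uncorrelated sequence would give under this sketch, while values clearly above 0.5 would point to long-range correlation of the kind the abstract reports for non-coding regions.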
The Use and Effectiveness of Triple Multiplex System for Coding Region Single Nucleotide Polymorphism in Mitochondrial DNA Typing of Archaeologically Obtained Human Skeletons from Premodern Joseon Tombs of Korea

PubMed Central

Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon

2015-01-01

A previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the coupled use of analyses of coding region SNP markers and control region mutation motifs. In this study, we tested whether the same triple multiplex analysis for coding region SNPs could also be applied to ancient samples from East Asia as a complement to sequence analysis of the mtDNA control region. In our study of Joseon skeleton samples, the mtDNA haplogroup determined by coding region SNP markers fell within the same haplogroup that sequence analysis of the control region assigned. Considering that previous studies of ancient samples made no small number of errors in control region mtDNA sequencing, coding region SNP analysis can serve as a good complement to conventional haplogroup determination, especially for archaeological human bone samples buried underground over long periods. PMID:26345190
Correlation approach to identify coding regions in DNA sequences

NASA Technical Reports Server (NTRS)

Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

1994-01-01

Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
Phylogenetic Network for European mtDNA

PubMed Central

Finnilä, Saara; Lehtonen, Mervi S.; Majamaa, Kari

2001-01-01

The sequence in the first hypervariable segment (HVS-I) of the control region has been used as a source of evolutionary information in most phylogenetic analyses of mtDNA. Population genetic inference would benefit from a better understanding of the variation in the mtDNA coding region, but, thus far, complete mtDNA sequences have been rare. We determined the nucleotide sequence in the coding region of mtDNA from 121 Finns, by conformation-sensitive gel electrophoresis and subsequent sequencing and by direct sequencing of the D loop. Furthermore, 71 sequences from our previous reports were included, so that the samples represented all the mtDNA haplogroups present in the Finnish population. We found a total of 297 variable sites in the coding region, which allowed the compilation of unambiguous phylogenetic networks. The D loop harbored 104 variable sites, and, in most cases, these could be localized within the coding-region networks, without discrepancies. Interestingly, many homoplasies were detected in the coding region. Nucleotide variation in the rRNA and tRNA genes was 6%, and that in the third nucleotide positions of structural genes amounted to 22% of that in the HVS-I. The complete networks enabled the relationships between the mtDNA haplogroups to be analyzed. Phylogenetic networks based on the entire coding-region sequence in mtDNA provide a rich source for further population genetic studies, and complete sequences make it easier to differentiate between disease-causing mutations and rare polymorphisms. PMID:11349229
Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics

PubMed Central

2012-01-01

Background: Detecting the borders between coding and non-coding regions is an essential step in genome annotation, and information entropy measures are useful for describing the signals in a genome sequence. However, the accuracy of previous border-finding methods based on entropy segmentation still needs to be improved. Methods: In this study, we first applied a new recursive entropic segmentation method to DNA sequences to obtain preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets. Results: Compared with previous segmentation methods, the experimental results on three bacterial genomes, Rickettsia prowazekii, Borrelia burgdorferi and E. coli, show that our approach improves the accuracy of finding the borders between coding and non-coding regions in DNA sequences. Conclusions: This paper presents a new segmentation method for prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For the three bacterial genomes, compared with the A12_JR method, our method raised the accuracy of finding the borders between protein-coding and non-coding regions in DNA sequences. PMID:23282225
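The recursive entropic segmentation idea can be sketched as follows. Note the substitutions: this sketch uses the simpler Jensen-Shannon divergence over the plain 4-letter alphabet rather than the Jensen-Rényi divergence and 22-symbol alphabet described in the abstract, and the `threshold` and `min_len` stopping parameters are hypothetical stand-ins for a proper significance test.

```python
import math
from collections import Counter

def entropy(counts, total):
    """Shannon entropy (bits) of a symbol count table."""
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

def best_cut(seq):
    """Return (position, divergence) of the cut maximizing Jensen-Shannon divergence."""
    n = len(seq)
    total = Counter(seq)
    h_all = entropy(total, n)
    left = Counter()
    best = (0, 0.0)
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # counts of symbols to the right of the cut
        jsd = h_all - (i / n) * entropy(left, i) - ((n - i) / n) * entropy(right, n - i)
        if jsd > best[1]:
            best = (i, jsd)
    return best

def segment(seq, threshold=0.1, min_len=10, offset=0):
    """Recursively split while the best cut exceeds the (hypothetical) threshold."""
    if len(seq) < 2 * min_len:
        return []
    pos, jsd = best_cut(seq)
    if jsd < threshold or pos < min_len or len(seq) - pos < min_len:
        return []
    return (segment(seq[:pos], threshold, min_len, offset)
            + [offset + pos]
            + segment(seq[pos:], threshold, min_len, offset + pos))
```

For example, `segment("A" * 50 + "C" * 50)` places a single cut at the composition change. A real border finder would replace the fixed threshold with a statistical significance criterion.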
Scaling features of noncoding DNA

NASA Technical Reports Server (NTRS)

Stanley, H. E.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.

1999-01-01

We review evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene, and utilize this fact to build a Coding Sequence Finder Algorithm, which uses statistical ideas to locate the coding regions of an unknown DNA sequence. Finally, we describe briefly some recent work adapting to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function, and reporting that noncoding regions in eukaryotes display a larger redundancy than coding regions. Specifically, we consider the possibility that this result is solely a consequence of nucleotide concentration differences as first noted by Bonhoeffer and his collaborators. We find that cytosine-guanine (CG) concentration does have a strong "background" effect on redundancy. However, we find that for the purine-pyrimidine binary mapping rule, which is not affected by the difference in CG concentration, the Shannon redundancy for the set of analyzed sequences is larger for noncoding regions compared to coding regions.
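The purine-pyrimidine mapping and the Shannon redundancy discussed in this abstract can be illustrated with a short sketch. The normalization used here, R(n) = 1 - H(n) / (n log2 k) for an alphabet of k symbols, and the use of overlapping n-grams are assumptions made for illustration; the function names are hypothetical.

```python
import math
from collections import Counter

def purine_pyrimidine(seq):
    """Binary mapping: A/G -> 'R' (purine); every other symbol -> 'Y'."""
    return "".join("R" if b in "AG" else "Y" for b in seq.upper())

def ngram_entropy(symbols, n):
    """Shannon entropy (bits) of overlapping n-grams."""
    grams = Counter(symbols[i:i + n] for i in range(len(symbols) - n + 1))
    total = sum(grams.values())
    return -sum(c / total * math.log2(c / total) for c in grams.values())

def redundancy(symbols, n, alphabet_size):
    """Assumed definition: 1 - H(n) / (n * log2(alphabet_size))."""
    return 1.0 - ngram_entropy(symbols, n) / (n * math.log2(alphabet_size))
```

Because the binary mapping merges C with T and G with A, a comparison of `redundancy(purine_pyrimidine(seq), n, 2)` between coding and noncoding sets is insensitive to CG content in the way the abstract describes.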
Novel variants of the 5S rRNA genes in Eruca sativa.

PubMed

Singh, K; Bhatia, S; Lakshmikumaran, M

1994-02-01

The 5S ribosomal RNA (rRNA) genes of Eruca sativa were cloned and characterized. They are organized into clusters of tandemly repeated units. Each repeat unit consists of a 119-bp coding region followed by a noncoding spacer region that separates it from the coding region of the next repeat unit. Our study reports novel gene variants of the 5S rRNA genes in plants. Two families of the 5S rDNA, the 0.5-kb size family and the 1-kb size family, coexist in the E. sativa genome. The 0.5-kb size family consists of the 5S rRNA genes (S4) that have coding regions similar to those of other reported plant 5S rDNA sequences, whereas the 1-kb size family consists of the 5S rRNA gene variants (S1) that exist as 1-kb BamHI tandem repeats. S1 is made up of two variant units (V1 and V2) of 5S rDNA where the BamHI site between the two units is mutated. Sequence heterogeneity among S4, V1, and V2 units exists throughout the sequence and is not limited to the noncoding spacer region only. The coding regions of V1 and V2 show approximately 20% dissimilarity to the coding regions of S4 and other reported plant 5S rDNA sequences. Such a large variation in the coding regions of the 5S rDNA units within the same plant species has been observed for the first time. Restriction site variation is observed between the two size classes of 5S rDNA in E. sativa. (ABSTRACT TRUNCATED AT 250 WORDS)
Intricate and Cell Type-Specific Populations of Endogenous Circular DNA (eccDNA) in Caenorhabditis elegans and Homo sapiens.

PubMed

Shoura, Massa J; Gabdank, Idan; Hansen, Loren; Merker, Jason; Gotlib, Jason; Levene, Stephen D; Fire, Andrew Z

2017-10-05

Investigations aimed at defining the 3D configuration of eukaryotic chromosomes have consistently encountered an endogenous population of chromosome-derived circular genomic DNA, referred to as extrachromosomal circular DNA (eccDNA). While the production, distribution, and activities of eccDNAs remain understudied, eccDNA formation from specific regions of the linear genome has profound consequences on the regulatory and coding capabilities for these regions. Here, we define eccDNA distributions in Caenorhabditis elegans and in three human cell types, utilizing a set of DNA topology-dependent approaches for enrichment and characterization. The use of parallel biophysical, enzymatic, and informatic approaches provides a comprehensive profiling of eccDNA robust to isolation and analysis methodology. Results in human and nematode systems provide quantitative analysis of the eccDNA loci at both unique and repetitive regions. Our studies converge on and support a consistent picture, in which endogenous genomic DNA circles are present in normal physiological states, and in which the circles come from both coding and noncoding genomic regions. Prominent among the coding regions generating DNA circles are several genes known to produce a diversity of protein isoforms, with mucin proteins and titin as specific examples. Copyright © 2017 Shoura et al.
On fuzzy semantic similarity measure for DNA coding.

PubMed

Ahmad, Muneer; Jung, Low Tang; Bhuiyan, Md Al-Amin

2016-02-01

A coding measure scheme numerically translates the DNA sequence to a time domain signal for protein coding regions identification. A number of coding measure schemes based on numerology, geometry, fixed mapping, statistical characteristics and chemical attributes of nucleotides have been proposed in recent decades. Such coding measure schemes lack the biologically meaningful aspects of nucleotide data and hence do not significantly discriminate coding regions from non-coding regions. This paper presents a novel fuzzy semantic similarity measure (FSSM) coding scheme centering on FSSM codons' clustering and genetic code context of nucleotides. Certain natural characteristics of nucleotides i.e. appearance as a unique combination of triplets, preserving special structure and occurrence, and ability to own and share density distributions in codons have been exploited in FSSM. The nucleotides' fuzzy behaviors, semantic similarities and defuzzification based on the center of gravity of nucleotides revealed a strong correlation between nucleotides in codons. The proposed FSSM coding scheme attains a significant enhancement in coding regions identification i.e. 36-133% as compared to other existing coding measure schemes tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms. Copyright © 2015 Elsevier Ltd. All rights reserved.
Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

NASA Technical Reports Server (NTRS)

Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

1995-01-01

We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior, whereas the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
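An n-tuple Zipf analysis of the kind described here amounts to ranking n-tuples by frequency and inspecting the rank-frequency curve on log-log axes. The sketch below assumes overlapping n-tuples and a plain least-squares fit; both choices, and the function names, are illustrative rather than taken from the record.

```python
import math
from collections import Counter

def zipf_ranks(seq, n):
    """Counts of overlapping n-tuples, sorted from most to least frequent."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    return sorted(counts.values(), reverse=True)

def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A roughly straight log-log curve would correspond to the power-law regime the abstract reports for noncoding regions, whereas a curve that bends on log-log axes but straightens on semi-log axes would correspond to the logarithmic behavior reported for coding regions.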
A biological inspired fuzzy adaptive window median filter (FAWMF) for enhancing DNA signal processing.

PubMed

Ahmad, Muneer; Jung, Low Tan; Bhuiyan, Al-Amin

2017-10-01

Digital signal processing techniques commonly employ fixed-length window filters to process signal contents. DNA signals differ in characteristics from common digital signals since they carry nucleotides as contents. The nucleotides have genetic code context and fuzzy behaviors due to their special structure and order in the DNA strand. Employing conventional fixed-length window filters for DNA signal processing produces spectral leakage and hence results in signal noise. A biological context-aware adaptive window filter is required to process DNA signals. This paper introduces a biologically inspired fuzzy adaptive window median filter (FAWMF), which computes the fuzzy membership strength of nucleotides in each slide of the window and filters nucleotides based on median filtering with a combination of s-shaped and z-shaped filters. Since coding regions cause 3-base periodicity through an unbalanced nucleotide distribution that produces a relatively high bias in nucleotide usage, this fundamental characteristic of nucleotides has been exploited in FAWMF to suppress the signal noise. Along with the adaptive response of FAWMF, a strong correlation between median nucleotides and the Π-shaped filter was observed, which produced enhanced discrimination between coding and non-coding regions compared with conventional fixed-length window filters. The proposed FAWMF attains a significant enhancement in coding region identification, i.e. 40% to 125%, compared with other conventional window filters tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms. This study proves that conventional fixed-length window filters applied to DNA signals do not achieve significant results since the nucleotides carry genetic code context. The proposed FAWMF algorithm is adaptive and significantly outperforms them in processing DNA signal contents. The algorithm applied to a variety of DNA datasets produced noteworthy discrimination between coding and non-coding regions, in contrast to conventional fixed-window-length filters. Copyright © 2017 Elsevier B.V. All rights reserved.
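The 3-base periodicity that this abstract relies on can be measured with a plain discrete Fourier transform. The sketch below is not the FAWMF filter; it is a baseline illustration that assumes binary indicator sequences for each base and a normalization by sequence length, and the function name is hypothetical.

```python
import numpy as np

def period3_power(seq):
    """Spectral power at frequency N/3, summed over the four base indicator sequences."""
    seq = seq.upper()
    n = len(seq) - len(seq) % 3  # trim so that N/3 is an exact DFT bin
    if n < 3:
        raise ValueError("sequence must contain at least three bases")
    k = n // 3
    total = 0.0
    for base in "ACGT":
        indicator = np.array([1.0 if b == base else 0.0 for b in seq[:n]])
        total += abs(np.fft.fft(indicator)[k]) ** 2
    return total / n
```

Evaluating this on successive windows of a sequence gives the kind of raw period-3 signal that window filters are then used to clean up; the window length and step would be further assumptions.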
Functional interrogation of non-coding DNA through CRISPR genome editing

PubMed Central

Canver, Matthew C.; Bauer, Daniel E.; Orkin, Stuart H.

2017-01-01

Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. PMID:28288828
CRITICA: coding region identification tool invoking comparative analysis

NASA Technical Reports Server (NTRS)

Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

1999-01-01

Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
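The comparative signal described in this abstract, amino acid identity that is higher than the nucleotide identity would suggest, can be illustrated with a small sketch. This is not CRITICA's code or scoring scheme: it assumes two already aligned, ungapped sequences and the standard genetic code, and it only computes the two identities that such a method would compare.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(seq, frame=0):
    """Translate complete codons starting at `frame`; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))

def identity(a, b):
    """Fraction of matching positions in two equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

For instance, `GCTGCCGCAGCG` and `GCCGCAGCGGCT` differ at every third position, yet both translate to four alanines in frame 0; in frame 1 the translations share no residues. That contrast between frames is the kind of evidence a comparative method can exploit.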
East Asian mtDNA haplogroup determination in Koreans: haplogroup-level coding region SNP analysis and subhaplogroup-level control region sequence analysis.

PubMed

Lee, Hwan Young; Yoo, Ji-Eun; Park, Myung Jin; Chung, Ukhee; Kim, Chong-Youl; Shin, Kyoung-Jin

2006-11-01

The present study analyzed 21 coding region SNP markers and one deletion motif for the determination of East Asian mitochondrial DNA (mtDNA) haplogroups by designing three multiplex systems which apply single base extension methods. Using two multiplex systems, all 593 Korean mtDNAs were allocated into 15 haplogroups: M, D, D4, D5, G, M7, M8, M9, M10, M11, R, R9, B, A, and N9. As the D4 haplotypes occurred most frequently in Koreans, the third multiplex system was used to further define D4 subhaplogroups: D4a, D4b, D4e, D4g, D4h, and D4j. This method allowed the complementation of coding region information with control region mutation motifs and the resultant findings also suggest reliable control region mutation motifs for the assignment of East Asian mtDNA haplogroups. These three multiplex systems produce good results in degraded samples as they contain small PCR products (101-154 bp) for single base extension reactions. SNP scoring was performed in 101 old skeletal remains using these three systems to prove their utility in degraded samples. The sequence analysis of mtDNA control region with high incidence of haplogroup-specific mutations and the selective scoring of highly informative coding region SNPs using the three multiplex systems are useful tools for most applications involving East Asian mtDNA haplogroup determination and haplogroup-directed stringent quality control.
Statistical and linguistic features of DNA sequences

NASA Technical Reports Server (NTRS)

Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

1995-01-01

We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Levy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
Functional interrogation of non-coding DNA through CRISPR genome editing.

PubMed

Canver, Matthew C; Bauer, Daniel E; Orkin, Stuart H

2017-05-15

Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. Copyright © 2017 Elsevier Inc. All rights reserved.
Run-length encoding graphic rules, biochemically editable designs and steganographical numeric data embedment for DNA-based cryptographical coding system.

PubMed

Kawano, Tomonori

2013-03-01

There have been a wide variety of approaches for handling the pieces of DNA as the "unplugged" tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, water marks and cryptography. In the present article, novel designs of artificial genes as the media for storing the digitally compressed data for images are proposed for bio-computing purpose while natural genes principally encode for proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given "passwords" and/or secret numbers using DNA sequences. The "passwords" of interest were decomposed into single letters and translated into the font image coded on the separate DNA chains with both the coding regions in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original "passwords." The latter processes require the molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting at the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interests over the image-coding DNA are also discussed.
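Run-length encoding itself is simple to illustrate. The sketch below shows generic run-length encoding of one row of a binary font image; the article's own rule for mapping runs onto nucleotides is not given in the abstract and is not reproduced here, and the function names are hypothetical.

```python
from itertools import groupby

def rle_encode(bits):
    """Collapse a string such as '0001100' into (symbol, run length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]

def rle_decode(runs):
    """Expand (symbol, run length) pairs back into the original string."""
    return "".join(symbol * length for symbol, length in runs)
```

A row `"0001100"` encodes to three runs, and decoding those runs restores the row exactly, which is the lossless property an image-coding scheme of this kind depends on.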
A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region.

PubMed

Kress, W John; Erickson, David L

2007-06-06

A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.
DNA barcode goes two-dimensions: DNA QR code web server.

PubMed

Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

2012-01-01

The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.
DNA methylation of miRNA coding sequences putatively associated with childhood obesity.

PubMed

Mansego, M L; Garcia-Lacarte, M; Milagro, F I; Marti, A; Martinez, J A

2017-02-01

Epigenetic mechanisms may be involved in obesity onset and its consequences. The aim of the present study was to evaluate whether DNA methylation status in microRNA (miRNA) coding regions is associated with childhood obesity. DNA isolated from white blood cells of 24 children (identification sample: 12 obese and 12 non-obese) from the Grupo Navarro de Obesidad Infantil study was hybridized in a 450 K methylation microarray. Several CpGs whose DNA methylation levels were statistically different between obese and non-obese were validated by MassArray® in 95 children (validation sample) from the same study. Microarray analysis identified 16 differentially methylated CpGs between both groups (6 hypermethylated and 10 hypomethylated). DNA methylation levels in miR-1203, miR-412 and miR-216A coding regions significantly correlated with body mass index standard deviation score (BMI-SDS) and explained up to 40% of the variation of BMI-SDS. The network analysis identified 19 well-defined obesity-relevant biological pathways from the KEGG database. MassArray® validation identified three regions located in or near miR-1203, miR-412 and miR-216A coding regions differentially methylated between obese and non-obese children. The current work identified three CpG sites located in coding regions of three miRNAs (miR-1203, miR-412 and miR-216A) that were differentially methylated between obese and non-obese children, suggesting a role of miRNA epigenetic regulation in childhood obesity. © 2016 World Obesity Federation.

The distribution of DNA damage is defined by region-specific susceptibility to DNA damage formation rather than repair differences.

PubMed

Strand, Janne M; Scheffler, Katja; Bjørås, Magnar; Eide, Lars

2014-06-01

The cellular genomes are continuously damaged by reactive oxygen species (ROS) from aerobic processes. The impact of DNA damage depends on the specific site as well as the cellular state. The steady-state level of DNA damage is the net result of continuous formation and subsequent repair, but it is unknown to what extent heterogeneous damage distribution is caused by variations in formation or repair of DNA damage. Here, we used a restriction enzyme/qPCR based method to analyze DNA damage in promoter and coding regions of four nuclear genes: the two house-keeping genes Gapdh and Tbp, and the Ndufa9 and Ndufs2 genes encoding mitochondrial complex I subunits, as well as mt-Rnr1 encoded by mitochondrial DNA (mtDNA). The distribution of steady-state levels of damage varied in a site-specific manner. Oxidative stress induced damage in nDNA to a similar extent in promoter and coding regions, and more so in mtDNA. The subsequent removal of damage from nDNA was efficient and comparable with recovery times depending on the initial damage load, while repair of mtDNA was delayed with subsequently slower repair rate. The repair was furthermore found to be independent of transcription or the transcription-coupled repair factor CSB, but dependent on cellular ATP. Our results demonstrate that the capacity to repair DNA is sufficient to remove exogenously induced damage. Thus, we conclude that the heterogeneous steady-state level of DNA damage in promoters and coding regions is caused by site-specific DNA damage/modifications that take place under normal metabolism. Copyright © 2014 Elsevier B.V. All rights reserved.
Run-length encoding graphic rules, biochemically editable designs and steganographical numeric data embedment for DNA-based cryptographical coding system

PubMed Central

Kawano, Tomonori

2013-01-01

There have been a wide variety of approaches for handling the pieces of DNA as the “unplugged” tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, watermarks and cryptography. In the present article, novel designs of artificial genes as the media for storing the digitally compressed data for images are proposed for bio-computing purposes, whereas natural genes principally encode proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given “passwords” and/or secret numbers using DNA sequences. The “passwords” of interest were decomposed into single letters and translated into the font image coded on the separate DNA chains with both the coding regions in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original “passwords.” The latter processes require the molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interest over the image-coding DNA are also discussed. PMID:23750303
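To make the idea of run-length encoding pixels into DNA concrete, the following is a minimal sketch in Python. It is not the encoding rule proposed in the article, which the abstract does not specify; the digit-to-base mapping (`BASES`), the three-base run format, and the function names are all hypothetical choices made only for illustration.

```python
from itertools import groupby

# Hypothetical mapping of base-4 digits to nucleotides: 0->A, 1->C, 2->G, 3->T.
BASES = "ACGT"

def encode_row(bits):
    """Encode one row of '0'/'1' pixels as DNA, three bases per run.

    Base 1 holds the pixel value (A = 0, C = 1); bases 2 and 3 hold
    (run length - 1) as two base-4 digits, so one triplet covers runs of 1..16.
    Longer runs are split into several triplets.
    """
    dna = []
    for bit, group in groupby(bits):
        n = len(list(group))
        while n > 0:
            chunk = min(n, 16)
            dna.append(BASES[int(bit)] + BASES[(chunk - 1) // 4] + BASES[(chunk - 1) % 4])
            n -= chunk
    return "".join(dna)

def decode_row(dna):
    """Invert encode_row: read triplets and expand each run."""
    bits = []
    for i in range(0, len(dna), 3):
        bit = BASES.index(dna[i])
        n = BASES.index(dna[i + 1]) * 4 + BASES.index(dna[i + 2]) + 1
        bits.append(str(bit) * n)
    return "".join(bits)

row = "0001111100"
print(encode_row(row))                      # AAGCCAAAC
print(decode_row(encode_row(row)) == row)   # True
```

In this toy scheme a row of ten pixels with three runs becomes nine bases; real font images with long uniform runs would compress better, which is the property a run-length rule exploits.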
DNA Barcode Goes Two-Dimensions: DNA QR Code Web Server

PubMed Central

Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

2012-01-01

The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, “DNA barcode” actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications. PMID:22574113
A Two-Locus Global DNA Barcode for Land Plants: The Coding rbcL Gene Complements the Non-Coding trnH-psbA Spacer Region

PubMed Central

Kress, W. John; Erickson, David L.

2007-01-01

Background A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Methodology/Principal Findings Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. Conclusions/Significance A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination. PMID:17551588
Phylogeographic Differentiation of Mitochondrial DNA in Han Chinese

PubMed Central

Yao, Yong-Gang; Kong, Qing-Peng; Bandelt, Hans-Jürgen; Kivisild, Toomas; Zhang, Ya-Ping

2002-01-01

To characterize the mitochondrial DNA (mtDNA) variation in Han Chinese from several provinces of China, we have sequenced the two hypervariable segments of the control region and the segment spanning nucleotide positions 10171–10659 of the coding region, and we have identified a number of specific coding-region mutations by direct sequencing or restriction-fragment–length–polymorphism tests. This allows us to define new haplogroups (clades of the mtDNA phylogeny) and to dissect the Han mtDNA pool on a phylogenetic basis, which is a prerequisite for any fine-grained phylogeographic analysis, the interpretation of ancient mtDNA, or future complete mtDNA sequencing efforts. Some of the haplogroups under study differ considerably in frequencies across different provinces. The southernmost provinces show more pronounced contrasts in their regional Han mtDNA pools than the central and northern provinces. These and other features of the geographical distribution of the mtDNA haplogroups observed in the Han Chinese make an initial Paleolithic colonization from south to north plausible but would suggest subsequent migration events in China that mainly proceeded from north to south and east to west. Lumping together all regional Han mtDNA pools into one fictive general mtDNA pool or choosing one or two regional Han populations to represent all Han Chinese is inappropriate for prehistoric considerations as well as for forensic purposes or medical disease studies. PMID:11836649
RPS8—a New Informative DNA Marker for Phylogeny of Babesia and Theileria Parasites in China

PubMed Central

Tian, Zhan-Cheng; Liu, Guang-Yuan; Yin, Hong; Luo, Jian-Xun; Guan, Gui-Quan; Luo, Jin; Xie, Jun-Ren; Shen, Hui; Tian, Mei-Yuan; Zheng, Jin-feng; Yuan, Xiao-song; Wang, Fang-fang

2013-01-01

Piroplasmosis is a serious debilitating and sometimes fatal disease. Phylogenetic relationships within piroplasmida are complex and remain unclear. We compared the intron–exon structure and DNA sequences of the RPS8 gene from Babesia and Theileria spp. isolates in China. Similar to 18S rDNA, the 40S ribosomal protein S8 gene, RPS8, including both coding and non-coding regions is a useful and novel genetic marker for defining species boundaries and for inferring phylogenies because it tends to have little intra-specific variation but considerable inter-specific difference. However, more samples are needed to verify the usefulness of the RPS8 (coding and non-coding regions) gene as a marker for the phylogenetic position and detection of most Babesia and Theileria species, particularly for some closely related species. PMID:24244571
Discrete Ramanujan transform for distinguishing the protein coding regions from other regions.

PubMed

Hua, Wei; Wang, Jiasong; Zhao, Jian

2014-01-01

Based on the study of the Ramanujan sum and Ramanujan coefficient, this paper suggests the concepts of discrete Ramanujan transform and spectrum. Using the Voss numerical representation, one maps a symbolic DNA strand to a numerical DNA sequence and deduces the discrete Ramanujan spectrum of the numerical DNA sequence. It is well known that the discrete Fourier power spectrum of a protein coding sequence has an important feature of 3-base periodicity, which is widely used for DNA sequence analysis by the technique of discrete Fourier transform. The analysis is performed by testing the signal-to-noise ratio at frequency N/3 as a criterion, where N is the length of the sequence. The results presented in this paper show that the property of 3-base periodicity appears only as a prominent spike of the discrete Ramanujan spectrum at period 3 for the protein coding regions. The signal-to-noise ratio for the discrete Ramanujan spectrum is defined for numerical measurement. Therefore, the discrete Ramanujan spectrum and the signal-to-noise ratio of a DNA sequence can be used for distinguishing the protein coding regions from the noncoding regions. All the exon and intron sequences in whole chromosomes 1, 2, 3 and 4 of Caenorhabditis elegans have been tested, and the histograms and tables from the computational results illustrate the reliability of our method. In addition, a theoretical analysis leads to the conclusion that the algorithm for calculating the discrete Ramanujan spectrum has lower computational complexity and higher computational accuracy. The computational experiments show that the technique of using the discrete Ramanujan spectrum for classifying different DNA sequences is a fast and effective method. Copyright © 2014 Elsevier Ltd. All rights reserved.
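The Fourier-based criterion that this abstract uses as its point of comparison can be sketched in a few lines of Python. The sketch below covers only the conventional discrete Fourier approach, not the discrete Ramanujan transform itself. It assumes the common convention that the signal-to-noise ratio is the total power at frequency index N/3 divided by the mean power over all non-zero frequencies; the function names are illustrative.

```python
import random
import numpy as np

def power_spectrum(seq):
    """Sum of DFT power spectra of the four Voss indicator sequences.

    Voss representation: for each base b in A, C, G, T, build a binary
    sequence that is 1 where seq has b and 0 elsewhere.
    """
    seq = seq.upper()
    total = np.zeros(len(seq))
    for base in "ACGT":
        indicator = np.array([1.0 if c == base else 0.0 for c in seq])
        total += np.abs(np.fft.fft(indicator)) ** 2
    return total

def period3_snr(seq):
    """Power at frequency N/3 relative to the mean power at k = 1..N-1.

    The sequence is trimmed so that its length is a multiple of 3
    (assumed convention, so that N/3 is an exact frequency index).
    """
    n = len(seq) - len(seq) % 3
    spectrum = power_spectrum(seq[:n])
    return spectrum[n // 3] / spectrum[1:].mean()

# Toy inputs: a perfectly 3-periodic sequence and a pseudo-random one.
periodic = "ATG" * 50
rng = random.Random(0)
shuffled = "".join(rng.choice("ACGT") for _ in range(150))

print(period3_snr(periodic))   # large: all non-DC power sits at N/3 and 2N/3
print(period3_snr(shuffled))   # much smaller: no dominant period-3 component
```

Real coding sequences fall between these two extremes, which is why a threshold on the ratio is used as a classification criterion.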
Reduced-median-network analysis of complete mitochondrial DNA coding-region sequences for the major African, Asian, and European haplogroups.

PubMed

Herrnstadt, Corinna; Elson, Joanna L; Fahy, Eoin; Preston, Gwen; Turnbull, Douglass M; Anderson, Christen; Ghosh, Soumitra S; Olefsky, Jerrold M; Beal, M Flint; Davis, Robert E; Howell, Neil

2002-05-01

The evolution of the human mitochondrial genome is characterized by the emergence of ethnically distinct lineages or haplogroups. Nine European, seven Asian (including Native American), and three African mitochondrial DNA (mtDNA) haplogroups have been identified previously on the basis of the presence or absence of a relatively small number of restriction-enzyme recognition sites or on the basis of nucleotide sequences of the D-loop region. We have used reduced-median-network approaches to analyze 560 complete European, Asian, and African mtDNA coding-region sequences from unrelated individuals to develop a more complete understanding of sequence diversity both within and between haplogroups. A total of 497 haplogroup-associated polymorphisms were identified, 323 (65%) of which were associated with one haplogroup and 174 (35%) of which were associated with two or more haplogroups. Approximately one-half of these polymorphisms are reported for the first time here. Our results confirm and substantially extend the phylogenetic relationships among mitochondrial genomes described elsewhere from the major human ethnic groups. Another important result is that there were numerous instances both of parallel mutations at the same site and of reversion (i.e., homoplasy). It is likely that homoplasy in the coding region will confound evolutionary analysis of small sequence sets. By a linkage-disequilibrium approach, additional evidence for the absence of human mtDNA recombination is presented here.
Delimitation of essential genes of cassava latent virus DNA 2.

PubMed Central

Etessami, P; Callis, R; Ellwood, S; Stanley, J

1988-01-01

Insertion and deletion mutagenesis of both extended open reading frames (ORFs) of cassava latent virus DNA 2 destroys infectivity. Infectivity is restored by coinoculating constructs that contain single mutations within different ORFs. Although frequent intermolecular recombination produces dominant parental-type virus, mutants can be retained within the virus population indicating that they are competent for replication and suggesting that rescue can occur by complementation of trans acting gene products. By cloning specific fragments into DNA 1 coat protein deletion vectors we have delimited the DNA 2 coding regions and provide substantive evidence that both are essential for virus infection. Although a DNA 2 component is unique to whitefly-transmitted geminiviruses, the results demonstrate that neither coding region is involved solely in insect transmission. The requirement for a bipartite genome for whitefly-transmitted geminiviruses is discussed. Images PMID:3387209
Transcriptional mapping of the ribosomal RNA region of mouse L-cell mitochondrial DNA.

PubMed Central

Nagley, P; Clayton, D A

1980-01-01

The map positions in mouse mitochondrial DNA of the two ribosomal RNA genes and adjacent genes coding several small transcripts have been determined precisely by application of a procedure in which DNA-RNA hybrids have been subjected to digestion by S1 nuclease under conditions of varying severity. Digestion of the DNA-RNA hybrids with S1 nuclease yielded a series of species which were shown to contain ribosomal RNA molecules together with adjacent transcripts hybridized conjointly to a continuous segment of mitochondrial DNA. There is one small transcript about 60 bases long whose gene adjoins the sequences coding the 5'-end of the small ribosomal RNA (950 bases) and which lies approximately 200 nucleotides from the D-loop origin of heavy strand mitochondrial DNA synthesis. An 80-base transcript lies between the small and large ribosomal RNA genes, and genes for two further short transcripts (each about 80 bases in length) abut the sequences coding the 3'-end of the large ribosomal RNA (approximately 1500 bases). The ability to isolate a discrete DNA-RNA hybrid species approximately 2700 base pairs in length containing all these transcripts suggests that there can be few nucleotides in this region of mouse mitochondrial DNA which are not represented as stable RNA species. Images PMID:6253898
Forensic strategy to ensure the quality of sequencing data of mitochondrial DNA in highly degraded samples.

PubMed

Adachi, Noboru; Umetsu, Kazuo; Shojo, Hideki

2014-01-01

Mitochondrial DNA (mtDNA) is widely used for DNA analysis of highly degraded samples because of its polymorphic nature and high number of copies in a cell. However, as endogenous mtDNA in deteriorated samples is scarce and highly fragmented, it is not easy to obtain reliable data. In the current study, we report the risks of direct sequencing mtDNA in highly degraded material, and suggest a strategy to ensure the quality of sequencing data. It was observed that direct sequencing data of the hypervariable segment (HVS) 1 by using primer sets that generate an amplicon of 407 bp (long-primer sets) was different from results obtained by using newly designed primer sets that produce an amplicon of 120-139 bp (mini-primer sets). The data aligned with the results of mini-primer sets analysis in an amplicon length-dependent manner; the shorter the amplicon, the more evident the endogenous sequence became. Coding region analysis using multiplex amplified product-length polymorphisms revealed the incongruence of single nucleotide polymorphisms between the coding region and HVS 1 caused by contamination with exogenous mtDNA. Although the sequencing data obtained using long-primer sets turned out to be erroneous, it was unambiguous and reproducible. These findings suggest that PCR primers that produce amplicons shorter than those currently recognized should be used for mtDNA analysis in highly degraded samples. Haplogroup motif analysis of the coding region and HVS should also be performed to improve the reliability of forensic mtDNA data. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Using the NCBI Genome Databases to Compare the Genes for Human & Chimpanzee Beta Hemoglobin

ERIC Educational Resources Information Center

Offner, Susan

2010-01-01

The beta hemoglobin protein is identical in humans and chimpanzees. In this tutorial, students see that even though the proteins are identical, the genes that code for them are not. There are many more differences in the introns than in the exons, which indicates that coding regions of DNA are more highly conserved than non-coding regions.
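The comparison described in this tutorial comes down to counting matching positions in aligned sequences. The Python sketch below shows that calculation, assuming the two sequences are already aligned to equal length. The short sequences are invented placeholders, not human or chimpanzee beta hemoglobin data.

```python
def percent_identity(a, b):
    """Percentage of positions at which two pre-aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

# Invented toy fragments, for illustration only.
exon_a, exon_b = "ATGGTGCACCTG", "ATGGTGCACCTG"
intron_a, intron_b = "GTAAGTTTCTAG", "GTAAGCTTATAG"

print(percent_identity(exon_a, exon_b))      # 100.0
print(percent_identity(intron_a, intron_b))  # lower than the exon value
```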
Single nucleotide polymorphisms in common bean: their discovery and genotyping using a multiplex detection system

USDA-ARS's Scientific Manuscript database

Single-nucleotide polymorphism (SNP) markers are by far the most common form of DNA polymorphism in a genome. The objectives of this study were to discover SNPs in common bean comparing sequences from coding and non-coding regions obtained from GenBank and genomic DNA and to compare sequencing resu...
Isolation and characterization of a cDNA clone for the complete protein coding region of the delta subunit of the mouse acetylcholine receptor.

PubMed Central

LaPolla, R J; Mayne, K M; Davidson, N

1984-01-01

A mouse cDNA clone has been isolated that contains the complete coding region of a protein highly homologous to the delta subunit of the Torpedo acetylcholine receptor (AcChoR). The cDNA library was constructed in the vector lambda 10 from membrane-associated poly(A)+ RNA from BC3H-1 mouse cells. Surprisingly, the delta clone was selected by hybridization with cDNA encoding the gamma subunit of the Torpedo AcChoR. The nucleotide sequence of the mouse cDNA clone contains an open reading frame of 520 amino acids. This amino acid sequence exhibits 59% and 50% sequence homology to the Torpedo AcChoR delta and gamma subunits, respectively. However, the mouse nucleotide sequence has several stretches of high homology with the Torpedo gamma subunit cDNA, but not with delta. The mouse protein has the same general structural features as do the Torpedo subunits. It is encoded by a 3.3-kilobase mRNA. There is probably only one, but at most two, chromosomal genes coding for this or closely related sequences. Images PMID:6096870
The kinetoplast DNA of the Australian trypanosome, Trypanosoma copemani, shares features with Trypanosoma cruzi and Trypanosoma lewisi.

PubMed

Botero, Adriana; Kapeller, Irit; Cooper, Crystal; Clode, Peta L; Shlomai, Joseph; Thompson, R C Andrew

2018-05-17

Kinetoplast DNA (kDNA) is the mitochondrial genome of trypanosomatids. It consists of a few dozen maxicircles and several thousand minicircles, all catenated topologically to form a two-dimensional DNA network. Minicircles are heterogeneous in size and sequence among species. They present one or several conserved regions that contain three highly conserved sequence blocks. CSB-1 (10 bp sequence) and CSB-2 (8 bp sequence) present lower interspecies homology, while CSB-3 (12 bp sequence) or the Universal Minicircle Sequence is conserved within most trypanosomatids. The Universal Minicircle Sequence is located at the replication origin of the minicircles, and is the binding site for the UMS binding protein, a protein involved in trypanosomatid survival and virulence. Here, we describe the structure and organisation of the kDNA of Trypanosoma copemani, a parasite that has been shown to infect mammalian cells and has been associated with the drastic decline of the endangered Australian marsupial, the woylie (Bettongia penicillata). Deep genomic sequencing showed that T. copemani presents two classes of minicircles that share sequence identity and organisation in the conserved sequence blocks with those of Trypanosoma cruzi and Trypanosoma lewisi. A 19,257 bp partial region of the maxicircle of T. copemani that contained the entire coding region was obtained. Comparative analysis of the T. copemani entire maxicircle coding region with the coding regions of T. cruzi and T. lewisi showed they share 71.05% and 71.28% identity, respectively. The shared features in the maxicircle/minicircle organisation and sequence between T. copemani and T. cruzi/T. lewisi suggest similarities in their process of kDNA replication, and are of significance in understanding the evolution of Australian trypanosomes. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.
A Tandemly Arranged Pattern of Two 5S rDNA Arrays in Amolops mantzorum (Anura, Ranidae).

PubMed

Liu, Ting; Song, Menghuan; Xia, Yun; Zeng, Xiaomao

2017-01-01

In an attempt to extend the knowledge of the 5S rDNA organization in anurans, the 5S rDNA sequences of Amolops mantzorum were isolated, characterized, and mapped by FISH. Two forms of 5S rDNA, type I (209 bp) and type II (about 870 bp), were found in specimens investigated from various populations. Both of them contained a 118-bp coding sequence, readily differentiated by their non-transcribed spacer (NTS) sizes and compositions. Four probes (the 5S rDNA coding sequences, the type I NTS, the type II NTS, and the entire type II 5S rDNA sequences) were respectively labeled with TAMRA or digoxigenin to hybridize with mitotic chromosomes for samples of all localities. It turned out that all probes showed the same signals that appeared in every centromeric region and in the telomeric regions of chromosome 5, without differences within or between populations. Obviously, both type I and type II of the 5S rDNA arrays arranged in tandem, which was contrasting with other frogs or fishes recorded to date. More interestingly, all the probes detected centromeric regions in all karyotypes, suggesting the presence of a satellite DNA family derived from 5S rDNA. © 2017 S. Karger AG, Basel.
Junk DNA and the long non-coding RNA twist in cancer genetics

PubMed Central

Ling, Hui; Vincent, Kimberly; Pichler, Martin; Fodde, Riccardo; Berindan-Neagoe, Ioana; Slack, Frank J.; Calin, George A

2015-01-01

The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions, and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNA in human diseases in general and cancer in particular remain largely unknown. Data from the literature suggest that lncRNA, often via interaction with proteins, functions in specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function, and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single nucleotide polymorphism (SNP) located within lncRNAs may be functionally associated with the individual’s susceptibility to cancer. PMID:25619839
Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor.

PubMed

Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P

1988-02-01

Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators.
Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor.

PubMed Central

Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P

1988-01-01

Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators. Images PMID:3257578
Synonymous deoptimization of the foot-and-mouth disease virus P1 coding region causes attenuation in vivo while inducing a strong neutralizing antibody response

USDA-ARS's Scientific Manuscript database

Codon bias deoptimization has been previously used to successfully attenuate human pathogens including polio, respiratory syncytial and influenza viruses. We have applied a similar technology to deoptimize the capsid coding region (P1 region) of the cDNA infectious clone of foot-and-mouth disease vi...

Mitochondrial DNA haplogroup phylogeny of the dog: Proposal for a cladistic nomenclature.

PubMed

Fregel, Rosa; Suárez, Nicolás M; Betancor, Eva; González, Ana M; Cabrera, Vicente M; Pestano, José

2015-05-01

Canis lupus familiaris mitochondrial DNA analysis has increased in recent years, not only for the purpose of deciphering dog domestication but also for forensic genetic studies or breed characterization. The resultant accumulation of data has increased the need for a normalized and phylogenetic-based nomenclature like those provided for human maternal lineages. Although a standardized classification has been proposed, haplotype names within clades have been assigned gradually without considering the evolutionary history of dog mtDNA. Moreover, this classification is based only on the D-loop region, proven to be insufficient for phylogenetic purposes due to its high number of recurrent mutations and the lack of relevant information present in the coding region. In this study, we design 1) a refined mtDNA cladistic nomenclature from a phylogenetic tree based on complete sequences, classifying dog maternal lineages into haplogroups defined by specific diagnostic mutations, and 2) a coding region SNP analysis that allows a more accurate classification into haplogroups when combined with D-loop sequencing, thus improving the phylogenetic information obtained in dog mitochondrial DNA studies. Copyright © 2015 Elsevier B.V. All rights reserved.
Association of Amine-Receptor DNA Sequence Variants with Associative Learning in the Honeybee.

PubMed

Lagisz, Malgorzata; Mercer, Alison R; de Mouzon, Charlotte; Santos, Luana L S; Nakagawa, Shinichi

2016-03-01

Octopamine- and dopamine-based neuromodulatory systems play a critical role in learning and learning-related behaviour in insects. To further our understanding of these systems and resulting phenotypes, we quantified DNA sequence variations at six loci coding octopamine-and dopamine-receptors and their association with aversive and appetitive learning traits in a population of honeybees. We identified 79 polymorphic sequence markers (mostly SNPs and a few insertions/deletions) located within or close to six candidate genes. Intriguingly, we found that levels of sequence variation in the protein-coding regions studied were low, indicating that sequence variation in the coding regions of receptor genes critical to learning and memory is strongly selected against. Non-coding and upstream regions of the same genes, however, were less conserved and sequence variations in these regions were weakly associated with between-individual differences in learning-related traits. While these associations do not directly imply a specific molecular mechanism, they suggest that the cross-talk between dopamine and octopamine signalling pathways may influence olfactory learning and memory in the honeybee.
Mechanisms of radiation-induced gene responses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Woloschak, G.E.; Paunesku, T.

1996-10-01

In the process of identifying genes differentially expressed in cells exposed to ultraviolet radiation, we have identified a transcript having a 26-bp region that is highly conserved in a variety of species including Bacillus circulans, yeast, pumpkin, Drosophila, mouse, and man. When in the 5' region (flanking region or UTR) of a gene, the sequence is predominantly in +/+ orientation with respect to the coding DNA strand; while in the coding region and the 3' region (UTR), the sequence is most frequently in the +/- orientation with respect to the coding DNA strand. In two genes, the element is split into two parts; however, in most cases, it is found only once but with a minimum of 11 consecutive nucleotides precisely depicting the original sequence. The element is found in a large number of different genes with diverse functions (from human ras p21 to B. circulans chitonase). Gel shift assays demonstrated the presence of a protein in HeLa cell extracts that binds to the sense and antisense single-stranded consensus oligomers, as well as to the double-stranded oligonucleotide. When double-stranded oligomer was used, the size shift demonstrated an additional protein-oligomer complex larger than the one bound to either sense or antisense single-stranded consensus oligomers alone. It is speculated either that this element binds to protein(s) important in maintaining DNA in a single-stranded orientation for transcription or, alternatively, that this element is important in the transcription-coupled DNA repair process.
Effective gene prediction by high resolution frequency estimator based on least-norm solution technique

PubMed Central

2014-01-01

The linear algebraic concept of subspace plays a significant role in recent techniques of spectrum estimation. In this article, the authors have utilized the noise subspace concept for finding hidden periodicities in DNA sequence. With the vast growth of genomic sequences, the demand to identify accurately the protein-coding regions in DNA is increasingly rising. Several techniques of DNA feature extraction that draw on various fields have come up in the recent past, among which application of digital signal processing tools is of prime importance. It is known that coding segments have a 3-base periodicity, while non-coding regions do not have this unique feature. One of the most important spectrum analysis techniques based on the concept of subspace is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions, completely eliminating background noise. The proposed method is compared with the existing sliding discrete Fourier transform (SDFT) method, popularly known as the modified periodogram method, on several genes from various organisms, and the results show that the proposed method is a better and more effective approach to gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate, and wrong rate are used to establish the superiority of the least-norm gene prediction method over the existing method. PMID:24386895
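The 3-base periodicity of coding segments described in this abstract can be illustrated with a plain discrete Fourier transform, which is far simpler than the least-norm estimator of the paper and is not a substitute for it. The sketch below is an illustration only: the function names `period3_power` and `sliding_period3`, the 0/1 indicator mapping of the four bases, the normalisation by length, and the window and step sizes are all assumptions made here, not details taken from the article.

```python
import cmath

def period3_power(seq):
    """Spectral power at frequency 1/3, summed over four base-indicator sequences.

    Assumption: each base is mapped to a 0/1 indicator sequence and the
    summed power is normalised by the sequence length.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # DFT coefficient at frequency 1/3 of the indicator sequence for this base
        coeff = sum(cmath.exp(-2j * cmath.pi * i / 3)
                    for i, b in enumerate(seq) if b == base)
        total += abs(coeff) ** 2
    return total / n

def sliding_period3(seq, window=351, step=3):
    """Return (start, power) pairs for windows slid along the sequence."""
    return [(start, period3_power(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]
```

Under these assumptions, windows with a pronounced period-3 component give a high value and windows without one give a value near the background level, so peaks in the profile returned by `sliding_period3` mark candidate coding segments.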
DOE Office of Scientific and Technical Information (OSTI.GOV)

Leong, JoAnn Ching

The nucleotide sequence of the IHNV glycoprotein gene has been determined from a cDNA clone containing the entire coding region. The glycoprotein cDNA clone contained a leader sequence of 48 bases, a coding region of 1524 nucleotides, and 39 bases at the 3' end. The entire cDNA clone contains 1609 nucleotides and encodes a protein of 508 amino acids. The deduced amino acid sequence gave a translated molecular weight of 56,795 daltons. A hydropathicity profile of the deduced amino acid sequence indicated that there were two major hydrophobic domains: one, at the N-terminus, delineating a signal peptide of 18 amino acids, and the other, at the C-terminus, delineating the transmembrane region. Five possible sites of N-linked glycosylation were identified. Although no nucleic acid homology existed between the IHNV glycoprotein gene and the glycoprotein genes of rabies and VSV, there was significant homology at the amino acid level between all three rhabdovirus glycoproteins.
The chloroplast tRNALys(UUU) gene from mustard (Sinapis alba) contains a class II intron potentially coding for a maturase-related polypeptide.

PubMed

Neuhaus, H; Link, G

1987-01-01

The trnK gene encoding the tRNALys(UUU) has been located on mustard (Sinapis alba) chloroplast DNA, 263 bp upstream of the psbA gene on the same strand. The nucleotide sequence of the trnK gene and its flanking regions as well as the putative transcription start and termination sites are shown. The 5' end of the transcript lies 121 bp upstream of the 5' tRNA coding region and is preceded by procaryotic-type "-10" and "-35" sequence elements, while the 3' end maps 2.77 kb downstream to a DNA region with possible stem-loop secondary structure. The anticodon loop of the tRNALys is interrupted by a 2,574 bp intron containing a long open reading frame, which codes for 524 amino acids. Based on conserved stem and loop structures, this intron has characteristic features of a class II intron. A region near the carboxyl terminus of the derived polypeptide appears structurally related to maturases.
Cloning and sequencing of a laccase gene from the lignin-degrading basidiomycete Pleurotus ostreatus.

PubMed Central

Giardina, P; Cannio, R; Martirani, L; Marzullo, L; Palmieri, G; Sannia, G

1995-01-01

The gene (pox1) encoding a phenol oxidase from Pleurotus ostreatus, a lignin-degrading basidiomycete, was cloned and sequenced, and the corresponding pox1 cDNA was also synthesized and sequenced. The isolated gene consists of 2,592 bp, with the coding sequence being interrupted by 19 introns and flanked by an upstream region in which putative CAAT and TATA consensus sequences could be identified at positions -174 and -84, respectively. The isolation of a second cDNA (pox2 cDNA), showing 84% similarity, and of the corresponding truncated genomic clones demonstrated the existence of a multigene family coding for isoforms of laccase in P. ostreatus. PCR amplifications of specific regions on the DNA of isolated monokaryons proved that the two genes are not allelic forms. The POX1 amino acid sequence deduced was compared with those of other known laccases from different fungi. PMID:7793961
The complete chloroplast genome of Tianshan Snow Lotus (Saussurea involucrata), a famous traditional Chinese medicinal plant of the family Asteraceae.

PubMed

Xie, Qing; Shen, Kang-Ning; Hao, Xiuying; Nam, Phan Nhut; Ngoc Hieu, Bui Thi; Chen, Ching-Hung; Zhu, Changqing; Lin, Yen-Chang; Hsiao, Chung-Der

2017-03-01

We decoded the complete chloroplast DNA (cpDNA) sequence of the Tianshan Snow Lotus (Saussurea involucrata), a famous traditional Chinese medicinal plant of the family Asteraceae, by using next-generation sequencing technology. The genome consists of 152 490 bp containing a pair of inverted repeats (IRs) of 25 202 bp, which are separated by a large single-copy region and a small single-copy region of 83 446 bp and 18 639 bp, respectively. The genic regions account for 57.7% of the whole cpDNA, and the GC content of the cpDNA is 37.7%. The S. involucrata cpDNA encodes 114 unigenes (82 protein-coding genes, 4 rRNA genes, and 28 tRNA genes). There are eight protein-coding genes (atpF, ndhA, ndhB, rpl2, rpoC1, rps16, clpP, and ycf3) and five tRNA genes (trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC) containing introns. A phylogenetic analysis of the 11 complete cpDNAs from Asteraceae showed that S. involucrata is closely related to Centaurea diffusa (Diffuse Knapweed). The complete cpDNA of S. involucrata provides essential DNA molecular data for further phylogenetic and evolutionary analysis of Asteraceae.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Helfenbein, Kevin G.; Brown, Wesley M.; Boore, Jeffrey L.

We have sequenced the complete mitochondrial DNA (mtDNA) of the articulate brachiopod Terebratalia transversa. The circular genome is 14,291 bp in size, relatively small compared to other published metazoan mtDNAs. The 37 genes commonly found in animal mtDNA are present; the size decrease is due to the truncation of several tRNA, rRNA, and protein genes, to some nucleotide overlaps, and to a paucity of non-coding nucleotides. Although the gene arrangement differs radically from those reported for other metazoans, some gene junctions are shared with two other articulate brachiopods, Laqueus rubellus and Terebratulina retusa. All genes in the T. transversa mtDNA, unlike those in most metazoan mtDNAs reported, are encoded by the same strand. The A+T content (59.1 percent) is low for a metazoan mtDNA, and there is a high propensity for homopolymer runs and a strong base-compositional strand bias. The coding strand is quite G+T-rich, a skew that is shared by the confamilial (laqueid) species L. rubellus, but opposite to that found in T. retusa, a cancellothyridid. These compositional skews are strongly reflected in the codon usage patterns and the amino acid compositions of the mitochondrial proteins, with markedly different usage observed between T. retusa and the two laqueids. This observation, plus the similarity of the laqueid non-coding regions to the reverse complement of the non-coding region of the cancellothyridid, suggests that an inversion that resulted in a reversal in the direction of first-strand replication has occurred in one of the two lineages. In addition to the presence of one non-coding region in T. transversa that is comparable to those in the other brachiopod mtDNAs, there are two others with the potential to form secondary structures; one or both of these may be involved in the process of transcript cleavage.
Molecular cloning and evolutionary analysis of the calcium-modulated contractile protein, centrin, in green algae and land plants.

PubMed

Bhattacharya, D; Steinkötter, J; Melkonian, M

1993-12-01

Centrin (= caltractin) is a ubiquitous, cytoskeletal protein which is a member of the EF-hand superfamily of calcium-binding proteins. A centrin-coding cDNA was isolated and characterized from the prasinophyte green alga Scherffelia dubia. Centrin PCR amplification primers were used to isolate partial, homologous cDNA sequences from the green algae Tetraselmis striata and Spermatozopsis similis. Annealing analyses suggested that centrin is a single-copy-coding region in T. striata and S. similis and other green algae studied. Centrin-coding regions from S. dubia, S. similis and T. striata encode four colinear EF-hand domains which putatively bind calcium. Phylogenetic analyses, including homologous sequences from Chlamydomonas reinhardtii and the land plant Atriplex nummularia, demonstrate that the domains of centrins are congruent and arose from the two-fold duplication of an ancestral EF hand with Domains 1+3 and Domains 2+4 clustering. The domains of centrins are also congruent with those of calmodulins demonstrating that, like calmodulin, centrin is an ancient protein which arose within the ancestor of all eukaryotes via gene duplication. Phylogenetic relationships inferred from centrin-coding region comparisons mirror results of small subunit ribosomal RNA sequence analyses suggesting that centrin-coding regions are useful evolutionary markers within the green algae.
AP1 Keeps Chromatin Poised for Action | Center for Cancer Research

Cancer.gov

The human genome harbors gene-encoding DNA, the blueprint for building proteins that regulate cellular function. Embedded across the genome, in non-coding regions, are DNA elements to which regulatory factors bind. The interaction of regulatory factors with DNA at these sites modifies gene expression to modulate cell activity. In cells, DNA exists in a complex with proteins
Cross-species amplification of mitochondrial DNA sequence-tagged-site markers in conifers: the nature of polymorphism and variation within and among species in Picea.

PubMed

Jaramillo-Correa, J P; Bousquet, J; Beaulieu, J; Isabel, N; Perron, M; Bouillé, M

2003-05-01

Primers previously developed to amplify specific non-coding regions of the mitochondrial genome in Angiosperms, and new primers for additional non-coding mtDNA regions, were tested for their ability to direct DNA amplification in 12 conifer taxa and to detect sequence-tagged-site (STS) polymorphisms within and among eight species in Picea. Out of 12 primer pairs, nine were successful at amplifying mtDNA in most of the taxa surveyed. In conifers, indels and substitutions were observed for several loci, allowing them to distinguish between families, genera and, in some cases, between species within genera. In Picea, interspecific polymorphism was detected for four loci, while intraspecific variation was observed for three of the mtDNA regions studied. One of these (SSU rRNA V1 region) exhibited indel polymorphisms, and the two others (nad1 intron b/c and nad5 intron 1) revealed restriction differences after digestion with Sau3AI (PCR-RFLP). A fourth locus, the nad4L-orf25 intergenic region, showed a multibanding pattern for most of the spruce species, suggesting a possible gene duplication. Maternal inheritance, expected for mtDNA in conifers, was observed for all polymorphic markers except the intergenic region nad4L-orf25. Pooling of the variation observed with the remaining three markers resulted in two to six different mtDNA haplotypes within the different species of Picea. Evidence for intra-genomic recombination was observed in at least two taxa. Thus, these mitotypes are likely to be more informative than single-locus haplotypes. They should be particularly useful for the study of biogeography and the dynamics of hybrid zones.
End Joining-Mediated Gene Expression in Mammalian Cells Using PCR-Amplified DNA Constructs that Contain Terminator in Front of Promoter.

PubMed

Nakamura, Mikiko; Suzuki, Ayako; Akada, Junko; Tomiyoshi, Keisuke; Hoshida, Hisashi; Akada, Rinji

2015-12-01

Mammalian gene expression constructs are generally prepared in a plasmid vector, in which a promoter and terminator are located upstream and downstream of a protein-coding sequence, respectively. In this study, we found that front terminator constructs (DNA constructs containing a terminator upstream of a promoter rather than downstream of a coding region) could sufficiently express proteins as a result of end joining of the introduced DNA fragment. By taking advantage of front terminator constructs, FLAG substitutions and deletions were generated using mutagenesis primers to identify amino acids specifically recognized by commercial FLAG antibodies. A minimal epitope sequence for polyclonal FLAG antibody recognition was also identified. In addition, we analyzed the sequence of a C-terminal Ser-Lys-Leu peroxisome localization signal, and identified the key residues necessary for peroxisome targeting. Moreover, front terminator constructs of hepatitis B surface antigen were used for deletion analysis, leading to the identification of regions required for particle formation. Collectively, these results indicate that front terminator constructs allow for easy manipulations of C-terminal protein-coding sequences, and suggest that direct gene expression with PCR-amplified DNA is useful for high-throughput protein analysis in mammalian cells.
Microsatellites in the Eukaryotic DNA Mismatch Repair Genes as Modulators of Evolutionary Mutation Rate

NASA Technical Reports Server (NTRS)

Chang, Dong Kyung; Metzgar, David; Wills, Christopher; Boland, C. Richard

2003-01-01

All "minor" components of the human DNA mismatch repair (MMR) system (MSH3, MSH6, PMS2, and the recently discovered MLH3) contain mononucleotide microsatellites in their coding sequences. This intriguing finding contrasts with the situation found in the major components of the DNA MMR system (MSH2 and MLH1) and, in fact, most human genes. Although eukaryotic genomes are rich in microsatellites, non-triplet microsatellites are rare in coding regions. The recurring presence of exonal mononucleotide repeat sequences within a single family of human genes would therefore be considered exceptional.
Kangaroo – A pattern-matching program for biological sequences

PubMed Central

2002-01-01

Background: Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. Results: Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. Conclusion: A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats. PMID:12150718
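The kind of query this abstract describes, finding mononucleotide repeats in coding sequences, can be sketched with an ordinary regular expression. The example below uses Python's standard `re` module and does not reproduce Kangaroo's own query syntax or implementation, which the abstract does not specify. The function name `find_mononucleotide_runs` and the default threshold of 8 bases are hypothetical choices made for illustration.

```python
import re

def find_mononucleotide_runs(seq, min_len=8):
    """Return (start, base, length) for each run of a single base >= min_len.

    Assumption: the threshold of 8 is illustrative, not taken from the paper.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # One captured base followed by at least (min_len - 1) copies of itself
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]

# Toy coding sequence with a run of nine A's and a run of eight C's
example = "ATG" + "A" * 9 + "GCT" + "C" * 8 + "TAA"
runs = find_mononucleotide_runs(example)
```

The backreference `\1` keeps the pattern short; spelling out each base (`A{8,}|C{8,}|G{8,}|T{8,}`) would match the same runs.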
Palindromic repetitive DNA elements with coding potential in Methanocaldococcus jannaschii.

PubMed

Suyama, Mikita; Lathe, Warren C; Bork, Peer

2005-10-10

We have identified 141 novel palindromic repetitive elements in the genome of the euryarchaeon Methanocaldococcus jannaschii. The total length of these elements is 14.3 kb, which corresponds to 0.9% of the total genomic sequence and 6.3% of all extragenic regions. The elements can be divided into three groups (MJRE1-3) based on sequence similarity. The low sequence identity within each of the groups suggests a rather old origin of these elements in M. jannaschii. Three MJRE2 elements were located within protein-coding regions without disrupting the coding potential of the host genes, indicating that insertion of repeats might be a widespread mechanism to enhance sequence diversity in coding regions.
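For readers unfamiliar with the term, a DNA sequence is palindromic in the strict sense when it equals its own reverse complement. The sketch below checks only that strict definition; the abstract does not say how the MJRE elements were detected, and real repetitive elements are likely to be imperfect palindromes, so this is not the authors' method. The helper names are hypothetical.

```python
# Translation table mapping each base to its Watson-Crick partner
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_palindromic(seq):
    """True if a non-empty sequence reads the same on both strands (5' to 3')."""
    s = seq.upper()
    return len(s) > 0 and s == reverse_complement(s)
```

For instance, the EcoRI recognition site GAATTC satisfies this definition, whereas an arbitrary sequence such as GATTACA does not.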
J Genes for Heavy Chain Immunoglobulins of Mouse

NASA Astrophysics Data System (ADS)

Newell, Nanette; Richards, Julia E.; Tucker, Philip W.; Blattner, Frederick R.

1980-09-01

A 15.8-kilobase pair fragment of BALB/c mouse liver DNA, cloned in the Charon 4Aλ phage vector system, was shown to contain the μ heavy chain constant region (CHμ ) gene for the mouse immunoglobulin M. In addition, this fragment of DNA contains at least two J genes, used to code for the carboxyl terminal portion of heavy chain variable regions. These genes are located in genomic DNA about eight kilobase pairs to the 5' side of the CHμ gene. The complete nucleotide sequence of a 1120-base pair stretch of DNA that includes the two J genes has been determined.
Human somatostatin I: sequence of the cDNA.

PubMed Central

Shen, L P; Pictet, R L; Rutter, W J

1982-01-01

RNA has been isolated from a human pancreatic somatostatinoma and used to prepare a cDNA library. After prescreening, clones containing somatostatin I sequences were identified by hybridization with an anglerfish somatostatin I-cloned cDNA probe. From the nucleotide sequence of two of these clones, we have deduced an essentially full-length mRNA sequence, including the preprosomatostatin coding region, 105 nucleotides from the 5' untranslated region and the complete 150-nucleotide 3' untranslated region. The coding region predicts a 116-amino acid precursor protein (Mr, 12,727) that contains somatostatin-14 and -28 at its COOH terminus. The predicted amino acid sequence of human somatostatin-28 is identical to that of somatostatin-28 isolated from the porcine and ovine species. A comparison of the amino acid sequences of human and anglerfish preprosomatostatin I indicated that the COOH-terminal region encoding somatostatin-14 and the adjacent 6 amino acids are highly conserved, whereas the remainder of the molecule, including the signal peptide region, is more divergent. However, many of the amino acid differences found in the pro region of the human and anglerfish proteins are conservative changes. This suggests that the propeptides have a similar secondary structure, which in turn may imply a biological function for this region of the molecule. PMID:6126875
The complete mitochondrial genome of Pomacea canaliculata (Gastropoda: Ampullariidae).

PubMed

Zhou, Xuming; Chen, Yu; Zhu, Shanliang; Xu, Haigen; Liu, Yan; Chen, Lian

2016-01-01

The mitochondrial genome of Pomacea canaliculata (Gastropoda: Ampullariidae) is the first complete mtDNA sequence reported in the genus Pomacea. The mtDNA is 15,707 bp long and contains 13 protein-coding genes, 2 ribosomal RNAs, 22 transfer RNAs, and a 359 bp non-coding region. The A + T content of the overall base composition of the H-strand is 71.7% (T: 41%, C: 12.7%, A: 30.7%, G: 15.6%). The ATP6, ATP8, CO1, CO2, ND1-3, ND5, ND6, ND4L and Cyt b genes begin with ATG as start codon, whereas CO3 and ND4 begin with ATA. The ATP8, CO2-3, ND4L, ND2-6 and Cyt b genes are terminated with TAA as stop codon, whereas ATP6, ND1, and CO1 end with TAG. A long non-coding region is found, in which a 23 bp repeat unit is repeated 11 times.
Identification of three novel NHS mutations in families with Nance-Horan syndrome.

PubMed

Huang, Kristen M; Wu, Junhua; Brooks, Simon P; Hardcastle, Alison J; Lewis, Richard Alan; Stambolian, Dwight

2007-03-27

Nance-Horan Syndrome (NHS) is an infrequent and often overlooked X-linked disorder characterized by dense congenital cataracts, microphthalmia, and dental abnormalities. The syndrome is caused by mutations in the NHS gene, whose function is not known. The purpose of this study was to identify the frequency and distribution of NHS gene mutations and compare genotype with Nance-Horan phenotype in five North American NHS families. Genomic DNA was isolated from white blood cells from NHS patients and family members. The NHS gene coding region and its splice site donor and acceptor regions were amplified from genomic DNA by PCR, and the amplicons were sequenced directly. We identified three unique NHS coding region mutations in these NHS families. This report extends the number of unique identified NHS mutations to 14.

The Mitochondrial Cytochrome Oxidase Subunit I Gene Occurs on a Minichromosome with Extensive Heteroplasmy in Two Species of Chewing Lice, Geomydoecus aurei and Thomomydoecus minor

PubMed Central

Pietan, Lucas L.; Spradling, Theresa A.

2016-01-01

In animals, mitochondrial DNA (mtDNA) typically occurs as a single circular chromosome with 13 protein-coding genes and 22 tRNA genes. The various species of lice examined previously, however, have shown mitochondrial genome rearrangements with a range of chromosome sizes and numbers. Our research demonstrates that the mitochondrial genomes of two species of chewing lice found on pocket gophers, Geomydoecus aurei and Thomomydoecus minor, are fragmented with the 1,536 base-pair (bp) cytochrome-oxidase subunit I (cox1) gene occurring as the only protein-coding gene on a 1,916–1,964 bp minicircular chromosome in the two species, respectively. The cox1 gene of T. minor begins with an atypical start codon, while that of G. aurei does not. Components of the non-protein coding sequence of G. aurei and T. minor include a tRNA (isoleucine) gene, inverted repeat sequences consistent with origins of replication, and an additional non-coding region that is smaller than the non-coding sequence of other lice with such fragmented mitochondrial genomes. Sequences of cox1 minichromosome clones for each species reveal extensive length and sequence heteroplasmy in both coding and noncoding regions. The highly variable non-gene regions of G. aurei and T. minor have little sequence similarity with one another except for a 19-bp region of phylogenetically conserved sequence with unknown function. PMID:27589589
The primary structures of two yeast enolase genes. Homology between the 5' noncoding flanking regions of yeast enolase and glyceraldehyde-3-phosphate dehydrogenase genes.

PubMed

Holland, M J; Holland, J P; Thill, G P; Jackson, K A

1981-02-10

Segments of yeast genomic DNA containing two enolase structural genes have been isolated by subculture cloning procedures using a cDNA hybridization probe synthesized from purified yeast enolase mRNA. Based on restriction endonuclease and transcriptional maps of these two segments of yeast DNA, each hybrid plasmid contains a region of extensive nucleotide sequence homology which forms hybrids with the cDNA probe. The DNA sequences which flank this homologous region in the two hybrid plasmids are nonhomologous indicating that these sequences are nontandemly repeated in the yeast genome. The complete nucleotide sequence of the coding as well as the flanking noncoding regions of these genes has been determined. The amino acid sequence predicted from one reading frame of both structural genes is extremely similar to that determined for yeast enolase (Chin, C. C. Q., Brewer, J. M., Eckard, E., and Wold, F. (1981) J. Biol. Chem. 256, 1370-1376), confirming that these isolated structural genes encode yeast enolase. The nucleotide sequences of the coding regions of the genes are approximately 95% homologous, and neither gene contains an intervening sequence. Codon utilization in the enolase genes follows the same biased pattern previously described for two yeast glyceraldehyde-3-phosphate dehydrogenase structural genes (Holland, J. P., and Holland, M. J. (1980) J. Biol. Chem. 255, 2596-2605). DNA blotting analysis confirmed that the isolated segments of yeast DNA are colinear with yeast genomic DNA and that there are two nontandemly repeated enolase genes per haploid yeast genome. The noncoding portions of the two enolase genes adjacent to the initiation and termination codons are approximately 70% homologous and contain sequences thought to be involved in the synthesis and processing of messenger RNA. Finally, there are regions of extensive homology between the two enolase structural genes and two yeast glyceraldehyde-3-phosphate dehydrogenase structural genes within the 5' noncoding portions of these glycolytic genes.
Complex alternative splicing of acetylcholinesterase transcripts in Torpedo electric organ; primary structure of the precursor of the glycolipid-anchored dimeric form.

PubMed Central

Sikorav, J L; Duval, N; Anselmet, A; Bon, S; Krejci, E; Legay, C; Osterlund, M; Reimund, B; Massoulié, J

1988-01-01

In this paper, we show the existence of alternative splicing in the 3' region of the coding sequence of Torpedo acetylcholinesterase (AChE). We describe two cDNA structures which both diverge from the previously described coding sequence of the catalytic subunit of asymmetric (A) forms (Schumacher et al., 1986; Sikorav et al., 1987). They both contain a coding sequence followed by a non-coding sequence and a poly(A) stretch. Both of these structures were shown to exist in poly(A)+ RNAs, by S1 mapping experiments. The divergent region encoded by the first sequence corresponds to the precursor of the globular dimeric form (G2a), since it contains the expected C-terminal amino acids, Ala-Cys. These amino acids are followed by a 29 amino acid extension which contains a hydrophobic segment and must be replaced by a glycolipid in the mature protein. Analyses of intact G2a AChE showed that the common domain of the protein contains intersubunit disulphide bonds. The divergent region of the second type of cDNA consists of an adjacent genomic sequence, which is removed as an intron in A and Ga mRNAs, but may encode a distinct, less abundant catalytic subunit. The structures of the cDNA clones indicate that they are derived from minor mRNAs, shorter than the three major transcripts which have been described previously (14.5, 10.5 and 5.5 kb). Oligonucleotide probes specific for the asymmetric and globular terminal regions hybridize with the three major transcripts, indicating that their size is determined by 3'-untranslated regions which are not related to the differential splicing leading to A and Ga forms. PMID:3181125
The artificial zinc finger coding gene 'Jazz' binds the utrophin promoter and activates transcription.

PubMed

Corbi, N; Libri, V; Fanciulli, M; Tinsley, J M; Davies, K E; Passananti, C

2000-06-01

Up-regulation of utrophin gene expression is recognized as a plausible therapeutic approach in the treatment of Duchenne muscular dystrophy (DMD). We have designed and engineered new zinc finger-based transcription factors capable of binding and activating transcription from the promoter of the dystrophin-related gene, utrophin. Using the recognition 'code' that proposes specific rules between zinc finger primary structure and potential DNA binding sites, we engineered a new gene named 'Jazz' that encodes a three-zinc finger peptide. Jazz belongs to the Cys2-His2 zinc finger type and was engineered to target the nine base pair DNA sequence: 5'-GCT-GCT-GCG-3', present in the promoter region of both the human and mouse utrophin gene. The entire zinc finger alpha-helix region, containing the amino acid positions that are crucial for DNA binding, was specifically chosen on the basis of the contacts more frequently represented in the available list of the 'code'. Here we demonstrate that Jazz protein binds specifically to the double-stranded DNA target, with a dissociation constant of about 32 nM. Band shift and super-shift experiments confirmed the high affinity and specificity of Jazz protein for its DNA target. Moreover, we show that chimeric proteins, named Gal4-Jazz and Sp1-Jazz, are able to drive the transcription of a test gene from the human utrophin promoter.
Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

PubMed

Cao, Yinhe; Tung, Wen-Wen; Gao, J B

2004-01-01

With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. Among the more important structures in a DNA sequence are repeat-related ones, which often have to be masked before protein coding regions along a DNA sequence are identified or redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale on a PC.
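One simple reading of "recurrence time" is the distance, in bases, between an occurrence of a short word and its previous occurrence in the same sequence. The sketch below computes that quantity for all words of length k and tallies a histogram. It is a minimal illustration under that assumption, not the method or the codon index of the paper; the function names and the default word length of 3 are hypothetical.

```python
from collections import Counter

def recurrence_times(seq, k=3):
    """Distances between each k-mer occurrence and its previous occurrence.

    Assumption: overlapping k-mers are scanned left to right, and only
    the gap to the most recent earlier occurrence is recorded.
    """
    last_seen = {}
    times = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            times.append(i - last_seen[word])
        last_seen[word] = i
    return times

def recurrence_histogram(seq, k=3):
    """Count how often each recurrence time occurs."""
    return Counter(recurrence_times(seq, k))
```

Under this definition a tandem repeat shows up as a spike in the histogram at the repeat unit length, which is one way repeat-related features can be located.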
Applications of statistical physics and information theory to the analysis of DNA sequences

NASA Astrophysics Data System (ADS)

Grosse, Ivo

2000-10-01

DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
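The mutual information function referred to in this abstract can be sketched directly from its standard definition: for a separation k, I(k) is the sum over base pairs (a, b) of p_ab(k) log2[p_ab(k) / (p_a p_b)], where p_ab(k) is the frequency of finding a and b exactly k positions apart. The code below is a minimal estimate under assumptions made here: probabilities are plain observed frequencies with no finite-size correction, marginals are taken from the same set of pairs, and the function name `mutual_information` is hypothetical. It does not implement the pseudo-exon model or the AMI statistic of the thesis.

```python
import math
from collections import Counter

def mutual_information(seq, k):
    """Estimate I(k) in bits between bases separated by k positions."""
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)                   # counts of (a, b) at distance k
    first = Counter(a for a, _ in pairs)     # marginal counts of the first base
    second = Counter(b for _, b in pairs)    # marginal counts of the second base
    info = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with counts to avoid extra divisions
        info += p_ab * math.log2(p_ab * n * n / (first[a] * second[b]))
    return info

# Profile over a range of separations for a sequence of interest
def mutual_information_profile(seq, max_k=30):
    return [mutual_information(seq, k) for k in range(1, max_k + 1)]
```

Plotting the profile against k for a genomic sequence is how one would look for the period-three oscillations the abstract mentions; the estimate is biased upward for short sequences, so any real analysis would need a finite-size correction.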
Nucleotide sequence determination of guinea-pig casein B mRNA reveals homology with bovine and rat alpha s1 caseins and conservation of the non-coding regions of the mRNA.

PubMed Central

Hall, L; Laird, J E; Craig, R K

1984-01-01

Nucleotide sequence analysis of cloned guinea-pig casein B cDNA sequences has identified two casein B variants related to the bovine and rat alpha s1 caseins. Amino acid homology was largely confined to the known bovine or predicted rat phosphorylation sites and within the 'signal' precursor sequence. Comparison of the deduced nucleotide sequence of the guinea-pig and rat alpha s1 casein mRNA species showed greater sequence conservation in the non-coding than in the coding regions, suggesting a functional and possibly regulatory role for the non-coding regions of casein mRNA. The results provide insight into the evolution of the casein genes, and raise questions as to the role of conserved nucleotide sequences within the non-coding regions of mRNA species. Images Fig. 1. PMID:6548375
Hypervariability of ribosomal DNA at multiple chromosomal sites in lake trout (Salvelinus namaycush).

PubMed

Zhuo, L; Reed, K M; Phillips, R B

1995-06-01

Variation in the intergenic spacer (IGS) of the ribosomal DNA (rDNA) of lake trout (Salvelinus namaycush) was examined. Digestion of genomic DNA with restriction enzymes showed that almost every individual had a unique combination of length variants with most of this variation occurring within rather than between populations. Sequence analysis of a 2.3 kilobase (kb) EcoRI-DraI fragment spanning the 3' end of the 28S coding region and approximately 1.8 kb of the IGS revealed two blocks of repetitive DNA. Putative transcriptional termination sites were found approximately 220 bases (b) downstream from the end of the 28S coding region. Comparison of the 2.3-kb fragments with two longer (3.1 kb) fragments showed that the major difference in length resulted from variation in the number of short (89 b) repeats located 3' to the putative terminator. Repeat units within a single nucleolus organizer region (NOR) appeared relatively homogeneous and genetic analysis found variants to be stably inherited. A comparison of the number of spacer-length variants with the number of NORs found that the number of length variants per individual was always less than the number of NORs. Examination of spacer variants in five populations showed that populations with more NORs had more spacer variants, indicating that variants are present at different rDNA sites on nonhomologous chromosomes.
Generating and repairing genetically programmed DNA breaks during immunoglobulin class switch recombination

PubMed Central

Nicolas, Laura; Cols, Montserrat; Choi, Jee Eun; Chaudhuri, Jayanta; Vuong, Bao

2018-01-01

Adaptive immune responses require the generation of a diverse repertoire of immunoglobulins (Igs) that can recognize and neutralize a seemingly infinite number of antigens. V(D)J recombination creates the primary Ig repertoire, which subsequently is modified by somatic hypermutation (SHM) and class switch recombination (CSR). SHM promotes Ig affinity maturation whereas CSR alters the effector function of the Ig. Both SHM and CSR require activation-induced cytidine deaminase (AID) to produce dU:dG mismatches in the Ig locus that are transformed into untemplated mutations in variable coding segments during SHM or DNA double-strand breaks (DSBs) in switch regions during CSR. Within the Ig locus, DNA repair pathways are diverted from their canonical role in maintaining genomic integrity to permit AID-directed mutation and deletion of gene coding segments. Recently identified proteins, genes, and regulatory networks have provided new insights into the temporally and spatially coordinated molecular interactions that control the formation and repair of DSBs within the Ig locus. Unravelling the genetic program that allows B cells to selectively alter the Ig coding regions while protecting non-Ig genes from DNA damage advances our understanding of the molecular processes that maintain genomic integrity as well as humoral immunity. PMID:29744038
Sequence-dependent modelling of local DNA bending phenomena: curvature prediction and vibrational analysis.

PubMed

Vlahovicek, K; Munteanu, M G; Pongor, S

1999-01-01

Bending is a local conformational micropolymorphism of DNA in which the original B-DNA structure is only distorted but not extensively modified. Bending can be predicted by simple static geometry models as well as by a recently developed elastic model that incorporates sequence dependent anisotropic bendability (SDAB). The SDAB model qualitatively explains phenomena including affinity of protein binding, kinking, as well as sequence-dependent vibrational properties of DNA. The vibrational properties of DNA segments can be studied by finite element analysis of a model subjected to an initial bending moment. The frequency spectrum is obtained by applying Fourier analysis to the displacement values in the time domain. This analysis shows that the spectrum of the bending vibrations depends quite sensitively on the sequence; for example, the spectrum of a curved sequence is characteristically different from the spectrum of straight sequence motifs of identical basepair composition. Curvature distributions are genome-specific, and pronounced differences are found between protein-coding and regulatory regions; that is, sites of extreme curvature and/or bendability are less frequent in protein-coding regions. A WWW server is set up for the prediction of curvature and generation of 3D models from DNA sequences (http://www.icgeb.trieste.it/dna).
MitoAge: a database for comparative analysis of mitochondrial DNA, with a special focus on animal longevity.

PubMed

Toren, Dmitri; Barzilay, Thomer; Tacutu, Robi; Lehmann, Gilad; Muradian, Khachik K; Fraifeld, Vadim E

2016-01-04

Mitochondria are the only organelles in the animal cells that have their own genome. Due to a key role in energy production, generation of damaging factors (ROS, heat), and apoptosis, mitochondria and mtDNA in particular have long been considered one of the major players in the mechanisms of aging, longevity and age-related diseases. The rapidly increasing number of species with fully sequenced mtDNA, together with accumulated data on longevity records, provides a new fascinating basis for comparative analysis of the links between mtDNA features and animal longevity. To facilitate such analyses and to support the scientific community in carrying these out, we developed the MitoAge database containing calculated mtDNA compositional features of the entire mitochondrial genome, mtDNA coding (tRNA, rRNA, protein-coding genes) and non-coding (D-loop) regions, and codon usage/amino acids frequency for each protein-coding gene. MitoAge includes 922 species with fully sequenced mtDNA and maximum lifespan records. The database is available through the MitoAge website (www.mitoage.org or www.mitoage.info), which provides the necessary tools for searching, browsing, comparing and downloading the data sets of interest for selected taxonomic groups across the Kingdom Animalia. The MitoAge website assists in statistical analysis of different features of the mtDNA and their correlative links to longevity. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Identification of three novel NHS mutations in families with Nance-Horan syndrome

PubMed Central

Wu, Junhua; Brooks, Simon P.; Hardcastle, Alison J.; Lewis, Richard Alan; Stambolian, Dwight

2007-01-01

Purpose Nance-Horan Syndrome (NHS) is an infrequent and often overlooked X-linked disorder characterized by dense congenital cataracts, microphthalmia, and dental abnormalities. The syndrome is caused by mutations in the NHS gene, whose function is not known. The purpose of this study was to identify the frequency and distribution of NHS gene mutations and compare genotype with Nance-Horan phenotype in five North American NHS families. Methods Genomic DNA was isolated from white blood cells from NHS patients and family members. The NHS gene coding region and its splice site donor and acceptor regions were amplified from genomic DNA by PCR, and the amplicons were sequenced directly. Results We identified three unique NHS coding region mutations in these NHS families. Conclusions This report extends the number of unique identified NHS mutations to 14. PMID:17417607
Characterization of mitochondrial genome of sea cucumber Stichopus horrens: a novel gene arrangement in Holothuroidea.

PubMed

Fan, SiGang; Hu, ChaoQun; Wen, Jing; Zhang, LvPing

2011-05-01

The complete mitochondrial DNA sequence contains useful information for phylogenetic analyses of metazoa. In this study, the complete mitochondrial DNA sequence of sea cucumber Stichopus horrens (Holothuroidea: Stichopodidae: Stichopus) is presented. The complete sequence was determined using normal and long PCRs. The mitochondrial genome of Stichopus horrens is a circular molecule 16257 bp long, composed of 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes. Most of these genes are coded on the heavy strand except for one protein-coding gene (nad6) and five tRNA genes (tRNA(Ser(UCN)), tRNA(Gln), tRNA(Ala), tRNA(Val), tRNA(Asp)) which are coded on the light strand. The composition of the heavy strand is 30.8% A, 23.7% C, 16.2% G, and 29.3% T bases (AT skew=0.025; GC skew=-0.188). A non-coding region of 675 bp was identified as a putative control region because of its location and AT richness. The intergenic spacers range from 1 to 50 bp in size, totaling 227 bp. A total of 25 overlapping nucleotides, ranging from 1 to 10 bp in size, exist among 11 genes. All 13 protein-coding genes are initiated with an ATG. The TAA codon is used as the stop codon in all the protein-coding genes except nad3 and nad4, which use TAG as their termination codon. The most frequently used amino acids are Leu (16.29%), Ser (10.34%) and Phe (8.37%). All of the tRNA genes have the potential to fold into typical cloverleaf secondary structures. We also compared the order of the genes in the mitochondrial DNA from the five holothurians that are now available and found a novel gene arrangement in the mitochondrial DNA of Stichopus horrens.
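The skew values quoted in this abstract can be reproduced from the reported base composition. The sketch below assumes the commonly used definitions AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C); the abstract does not state its formulas, but these definitions give the same numbers.

```python
def skew(x, y):
    """Strand skew of two base frequencies: (x - y) / (x + y)."""
    return (x - y) / (x + y)


# Heavy-strand base composition (percent) as reported in the abstract.
composition = {"A": 30.8, "C": 23.7, "G": 16.2, "T": 29.3}

at_skew = skew(composition["A"], composition["T"])
gc_skew = skew(composition["G"], composition["C"])

print(round(at_skew, 3))  # 0.025
print(round(gc_skew, 3))  # -0.188
```

A positive AT skew means slightly more A than T on the heavy strand, and the negative GC skew means more C than G.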
Potential efficacy of mitochondrial genes for animal DNA barcoding: a case study using eutherian mammals.

PubMed

Luo, Arong; Zhang, Aibing; Ho, Simon Yw; Xu, Weijun; Zhang, Yanzhou; Shi, Weifeng; Cameron, Stephen L; Zhu, Chaodong

2011-01-28

A well-informed choice of genetic locus is central to the efficacy of DNA barcoding. Current DNA barcoding in animals involves the use of the 5' half of the mitochondrial cytochrome oxidase 1 gene (CO1) to diagnose and delimit species. However, there is no compelling a priori reason for the exclusive focus on this region, and it has been shown that it performs poorly for certain animal groups. To explore alternative mitochondrial barcoding regions, we compared the efficacy of the universal CO1 barcoding region with the other mitochondrial protein-coding genes in eutherian mammals. Four criteria were used for this comparison: the number of recovered species, sequence variability within and between species, resolution to taxonomic levels above that of species, and the degree of mutational saturation. Based on 1,179 mitochondrial genomes of eutherians, we found that the universal CO1 barcoding region is a good representative of mitochondrial genes as a whole because the high species-recovery rate (> 90%) was similar to that of other mitochondrial genes, and there were no significant differences in intra- or interspecific variability among genes. However, an overlap between intra- and interspecific variability was still problematic for all mitochondrial genes. Our results also demonstrated that any choice of mitochondrial gene for DNA barcoding failed to offer significant resolution at higher taxonomic levels. We suggest that the CO1 barcoding region, the universal DNA barcode, is preferred among the mitochondrial protein-coding genes as a molecular diagnostic at least for eutherian species identification. Nevertheless, DNA barcoding with this marker may still be problematic for certain eutherian taxa and our approach can be used to test potential barcoding loci for such groups.
Potential efficacy of mitochondrial genes for animal DNA barcoding: a case study using eutherian mammals

PubMed Central

2011-01-01

Background A well-informed choice of genetic locus is central to the efficacy of DNA barcoding. Current DNA barcoding in animals involves the use of the 5' half of the mitochondrial cytochrome oxidase 1 gene (CO1) to diagnose and delimit species. However, there is no compelling a priori reason for the exclusive focus on this region, and it has been shown that it performs poorly for certain animal groups. To explore alternative mitochondrial barcoding regions, we compared the efficacy of the universal CO1 barcoding region with the other mitochondrial protein-coding genes in eutherian mammals. Four criteria were used for this comparison: the number of recovered species, sequence variability within and between species, resolution to taxonomic levels above that of species, and the degree of mutational saturation. Results Based on 1,179 mitochondrial genomes of eutherians, we found that the universal CO1 barcoding region is a good representative of mitochondrial genes as a whole because the high species-recovery rate (> 90%) was similar to that of other mitochondrial genes, and there were no significant differences in intra- or interspecific variability among genes. However, an overlap between intra- and interspecific variability was still problematic for all mitochondrial genes. Our results also demonstrated that any choice of mitochondrial gene for DNA barcoding failed to offer significant resolution at higher taxonomic levels. Conclusions We suggest that the CO1 barcoding region, the universal DNA barcode, is preferred among the mitochondrial protein-coding genes as a molecular diagnostic at least for eutherian species identification. Nevertheless, DNA barcoding with this marker may still be problematic for certain eutherian taxa and our approach can be used to test potential barcoding loci for such groups. PMID:21276253
Sequence of interleukin-2 isolated from human placental poly A+ RNA: possible role in maintenance of fetal allograft.

PubMed

Chernicky, C L; Tan, H; Burfeind, P; Ilan, J; Ilan, J

1996-02-01

There are several cell types within the placenta that produce cytokines which can contribute to the regulatory mechanisms that ensure normal pregnancy. The immunological milieu at the maternofetal interface is considered to be crucial for survival of the fetus. Interleukin-2 (IL-2) is expressed by the syncytiotrophoblast, the cell layer between the mother and the fetus. IL-2 appears to be a key factor in maintenance of pregnancy. Therefore, it was important to determine the sequence of human placental interleukin-2. Direct sequencing of human placental IL-2 cDNA was determined for the coding region. Subclone sequencing was carried out for the 5'- and 3'-untranslated regions (5'-UTR and 3'-UTR). The 5'-UTR for human placental IL-2 cDNA is 294 bp, which is 247 nucleotides longer than that reported for cDNA IL-2 derived from T cells. The sequence of the coding region is identical to that reported for T cell IL-2, while sequence analysis of the polymerase chain reaction (PCR) product showed that the cDNA from the 3' end was the same as that reported for cDNA from T cells. Human placental IL-2 cDNA is 1,028 base pairs (excluding the poly A tail), which is 247 bp longer at the 5' end than that reported for IL-2 T cell cDNA. Therefore, the extended 5'-UTR of the placental IL-2 cDNA may be a consequence of alternative promoter utilization in the placenta.
Cloning and expression of a cDNA coding for catalase from zebrafish (Danio rerio).

PubMed

Ken, C F; Lin, C T; Wu, J L; Shaw, J F

2000-06-01

A full-length complementary DNA (cDNA) clone encoding a catalase was amplified by the rapid amplification of cDNA ends-polymerase chain reaction (RACE-PCR) technique from zebrafish (Danio rerio) mRNA. Nucleotide sequence analysis of this cDNA clone revealed that it comprised a complete open reading frame coding for 526 amino acid residues and that it had a molecular mass of 59 654 Da. The deduced amino acid sequence showed high similarity with the sequences of catalase from swine (86.9%), mouse (85.8%), rat (85%), human (83.7%), fruit fly (75.6%), nematode (71.1%), and yeast (58.6%). The amino acid residues for secondary structures are apparently conserved as they are present in other mammal species. Furthermore, the coding region of zebrafish catalase was introduced into an expression vector, pET-20b(+), and transformed into Escherichia coli expression host BL21(DE3)pLysS. A 60-kDa active catalase protein was expressed and detected by Coomassie blue staining as well as activity staining on polyacrylamide gel following electrophoresis.
Reconstitution of wild type viral DNA in simian cells transfected with early and late SV40 defective genomes.

PubMed

O'Neill, F J; Gao, Y; Xu, X

1993-11-01

The DNAs of polyomaviruses ordinarily exist as a single circular molecule of approximately 5000 base pairs. Variants of SV40, BKV and JCV have been described which contain two complementing defective DNA molecules. These defectives, which form a bipartite genome structure, contain either the viral early region or the late region. The defectives have the unique property of being able to tolerate variable sized reiterations of regulatory and terminus region sequences, and portions of the coding region. They can also exchange coding region sequences with other polyomaviruses. It has been suggested that the bipartite genome structure might be a stage in the evolution of polyomaviruses which can uniquely sustain genome and sequence diversity. However, it is not known if the regulatory and terminus region sequences are highly mutable. Also, it is not known if the bipartite genome structure is reversible and what the conditions might be which would favor restoration of the monomolecular genome structure. We addressed the first question by sequencing the reiterated regulatory and terminus regions of E- and L-SV40 DNAs. This revealed a large number of mutations in the regulatory regions of the defective genomes, including deletions, insertions, rearrangements and base substitutions. We also detected insertions and base substitutions in the T-antigen gene. We addressed the second question by introducing into permissive simian cells, E- and L-SV40 genomes which had been engineered to contain only a single regulatory region. Analysis of viral DNA from transfected cells demonstrated recombined genomes containing a wild type monomolecular DNA structure. However, the complete defectives, containing reiterated regulatory regions, could often compete away the wild type genomes. The recombinant monomolecular genomes were isolated, cloned and found to be infectious. All of the DNA alterations identified in one of the regulatory regions of E-SV40 DNA were present in the recombinant monomolecular genomes. These and other findings indicate that the bipartite genome state can sustain many mutations which wtSV40 cannot directly sustain. However, the mutations can later be introduced into the wild type genomes when the E- and L-SV40 DNAs recombine to generate a new monomolecular genome structure.
The complete chloroplast genome of Sinopodophyllum hexandrum Ying (Berberidaceae).

PubMed

Meng, Lihua; Liu, Ruijuan; Chen, Jianbing; Ding, Chenxu

2017-05-01

The complete nucleotide sequence of the Sinopodophyllum hexandrum Ying chloroplast genome (cpDNA) was determined based on next-generation sequencing technologies in this study. The genome was 157 203 bp in length, containing a pair of inverted repeat (IRa and IRb) regions of 25 960 bp each, which were separated by a large single-copy (LSC) region of 87 065 bp and a small single-copy (SSC) region of 18 218 bp. The cpDNA contained 148 genes, including 96 protein-coding genes, 8 ribosomal RNA genes, and 44 tRNA genes. Of these genes, eight harbored a single intron, and two (ycf3 and clpP) contained two introns. The AT content of the S. hexandrum cpDNA is 61.5%.
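The figures in this abstract are internally consistent, which can be checked with a few lines of arithmetic. The sketch below assumes the usual quadripartite layout (LSC + SSC + two IR copies) and that the 25 960 bp figure is the length of each inverted repeat. The variable names are illustrative only.

```python
# Region lengths in bp, as reported in the abstract.
lsc = 87_065   # large single-copy region
ssc = 18_218   # small single-copy region
ir = 25_960    # one inverted repeat (IRa or IRb), assumed per copy

genome_length = lsc + ssc + 2 * ir
print(genome_length)  # 157203

# Gene counts as reported: protein-coding, rRNA, tRNA.
total_genes = 96 + 8 + 44
print(total_genes)  # 148

# GC content follows directly from the reported AT content.
gc_content = round(100.0 - 61.5, 1)
print(gc_content)  # 38.5
```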
A regional approach to plant DNA barcoding provides high species resolution of sedges (Carex and Kobresia, Cyperaceae) in the Canadian Arctic Archipelago.

PubMed

Clerc-Blain, Jessica L E; Starr, Julian R; Bull, Roger D; Saarela, Jeffery M

2010-01-01

Previous research on barcoding sedges (Carex) suggested that basic searches within a global barcoding database would probably not resolve more than 60% of the world's some 2000 species. In this study, we take an alternative approach and explore the performance of plant DNA barcoding in the Carex lineage from an explicitly regional perspective. We characterize the utility of a subset of the proposed protein-coding and noncoding plastid barcoding regions (matK, rpoB, rpoC1, rbcL, atpF-atpH, psbK-psbI) for distinguishing species of Carex and Kobresia in the Canadian Arctic Archipelago, a clearly defined eco-geographical region representing 1% of the Earth's landmass. Our results show that matK resolves the greatest number of species of any single-locus (95%), and when combined in a two-locus barcode, it provides 100% species resolution in all but one combination (matK + atpFH) during unweighted pair-group method with arithmetic mean averages (UPGMA) analyses. Noncoding regions were equally or more variable than matK, but as single markers they resolve substantially fewer taxa than matK alone. When difficulties with sequencing and alignment due to microstructural variation in noncoding regions are also considered, our results support other studies in suggesting that protein-coding regions are more practical as barcoding markers. Plastid DNA barcodes are an effective identification tool for species of Carex and Kobresia in the Canadian Arctic Archipelago, a region where the number of co-existing closely related species is limited. We suggest that if a regional approach to plant DNA barcoding was applied on a global scale, it could provide a solution to the generally poor species resolution seen in previous barcoding studies. © 2009 Blackwell Publishing Ltd.

Rapid Mitochondrial Genome Evolution through Invasion of Mobile Elements in Two Closely Related Species of Arbuscular Mycorrhizal Fungi

PubMed Central

Beaudet, Denis; Nadimi, Maryam; Iffis, Bachir; Hijri, Mohamed

2013-01-01

Arbuscular mycorrhizal fungi (AMF) are common and important plant symbionts. They have coenocytic hyphae and form multinucleated spores. The nuclear genome of AMF is polymorphic and its organization is not well understood, which makes the development of reliable molecular markers challenging. In stark contrast, their mitochondrial genome (mtDNA) is homogeneous. To assess the intra- and inter-specific mitochondrial variability in closely related Glomus species, we performed 454 sequencing on total genomic DNA of Glomus sp. isolate DAOM-229456 and we compared its mtDNA with two G. irregulare isolates. We found that the mtDNA of Glomus sp. is homogeneous, identical in gene order and, with respect to the sequences of coding regions, almost identical to G. irregulare. However, certain genomic regions vary substantially, due to insertions/deletions of elements such as introns, mitochondrial plasmid-like DNA polymerase genes and mobile open reading frames. We found no evidence of mitochondrial or cytoplasmic plasmids in Glomus species, and mobile ORFs in Glomus are responsible for the formation of four gene hybrids in atp6, atp9, cox2, and nad3, which are most probably the result of horizontal gene transfer and are expressed at the mRNA level. We found evidence for substantial sequence variation in defined regions of mtDNA, even among closely related isolates with otherwise identical coding gene sequences. This variation makes it possible to design reliable intra- and inter-specific markers. PMID:23637766
A family of long intergenic non-coding RNA genes in human chromosomal region 22q11.2 carry a DNA translocation breakpoint/AT-rich sequence

PubMed Central

2018-01-01

FAM230C, a long intergenic non-coding RNA (lincRNA) gene in human chromosome 13 (chr13), is a member of the lincRNA gene family termed family with sequence similarity 230. An analysis using bioinformatics search tools and alignment programs was undertaken to determine properties of FAM230C and its related genes. Results reveal that the DNA translocation element, the Translocation Breakpoint Type A (TBTA) sequence, which consists of satellite DNA, Alu elements, and AT-rich sequences, is embedded in the FAM230C gene. Eight lincRNA genes related to FAM230C also carry the TBTA sequences. These genes were formed from a large segment of the 3’ half of the FAM230C sequence duplicated in chr22, and are specifically in regions of low copy repeats (LCR22s), in or close to the 22q11.2 region. 22q11.2 is a chromosomal segment that undergoes a high rate of DNA translocation and is prone to genetic deletions. FAM230C-related genes present in other chromosomes do not carry the TBTA motif and were formed from the 5’ half region of the FAM230C sequence. These findings identify a high specificity in lincRNA gene formation by gene sequence duplication in different chromosomes. PMID:29668722
VaDiR: an integrated approach to Variant Detection in RNA.

PubMed

Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy

2018-02-01

Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered. However, its cost currently limits its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA (VaDiR) that integrates 3 variant callers, namely: SNPiR, RVBoost, and MuTect2. The combination of all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
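The Tier 1 idea in this abstract, keeping only variants reported by all three callers, amounts to a set intersection. The sketch below is an illustration under stated assumptions, not the VaDiR implementation: the variant tuples (chromosome, position, reference allele, alternate allele) and the call sets are hypothetical, and real caller output would first have to be parsed and normalized.

```python
# Hypothetical call sets; each variant is (chrom, pos, ref, alt).
snpir_calls = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 5000, "C", "T"),
    ("chr3", 7500, "G", "A"),
}
rvboost_calls = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 5000, "C", "T"),
    ("chr4", 200, "T", "C"),
}
mutect2_calls = {
    ("chr1", 1000, "A", "G"),
    ("chr3", 7500, "G", "A"),
    ("chr2", 5000, "C", "T"),
}

# Tier 1 in the sense of the abstract: called by all three methods.
tier1 = snpir_calls & rvboost_calls & mutect2_calls

for variant in sorted(tier1):
    print(variant)
```

Requiring agreement among all callers trades recall for precision, which matches the abstract's report that Tier 1 gave the highest precision.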
Expression of the leukemia-associated CBFβ/SMMHC chimeric gene causes transformation of 3T3 cells

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hajra, A.; Liu, P.; Collins, E.S.

1994-09-01

A pericentric inversion of chromosome 16 (inv(16)(p13;q22)) is consistently seen in acute myeloid leukemia of the M4Eo subtype. This inversion fuses almost the entire coding region of the gene encoding the β subunit of the heterodimeric transcription factor CBF/PEBP2 to the region of the MYH11 gene encoding the rod domain of the smooth muscle myosin heavy chain (SMMHC). To investigate the biological properties of the CBFβ/SMMHC fusion protein, we have generated 3T3 cell lines that stably express the CBFβ/SMMHC chimeric cDNA or the normal, nonchimeric CBFβ and SMMHC cDNAs. 3T3 cells expressing CBFβ/SMMHC acquire a transformed phenotype, as indicated by altered cell morphology, formation of foci, and growth in soft agar. Cells constitutively overexpressing the normal CBFβ cDNA or the rod region of SMMHC remain nontransformed. Western blot analysis using antibodies to CBFβ and the SMMHC rod demonstrates that stably transfected cells express the appropriate chimeric or normal protein. Electrophoretic mobility shift assays reveal that cells transformed by the chimeric cDNA do not have a CBF-DNA complex of the expected mobility, but instead contain a large complex with CBF DNA-binding activity that fails to migrate out of the gel wells. In order to define the regions of CBFβ/SMMHC necessary for 3T3 transformation, we have stably transfected cells with mutant CBFβ/SMMHC cDNAs containing various deletions of the coding region. Analysis of these cell lines indicates that the transformation property of CBFβ/SMMHC requires regions of CBFβ known to be necessary for association with the DNA-binding CBFα subunit, and also requires an intact SMMHC carboxyl terminus, which is necessary for formation of the coiled coil domain of the myosin rod.
RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA

PubMed Central

Wright, Imogen A.; Travers, Simon A.

2014-01-01

The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
The human serotonin 5-HT2C receptor: Complete cDNA, genomic structure, and alternatively spliced variant

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xie, Enzhong; Zhu, Lingyu; Zhao, Lingyun

1996-08-01

The complete 4775-nt cDNA encoding the human serotonin 5-HT2C receptor (5-HT2CR), a G-protein-coupled receptor, has been isolated. It contains a 1377-nt coding region flanked by a 728-nt 5'-untranslated region and a 2670-nt 3'-untranslated region. By using the cloned 5-HT2CR cDNA probe, the complete human gene for this receptor has been isolated and shown to contain six exons and five introns spanning at least 230 kb of DNA. The coding region of the human 5-HT2CR gene is interrupted by three introns, and the positions of the intron/exon junctions are conserved between the human and the rodent genes. In addition, an alternatively spliced 5-HT2CR RNA that contains a 95-nt deletion in the region coding for the second intracellular loop and the fourth transmembrane domain of the receptor has been identified. This deletion leads to a frameshift and premature termination so that the short isoform RNA encodes a putative protein of 248 amino acids. The ratio for the short isoform over the 5-HT2CR RNA was found to be higher in choroid plexus tumor than in normal brain tissue, suggesting the possibility of differential regulation of the 5-HT2CR gene in different neural tissues or during tumorigenesis. Transcription of the human 5-HT2CR gene was found to be initiated at multiple sites. No classical TATA-box sequence was found at the appropriate location, and the 5'-flanking sequence contains many potential transcription factor-binding sites. A 7.3-kb 5'-flanking 5-HT2CR DNA directed the efficient expression of a luciferase reporter gene in SK-N-SH and IMR32 neuroblastoma cells, indicating that it contains a functional promoter. 69 refs., 8 figs., 1 tab.
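The lengths quoted in this abstract can be cross-checked with simple bookkeeping. The Python sketch below uses only the numbers given above; the variable names are hypothetical, and whether the 1377-nt coding region includes the stop codon is an assumption the abstract does not settle.

```python
# Lengths reported in the abstract, in nucleotides.
UTR_5_PRIME = 728
CODING_REGION = 1377
UTR_3_PRIME = 2670
REPORTED_CDNA_LENGTH = 4775
SPLICE_DELETION = 95

# The three segments should add up to the full cDNA length.
segment_sum = UTR_5_PRIME + CODING_REGION + UTR_3_PRIME

# A coding region is read in triplets, so its length should divide by three.
codon_count, leftover = divmod(CODING_REGION, 3)

# Remainder of the deletion length modulo three; non-zero means the
# downstream reading frame is shifted.
deletion_frame_offset = SPLICE_DELETION % 3

if __name__ == "__main__":
    print("segments sum to reported length:", segment_sum == REPORTED_CDNA_LENGTH)
    print("codons in coding region:", codon_count, "leftover nt:", leftover)
    print("deletion frame offset:", deletion_frame_offset)
```

The non-zero frame offset of the 95-nt deletion is consistent with the frameshift and premature termination reported for the short isoform.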
Divergent genome evolution caused by regional variation in DNA gain and loss between human and mouse

PubMed Central

Kortschak, R. Daniel

2018-01-01

The forces driving the accumulation and removal of non-coding DNA and ultimately the evolution of genome size in complex organisms are intimately linked to genome structure and organisation. Our analysis provides a novel method for capturing the regional variation of lineage-specific DNA gain and loss events in their respective genomic contexts. To further understand this connection we used comparative genomics to identify genome-wide individual DNA gain and loss events in the human and mouse genomes. Focusing on the distribution of DNA gains and losses, relationships to important structural features and potential impact on biological processes, we found that in autosomes, DNA gains and losses both followed separate lineage-specific accumulation patterns. However, in both species chromosome X was particularly enriched for DNA gain, consistent with its high L1 retrotransposon content required for X inactivation. We found that DNA loss was associated with gene-rich open chromatin regions and DNA gain events with gene-poor closed chromatin regions. Additionally, we found that DNA loss events tended to be smaller than DNA gain events suggesting that they were able to accumulate in gene-rich open chromatin regions due to their reduced capacity to interrupt gene regulatory architecture. GO term enrichment showed that mouse loss hotspots were strongly enriched for terms related to developmental processes. However, these genes were also located in regions with a high density of conserved elements, suggesting that despite high levels of DNA loss, gene regulatory architecture remained conserved. This is consistent with a model in which DNA gain and loss results in turnover or “churning” in regulatory element dense regions of open chromatin, where interruption of regulatory elements is selected against. PMID:29677183
The cDNA-derived amino acid sequence of hemoglobin II from Lucina pectinata.

PubMed

Torres-Mercado, Elineth; Renta, Jessicca Y; Rodríguez, Yolanda; López-Garriga, Juan; Cadilla, Carmen L

2003-11-01

Hemoglobin II from the clam Lucina pectinata is an oxygen-reactive protein with a unique structural organization in the heme pocket involving residues Gln65 (E7), Tyr30 (B10), Phe44 (CD1), and Phe69 (E11). We employed the reverse transcriptase-polymerase chain reaction (RT-PCR) and methods to synthesize various cDNA(HbII). An initial 300-bp cDNA clone was amplified from total RNA by RT-PCR using degenerate oligonucleotides. Gene-specific primers derived from the HbII-partial cDNA sequence were used to obtain the 5' and 3' ends of the cDNA by RACE. The length of the HbII cDNA, estimated from overlapping clones, was approximately 2114 bases. Northern blot analysis revealed that the mRNA size of HbII agrees with the estimated size using cDNA data. The coding region of the full-length HbII cDNA codes for 151 amino acids. The calculated molecular weight of HbII, including the heme group and acetylated N-terminal residue, is 17,654.07 Da.
Representation of DNA sequences in genetic codon context with applications in exon and intron prediction.

PubMed

Yin, Changchuan

2015-04-01

To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be specially mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC) in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal to noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction by Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
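To make the 3-periodicity SNR mentioned in this abstract concrete, the Python sketch below computes it for the 4D binary representation that the abstract names as the common baseline. It does not implement the GCC mapping or its simulated-annealing optimization. The function names are hypothetical, and defining the SNR as the spectral power at frequency N/3 divided by the mean power over the non-zero frequencies is an assumption; the paper may normalize differently.

```python
import cmath


def binary_indicators(seq: str) -> dict:
    """Map a DNA string to four 0/1 indicator sequences, one per base."""
    seq = seq.upper()
    return {base: [1.0 if c == base else 0.0 for c in seq] for base in "ACGT"}


def power_at(signal: list, k: int) -> float:
    """Squared magnitude of the k-th discrete Fourier coefficient."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))
    return abs(coeff) ** 2


def periodicity_snr(seq: str) -> float:
    """Power at frequency N/3 relative to the mean power (DC excluded)."""
    n = len(seq) - len(seq) % 3  # truncate so N/3 is an integer index
    if n < 3:
        raise ValueError("sequence must contain at least three nucleotides")
    indicators = binary_indicators(seq[:n])

    def total_power(k: int) -> float:
        # Sum the spectra of the four indicator sequences.
        return sum(power_at(sig, k) for sig in indicators.values())

    mean_power = sum(total_power(k) for k in range(1, n)) / (n - 1)
    if mean_power < 1e-12:  # constant sequence: no non-DC signal at all
        return 0.0
    return total_power(n // 3) / mean_power


if __name__ == "__main__":
    print(periodicity_snr("ATG" * 20))  # strongly 3-periodic input
    print(periodicity_snr("AT" * 30))   # period-2 input, no 3-periodicity
```

The direct DFT here costs O(N²) and is meant only for short illustrative sequences; an FFT would be the natural choice for real data.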
Non-coding RNA generated following lariat-debranching mediates targeting of AID to DNA

PubMed Central

Zheng, Simin; Vuong, Bao Q.; Vaidyanathan, Bharat; Lin, Jia-Yu; Huang, Feng-Ting; Chaudhuri, Jayanta

2015-01-01

Transcription through immunoglobulin switch (S) regions is essential for class switch recombination (CSR) but no molecular function of the transcripts has been described. Likewise, recruitment of activation-induced cytidine deaminase (AID) to S regions is critical for CSR; however, the underlying mechanism has not been fully elucidated. Here, we demonstrate that intronic switch RNA acts in trans to target AID to S region DNA. AID binds directly to switch RNA through G-quadruplexes formed by the RNA molecules. Disruption of this interaction by mutation of a key residue in the putative RNA-binding domain of AID impairs recruitment of AID to S region DNA, thereby abolishing CSR. Additionally, inhibition of RNA lariat processing leads to loss of AID localization to S regions and compromises CSR; both defects can be rescued by exogenous expression of switch transcripts in a sequence-specific manner. These studies uncover an RNA-mediated mechanism of targeting AID to DNA. PMID:25957684
Complete Sequence of the mitochondrial genome of the tapeworm Hymenolepis diminuta: Gene arrangements indicate that platyhelminths are eutrochozoans

DOE Office of Scientific and Technical Information (OSTI.GOV)

von Nickisch-Rosenegk, Markus; Brown, Wesley M.; Boore, Jeffrey L.

2001-01-01

Using "long-PCR" we have amplified in overlapping fragments the complete mitochondrial genome of the tapeworm Hymenolepis diminuta (Platyhelminthes: Cestoda) and determined its 13,900 nucleotide sequence. The gene content is the same as that typically found for animal mitochondrial DNA (mtDNA) except that atp8 appears to be lacking, a condition found previously for several other animals. Despite the small size of this mtDNA, there are two large non-coding regions, one of which contains 13 repeats of a 31 nucleotide sequence and a potential stem-loop structure of 25 base pairs with an 11-member loop. Large potential secondary structures are identified also for the non-coding regions of two other cestode mtDNAs. Comparison of the mitochondrial gene arrangement of H. diminuta with those previously published supports a phylogenetic position of flatworms as members of the Eutrochozoa, rather than being basal to either a clade of protostomes or a clade of coelomates.
Long non-coding RNA produced by RNA polymerase V determines boundaries of heterochromatin

PubMed Central

Böhmdorfer, Gudrun; Sethuraman, Shriya; Rowley, M Jordan; Krzyszton, Michal; Rothi, M Hafiz; Bouzit, Lilia; Wierzbicki, Andrzej T

2016-01-01

RNA-mediated transcriptional gene silencing is a conserved process where small RNAs target transposons and other sequences for repression by establishing chromatin modifications. A central element of this process are long non-coding RNAs (lncRNA), which in Arabidopsis thaliana are produced by a specialized RNA polymerase known as Pol V. Here we show that non-coding transcription by Pol V is controlled by preexisting chromatin modifications located within the transcribed regions. Most Pol V transcripts are associated with AGO4 but are not sliced by AGO4. Pol V-dependent DNA methylation is established on both strands of DNA and is tightly restricted to Pol V-transcribed regions. This indicates that chromatin modifications are established in close proximity to Pol V. Finally, Pol V transcription is preferentially enriched on edges of silenced transposable elements, where Pol V transcribes into TEs. We propose that Pol V may play an important role in the determination of heterochromatin boundaries. DOI: http://dx.doi.org/10.7554/eLife.19092.001 PMID:27779094
Intact coding region of the serotonin transporter gene in obsessive-compulsive disorder

DOE Office of Scientific and Technical Information (OSTI.GOV)

Altemus, M.; Murphy, D.L.; Greenberg, B.

1996-07-26

Epidemiologic studies indicate that obsessive-compulsive disorder is genetically transmitted in some families, although no genetic abnormalities have been identified in individuals with this disorder. The selective response of obsessive-compulsive disorder to treatment with agents which block serotonin reuptake suggests the gene coding for the serotonin transporter as a candidate gene. The primary structure of the serotonin-transporter coding region was sequenced in 22 patients with obsessive-compulsive disorder, using direct PCR sequencing of cDNA synthesized from platelet serotonin-transporter mRNA. No variations in amino acid sequence were found among the obsessive-compulsive disorder patients or healthy controls. These results do not support a role for alteration in the primary structure of the coding region of the serotonin-transporter gene in the pathogenesis of obsessive-compulsive disorder. 27 refs.
Isolation and sequence of partial cDNA clones of human L1: homology of human and rodent L1 in the cytoplasmic region.

PubMed

Harper, J R; Prince, J T; Healy, P A; Stuart, J K; Nauman, S J; Stallcup, W B

1991-03-01

We have isolated cDNA clones coding for the human homologue of the neuronal cell adhesion molecule L1. The nucleotide sequence of the cDNA clones and the deduced primary amino acid sequence of the carboxy terminal portion of the human L1 are homologous to the corresponding sequences of mouse L1 and rat NILE glycoprotein, with an especially high sequence identity in the cytoplasmic regions of the proteins. There is also protein sequence homology with the cytoplasmic region of the Drosophila cell adhesion molecule, neuroglian. The conservation of the cytoplasmic domain argues for an important functional role for this portion of the molecule.
Evolution in the block: common elements of 5S rDNA organization and evolutionary patterns in distant fish genera.

PubMed

Campo, Daniel; García-Vázquez, Eva

2012-01-01

The 5S rDNA is organized in the genome as tandemly repeated copies of a structural unit composed of a coding sequence plus a nontranscribed spacer (NTS). The coding region is highly conserved in evolution, whereas the NTS varies in both length and sequence. It has been proposed that 5S rRNA genes are members of a gene family that have arisen through concerted evolution. In this study, we describe the molecular organization and evolution of the 5S rDNA in the genera Lepidorhombus and Scophthalmus (Scophthalmidae) and compare it with the already known 5S rDNA of the very different genera Merluccius (Merluccidae) and Salmo (Salmoninae), to identify common structural elements or patterns for understanding 5S rDNA evolution in fish. High intra- and interspecific diversity within the 5S rDNA family in all the genera can be explained by a combination of duplications, deletions, and transposition events. Sequence blocks with high similarity in all the 5S rDNA members across species were identified for the four studied genera, with evidence of intense gene conversion within noncoding regions. We propose a model to explain the evolution of the 5S rDNA, in which the evolutionary units are blocks of nucleotides rather than the entire sequences or single nucleotides. This model implies a "two-speed" evolution: slow within blocks (homogenized by recombination) and fast within the gene family (diversified by duplications and deletions).
Evaluation of the phospholamban gene in purebred large-breed dogs with dilated cardiomyopathy.

PubMed

Stabej, Polona; Leegwater, Peter A; Stokhof, Arnold A; Domanjko-Petric, Aleksandra; van Oost, Bernard A

2005-03-01

To evaluate the role of the phospholamban gene in purebred large-breed dogs with dilated cardiomyopathy (DCM). 6 dogs with DCM, including 2 Doberman Pinschers, 2 Newfoundlands, and 2 Great Danes. All dogs had clinical signs of congestive heart failure, and a diagnosis of DCM was made on the basis of echocardiographic findings. Blood samples were collected from each dog, and genomic DNA was isolated by a salt extraction method. Specific oligonucleotides were designed to amplify the promoter, exon 1, the 5'-part of exon 2 including the complete coding region, and part of intron 1 of the canine phospholamban gene via polymerase chain reaction procedures. These regions were screened for mutations in DNA obtained from the 6 dogs with DCM. No mutations were identified in the promoter, 5' untranslated region, part of intron 1, part of the 3' untranslated region, and the complete coding region of the phospholamban gene in dogs with DCM. Results indicate that mutations in the phospholamban gene are not a frequent cause of DCM in Doberman Pinschers, Newfoundlands, and Great Danes.
Acinetobacter phage genome is similar to Sphinx 2.36, the circular DNA copurified with TSE infected particles

PubMed Central

Longkumer, Toshisangba; Kamireddy, Swetha; Muthyala, Venkateswar Reddy; Akbarpasha, Shaikh; Pitchika, Gopi Krishna; Kodetham, Gopinath; Ayaluru, Murali; Siddavattam, Dayananda

2013-01-01

While analyzing plasmids of Acinetobacter sp. DS002 we have detected a circular DNA molecule pTS236, which upon further investigation is identified as the genome of a phage. The phage genome has shown sequence similarity to the recently discovered Sphinx 2.36 DNA sequence co-purified with the Transmissible Spongiform Encephalopathy (TSE) particles isolated from infected brain samples collected from diverse geographical regions. As in Sphinx 2.36, the phage genome also codes for three proteins. One of them codes for RepA and is shown to be involved in replication of pTS236 through rolling circle (RC) mode. The other two translationally coupled ORFs, orf106 and orf96, code for coat proteins of the phage. Although an orf96 homologue was not previously reported in Sphinx 2.36, a closer examination of DNA sequence of Sphinx 2.36 revealed its presence downstream of orf106 homologue. TEM images and infection assays revealed existence of phage AbDs1 in Acinetobacter sp. DS002. PMID:23867905
Non-coding-regulatory regions of human brain genes delineated by bacterial artificial chromosome knock-in mice.

PubMed

Schmouth, Jean-François; Castellarin, Mauro; Laprise, Stéphanie; Banks, Kathleen G; Bonaguro, Russell J; McInerny, Simone C; Borretta, Lisa; Amirabbasi, Mahsa; Korecki, Andrea J; Portales-Casamar, Elodie; Wilson, Gary; Dreolini, Lisa; Jones, Steven J M; Wasserman, Wyeth W; Goldowitz, Daniel; Holt, Robert A; Simpson, Elizabeth M

2013-10-14

The next big challenge in human genetics is understanding the 98% of the genome that comprises non-coding DNA. Hidden in this DNA are sequences critical for gene regulation, and new experimental strategies are needed to understand the functional role of gene-regulation sequences in health and disease. In this study, we build upon our HuGX ('high-throughput human genes on the X chromosome') strategy to expand our understanding of human gene regulation in vivo. In all, ten human genes known to express in therapeutically important brain regions were chosen for study. For eight of these genes, human bacterial artificial chromosome clones were identified, retrofitted with a reporter, knocked single-copy into the Hprt locus in mouse embryonic stem cells, and mouse strains derived. Five of these human genes expressed in mouse, and all expressed in the adult brain region for which they were chosen. This defined the boundaries of the genomic DNA sufficient for brain expression, and refined our knowledge regarding the complexity of gene regulation. We also characterized for the first time the expression of human MAOA and NR2F2, two genes for which the mouse homologs have been extensively studied in the central nervous system (CNS), and AMOTL1 and NOV, for which roles in CNS have been unclear. We have demonstrated the use of the HuGX strategy to functionally delineate non-coding-regulatory regions of therapeutically important human brain genes. Our results also show that a careful investigation, using publicly available resources and bioinformatics, can lead to accurate predictions of gene expression.

Specific DNA binding of the two chicken Deformed family homeodomain proteins, Chox-1.4 and Chox-a.

PubMed Central

Sasaki, H; Yokoyama, E; Kuroiwa, A

1990-01-01

The cDNA clones encoding two chicken Deformed (Dfd) family homeobox containing genes Chox-1.4 and Chox-a were isolated. Comparison of their amino acid sequences with another chicken Dfd family homeodomain protein and with those of mouse homologues revealed that strong homologies are located in the amino terminal regions and around the homeodomains. Although homologies in other regions were relatively low, some short conserved sequences were also identified. E. coli-made full length proteins were purified and used for the production of specific antibodies and for DNA binding studies. The binding profiles of these proteins to the 5'-leader and 5'-upstream sequences of Chox-1.4 and Chox-a coding regions were analyzed by immunoprecipitation and DNase I footprint assays. These two Chox proteins bound to the same sites in the 5'-flanking sequences of their coding regions with various affinities and their binding affinities to each site were nearly the same. The consensus sequences of the high and low affinity binding sites were TAATGA(C/G) and CTAATTTT, respectively. A clustered binding site was identified in the 5'-upstream of the Chox-a gene, suggesting that this clustered binding site works as a cis-regulatory element for auto- and/or cross-regulation of Chox-a gene expression. PMID:1970866
Capturing the Biofuel Wellhead and Powerhouse: The Chloroplast and Mitochondrial Genomes of the Leguminous Feedstock Tree Pongamia pinnata

PubMed Central

Kazakoff, Stephen H.; Imelfort, Michael; Edwards, David; Koehorst, Jasper; Biswas, Bandana; Batley, Jacqueline; Scott, Paul T.; Gresshoff, Peter M.

2012-01-01

Pongamia pinnata (syn. Millettia pinnata) is a novel, fast-growing arboreal legume that bears prolific quantities of oil-rich seeds suitable for the production of biodiesel and aviation biofuel. Here, we have used Illumina® ‘Second Generation DNA Sequencing (2GS)’ and a new short-read de novo assembler, SaSSY, to assemble and annotate the Pongamia chloroplast (152,968 bp; cpDNA) and mitochondrial (425,718 bp; mtDNA) genomes. We also show that SaSSY can be used to accurately assemble 2GS data, by re-assembling the Lotus japonicus cpDNA and in the process assemble its mtDNA (380,861 bp). The Pongamia cpDNA contains 77 unique protein-coding genes and is almost 60% gene-dense. It contains a 50 kb inversion common to other legumes, as well as a novel 6.5 kb inversion that is responsible for the non-disruptive, re-orientation of five protein-coding genes. Additionally, two copies of an inverted repeat firmly place the species outside the subclade of the Fabaceae lacking the inverted repeat. The Pongamia and L. japonicus mtDNA contain just 33 and 31 unique protein-coding genes, respectively, and like other angiosperm mtDNA, have expanded intergenic and multiple repeat regions. Through comparative analysis with Vigna radiata we measured the average synonymous and non-synonymous divergence of all three legume mitochondrial (1.59% and 2.40%, respectively) and chloroplast (8.37% and 8.99%, respectively) protein-coding genes. Finally, we explored the relatedness of Pongamia within the Fabaceae and showed the utility of the organellar genome sequences by mapping transcriptomic data to identify up- and down-regulated stress-responsive gene candidates and confirm in silico predicted RNA editing sites. PMID:23272141
The mitochondrial genome of Moniliophthora roreri, the frosty pod rot pathogen of cacao.

PubMed

Costa, Gustavo G L; Cabrera, Odalys G; Tiburcio, Ricardo A; Medrano, Francisco J; Carazzolle, Marcelo F; Thomazella, Daniela P T; Schuster, Stephen C; Carlson, John E; Guiltinan, Mark J; Bailey, Bryan A; Mieczkowski, Piotr; Pereira, Gonçalo A G; Meinhardt, Lyndel W

2012-05-01

In this study, we report the sequence of the mitochondrial (mt) genome of the Basidiomycete fungus Moniliophthora roreri, which is the etiologic agent of frosty pod rot of cacao (Theobroma cacao L.). We also compare it to the mtDNA from the closely-related species Moniliophthora perniciosa, which causes witches' broom disease of cacao. The 94 Kb mtDNA genome of M. roreri has a circular topology and codes for the typical 14 mt genes involved in oxidative phosphorylation. It also codes for both rRNA genes, a ribosomal protein subunit, 13 intronic open reading frames (ORFs), and a full complement of 27 tRNA genes. The conserved genes of M. roreri mtDNA are completely syntenic with homologous genes of the 109 Kb mtDNA of M. perniciosa. As in M. perniciosa, M. roreri mtDNA contains a high number of hypothetical ORFs (28), a remarkable feature that makes Moniliophthoras the largest reservoir of hypothetical ORFs among sequenced fungal mtDNA. Additionally, the mt genome of M. roreri has three free invertron-like linear mt plasmids, one of which is very similar to that previously described as integrated into the main M. perniciosa mtDNA molecule. Moniliophthora roreri mtDNA also has a region of suspected plasmid origin containing 15 hypothetical ORFs distributed in both strands. One of these ORFs is similar to an ORF in the mtDNA gene encoding DNA polymerase in Pleurotus ostreatus. The comparison to M. perniciosa showed that the 15 Kb difference in mtDNA sizes is mainly attributed to a lower abundance of repetitive regions in M. roreri (5.8 Kb vs 20.7 Kb). The most notable differences between M. roreri and M. perniciosa mtDNA are attributed to repeats and regions of plasmid origin. These elements might have contributed to the rapid evolution of mtDNA. Since M. roreri is the second species of the genus Moniliophthora whose mtDNA genome has been sequenced, the data presented here contribute valuable information for understanding the evolution of fungal mt genomes among closely-related species.
Artificial Intelligence, DNA Mimicry, and Human Health.

PubMed

Stefano, George B; Kream, Richard M

2017-08-14

The molecular evolution of genomic DNA across diverse plant and animal phyla involved dynamic registrations of sequence modifications to maintain existential homeostasis to increasingly complex patterns of environmental stressors. As an essential corollary, driver effects of positive evolutionary pressure are hypothesized to effect concerted modifications of genomic DNA sequences to meet expanded platforms of regulatory controls for successful implementation of advanced physiological requirements. It is also clearly apparent that preservation of updated registries of advantageous modifications of genomic DNA sequences requires coordinate expansion of convergent cellular proofreading/error correction mechanisms that are encoded by reciprocally modified genomic DNA. Computational expansion of operationally defined DNA memory extends to coordinate modification of coding and previously under-emphasized noncoding regions that now appear to represent essential reservoirs of untapped genetic information amenable to evolutionary driven recruitment into the realm of biologically active domains. Additionally, expansion of DNA memory potential via chemical modification and activation of noncoding sequences is targeted to vertical augmentation and integration of an expanded cadre of transcriptional and epigenetic regulatory factors affecting linear coding of protein amino acid sequences within open reading frames.
Cloning of the cDNA for U1 small nuclear ribonucleoprotein particle 70K protein from Arabidopsis thaliana

NASA Technical Reports Server (NTRS)

Reddy, A. S.; Czernik, A. J.; An, G.; Poovaiah, B. W.

1992-01-01

We cloned and sequenced a plant cDNA that encodes U1 small nuclear ribonucleoprotein (snRNP) 70K protein. The plant U1 snRNP 70K protein cDNA is not full length and lacks the coding region for 68 amino acids in the amino-terminal region as compared to human U1 snRNP 70K protein. Comparison of the deduced amino acid sequence of the plant U1 snRNP 70K protein with the amino acid sequence of animal and yeast U1 snRNP 70K protein showed a high degree of homology. The plant U1 snRNP 70K protein is more closely related to the human counterpart than to the yeast 70K protein. The carboxy-terminal half is less well conserved but, like the vertebrate 70K proteins, is rich in charged amino acids. Northern analysis with the RNA isolated from different parts of the plant indicates that the snRNP 70K gene is expressed in all of the parts tested. Southern blotting of genomic DNA using the cDNA indicates that the U1 snRNP 70K protein is coded by a single gene.
Analysis of 16S-23S rRNA intergenic spacer regions of Vibrio cholerae and Vibrio mimicus.

PubMed

Chun, J; Huq, A; Colwell, R R

1999-05-01

Vibrio cholerae identification based on molecular sequence data has been hampered by a lack of sequence variation from the closely related Vibrio mimicus. The two species share many genes coding for proteins, such as ctxAB, and show almost identical 16S DNA coding for rRNA (rDNA) sequences. Primers targeting conserved sequences flanking the 3' end of the 16S and the 5' end of the 23S rDNAs were used to amplify the 16S-23S rRNA intergenic spacer regions of V. cholerae and V. mimicus. Two major (ca. 580 and 500 bp) and one minor (ca. 750 bp) amplicons were consistently generated for both species, and their sequences were determined. The largest fragment contains three tRNA genes (tDNAs) coding for tRNAGlu, tRNALys, and tRNAVal, which has not previously been found in bacteria examined to date. The 580-bp amplicon contained tDNAIle and tDNAAla, whereas the 500-bp fragment had single tDNA coding either tRNAGlu or tRNAAla. Little variation, i.e., 0 to 0.4%, was found among V. cholerae O1 classical, O1 El Tor, and O139 epidemic strains. Slightly more variation was found against the non-O1/non-O139 serotypes (ca. 1% difference) and V. mimicus (2 to 3% difference). A pair of oligonucleotide primers were designed, based on the region differentiating all of V. cholerae strains from V. mimicus. The PCR system developed was subsequently evaluated by using representatives of V. cholerae from environmental and clinical sources, and of other taxa, including V. mimicus. This study provides the first molecular tool for identifying the species V. cholerae.
Isolation and sequencing of the gene encoding Sp23, a structural protein of spermatophore of the mealworm beetle, Tenebrio molitor.

PubMed

Feng, X; Happ, G M

1996-11-14

The cDNA for Sp23, a structural protein of the spermatophore of Tenebrio molitor, had been previously cloned and characterized (Paesen, G.C., Schwartz, M.B., Peferoen, M., Weyda, F. and Happ, G.M. (1992a) Amino acid sequence of Sp23, a structure protein of the spermatophore of the mealworm beetle, Tenebrio molitor. J. Biol. Chem. 257, 18852-18857). Using the labeled cDNA for Sp23 as a probe to screen a library of genomic DNA from Tenebrio molitor, we isolated a genomic clone for Sp23. A 5373-base pair (bp) restriction fragment containing the Sp23 gene was sequenced. The coding region is separated by a 55-bp intron which is located close to the translation start site. Three putative ecdysone response elements (EcRE) are identified in the 5' flanking region of the Sp23 gene. Comparison of the flanking regions of the Sp23 gene with those of the D-protein gene expressed in the accessory glands of Tenebrio reveals similar sequences present in the flanking regions of the two genes. The genomic organization of the coding region of the Sp23 gene shares similarities with that of the D-protein gene, three Drosophila accessory gland genes and two Drosophila 20-OH ecdysone-responsive genes.
Both V(D)J Coding Ends but Neither Signal End Can Recombine at the bcl-2 Major Breakpoint Region, and the Rejoining Is Ligase IV Dependent

PubMed Central

Raghavan, Sathees C.; Hsieh, Chih-Lin; Lieber, Michael R.

2005-01-01

The t(14;18) chromosomal translocation is the most common translocation in human cancer, and it occurs in all follicular lymphomas. The 150-bp bcl-2 major breakpoint region (Mbr) on chromosome 18 is a fragile site, because it adopts a non-B DNA conformation that can be cleaved by the RAG complex. The non-B DNA structure and the chromosomal translocation can be recapitulated on intracellular human minichromosomes where immunoglobulin 12- and 23-signals are positioned downstream of the bcl-2 Mbr. Here we show that either of the two coding ends in these V(D)J recombination reactions can recombine with either of the two broken ends of the bcl-2 Mbr but that neither signal end can recombine with the Mbr. Moreover, we show that the rejoining is fully dependent on DNA ligase IV, indicating that the rejoining phase relies on the nonhomologous DNA end-joining pathway. These results permit us to formulate a complete model for the order and types of cleavage and rejoining events in the t(14;18) translocation. PMID:16024785
[Structural organization of 5S ribosomal DNA of Rosa rugosa].

PubMed

Tynkevych, Iu O; Volkov, R A

2014-01-01

In order to clarify the molecular organization of the genomic region encoding 5S rRNA in the diploid species Rosa rugosa, several 5S rDNA repeated units were cloned and sequenced. Analysis of the obtained sequences revealed that only one length variant of 5S rDNA repeated units, which contains intact promoter elements in the intergenic spacer region (IGS) and appears to be transcriptionally active, is present in the genome. Additionally, a limited number of 5S rDNA pseudogenes lacking a portion of the coding sequence and the complete IGS was detected. A high level of sequence similarity (from 93.7 to 97.5%) between the IGS of major 5S rDNA variants of East Asian R. rugosa and North American R. nitida was found, indicating comparatively recent divergence of these species.
RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA.

PubMed

Wright, Imogen A; Travers, Simon A

2014-07-01

The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
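The distinction the abstract draws between codon-sized indels and frameshift mutations can be illustrated with a minimal Python sketch. It is not RAMICS itself, which uses profile hidden Markov models; the function names `classify_indel` and `gap_runs` are hypothetical, and the only rule assumed is that a gap whose length is a multiple of three preserves the reading frame.

```python
def classify_indel(length: int) -> str:
    """Label an indel by whether it preserves the reading frame."""
    if length <= 0:
        raise ValueError("indel length must be a positive number of bases")
    # A multiple of three adds or removes whole codons; anything else shifts the frame.
    return "codon-sized" if length % 3 == 0 else "frameshift"


def gap_runs(aligned: str) -> list:
    """Return the lengths of consecutive '-' runs in one row of an alignment."""
    runs, current = [], 0
    for ch in aligned:
        if ch == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:  # a gap run that reaches the end of the row
        runs.append(current)
    return runs


if __name__ == "__main__":
    for run in gap_runs("ATG---CC-A"):
        print(run, classify_indel(run))
```

A real aligner would also have to weigh sequencing error, for example at homopolymer runs, before accepting a frameshift as genuine.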
Different domains of the murine RNA polymerase I-specific termination factor mTTF-I serve distinct functions in transcription termination.

PubMed

Evers, R; Smid, A; Rudloff, U; Lottspeich, F; Grummt, I

1995-03-15

Termination of mouse ribosomal gene transcription by RNA polymerase I (Pol I) requires the specific interaction of a DNA binding protein, mTTF-I, with an 18 bp sequence element located downstream of the rRNA coding region. Here we describe the molecular cloning and functional characterization of the cDNA encoding this transcription termination factor. Recombinant mTTF-I binds specifically to the murine terminator elements and terminates Pol I transcription in a reconstituted in vitro system. Deletion analysis has defined a modular structure of mTTF-I comprising a dispensable N-terminal half, a large C-terminal DNA binding region and an internal domain which is required for transcription termination. Significantly, the C-terminal region of mTTF-I reveals striking homology to the DNA binding domains of the proto-oncogene c-Myb and the yeast transcription factor Reb1p. Site-directed mutagenesis of one of the tryptophan residues that is conserved in the homology region of c-Myb, Reb1p and mTTF-I abolishes specific DNA binding, a finding which underscores the functional relevance of these residues in DNA-protein interactions.
Different domains of the murine RNA polymerase I-specific termination factor mTTF-I serve distinct functions in transcription termination.

PubMed Central

Evers, R; Smid, A; Rudloff, U; Lottspeich, F; Grummt, I

1995-01-01

Termination of mouse ribosomal gene transcription by RNA polymerase I (Pol I) requires the specific interaction of a DNA binding protein, mTTF-I, with an 18 bp sequence element located downstream of the rRNA coding region. Here we describe the molecular cloning and functional characterization of the cDNA encoding this transcription termination factor. Recombinant mTTF-I binds specifically to the murine terminator elements and terminates Pol I transcription in a reconstituted in vitro system. Deletion analysis has defined a modular structure of mTTF-I comprising a dispensable N-terminal half, a large C-terminal DNA binding region and an internal domain which is required for transcription termination. Significantly, the C-terminal region of mTTF-I reveals striking homology to the DNA binding domains of the proto-oncogene c-Myb and the yeast transcription factor Reb1p. Site-directed mutagenesis of one of the tryptophan residues that is conserved in the homology region of c-Myb, Reb1p and mTTF-I abolishes specific DNA binding, a finding which underscores the functional relevance of these residues in DNA-protein interactions. PMID:7720715
Gene Expression and Polymorphism of Myostatin Gene and its Association with Growth Traits in Chicken.

PubMed

Dushyanth, K; Bhattacharya, T K; Shukla, R; Chatterjee, R N; Sitaramamma, T; Paswan, C; Guru Vishnu, P

2016-10-01

Myostatin is a member of the TGF-β superfamily and is directly involved in the regulation of body growth by limiting muscular growth. A study was carried out in three chicken lines to identify polymorphism in the coding region of the myostatin gene through SSCP and DNA sequencing. A total of 12 haplotypes were observed in the myostatin coding region of chicken. Significant associations of haplogroups with body weight at 1, 14, 28, and 42 days, and with carcass traits at 42 days, were observed across the lines. It is concluded that the coding region of the myostatin gene was polymorphic, with varied levels of expression among lines, and had significant effects on growth traits. The expression of the MSTN gene varied during the embryonic and post-hatch developmental stages.
Vacuolar H[sup +]-ATPase 69-kilodalton catalytic subunit cDNA from developing cotton (Gossypium hirsutum) ovules

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wilkins, T.A.

1993-06-01

This study investigates the molecular events of vacuole ontogeny in rapidly elongated cotton plant cells. Within the DNA coding region, the cotton and carrot cDNA clones exhibit 82.2% nucleotide sequence homology; at the amino acid level cotton and carrot catalytic subunits exhibited 95.7% identity and 2.1% amino acid similarity. When aligned with the analogous sequences from yeast, the cotton protein shared only 60.5% amino acid identity and 12.7% similarity. 10 refs., 1 tab.
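As an illustration of how a percent-identity figure like those quoted above can be computed from a pairwise alignment, here is a minimal Python sketch. The function name `percent_identity` is hypothetical, and the choice to ignore gapped columns in the denominator is an assumption; published figures may use a different convention.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of ungapped aligned columns in which the two residues match."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither row has a gap character.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    print(percent_identity("ATGC", "ATGA"))
```

Amino acid similarity additionally requires a grouping of residues by chemical property or a substitution matrix, which this sketch does not attempt.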
Identification of common, unique and polymorphic microsatellites among 73 cyanobacterial genomes.

PubMed

Kabra, Ritika; Kapil, Aditi; Attarwala, Kherunnisa; Rai, Piyush Kant; Shanker, Asheesh

2016-04-01

Microsatellites, also known as Simple Sequence Repeats, are short tandem repeats of 1-6 nucleotides. These repeats are found in coding as well as non-coding regions of both prokaryotic and eukaryotic genomes and play a significant role in the study of gene regulation, genetic mapping, DNA fingerprinting and evolutionary studies. The availability of 73 complete genome sequences of cyanobacteria enabled us to mine and statistically analyze microsatellites in these genomes. The cyanobacterial microsatellites identified through bioinformatics analysis were stored in a user-friendly database named CyanoSat, which is an efficient data representation and query system designed using ASP.net. The information in CyanoSat comprises perfect, imperfect and compound microsatellites found in coding, non-coding and coding-non-coding regions. Moreover, it contains PCR primers with 200-nucleotide-long flanking regions. The mined cyanobacterial microsatellites can be freely accessed at www.compubio.in/CyanoSat/home.aspx. In addition, 82 polymorphic, 13,866 unique and 2390 common microsatellites were detected. These microsatellites will be useful in strain identification and genetic diversity studies of cyanobacteria.
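To make the definition of a perfect microsatellite concrete (a motif of 1-6 nucleotides repeated in tandem), the following Python sketch scans a sequence with a regular expression. It is not the pipeline behind CyanoSat; the function name `find_perfect_ssrs` and the default threshold of three repeats are assumptions chosen for illustration only.

```python
import re


def find_perfect_ssrs(seq: str, min_repeats: int = 3):
    """Return (start, motif, repeat_count) for perfect tandem repeats of 1-6 nt motifs."""
    # Group 1 is the shortest motif that works at a given start position;
    # the backreference then requires at least min_repeats - 1 further copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


if __name__ == "__main__":
    print(find_perfect_ssrs("TTACACACACGG"))
```

Imperfect and compound microsatellites, which the database also records, need mismatch-tolerant and adjacency-aware logic that a single regular expression does not provide.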
The coding region of the UFGT gene is a source of diagnostic SNP markers that allow single-locus DNA genotyping for the assessment of cultivar identity and ancestry in grapevine (Vitis vinifera L.)

PubMed Central

2013-01-01

Background: Vitis vinifera L. is one of society’s most important agricultural crops with a broad genetic variability. The difficulty in recognizing grapevine genotypes based on ampelographic traits and secondary metabolites prompted the development of molecular markers suitable for achieving variety genetic identification. Findings: Here, we propose a comparison between a multi-locus barcoding approach based on six chloroplast markers and a single-copy nuclear gene sequencing method using five coding regions combined with a character-based system with the aim of reconstructing cultivar-specific haplotypes and genotypes to be exploited for the molecular characterization of 157 V. vinifera accessions. The analysis of the chloroplast target regions proved the inadequacy of the DNA barcoding approach at the subspecies level, and hence further DNA genotyping analyses were targeted on the sequences of five nuclear single-copy genes amplified across all of the accessions. The sequencing of the coding region of the UFGT nuclear gene (UDP-glucose: flavonoid 3-O-glucosyltransferase, the key enzyme for the accumulation of anthocyanins in berry skins) enabled the discovery of discriminant SNPs (1/34 bp) and the reconstruction of 130 distinct V. vinifera genotypes. Most of the genotypes proved to be cultivar-specific, and only a few genotypes were shared by more than one cultivar, in each case by closely related cultivars. Conclusion: On the whole, this technique was successful for inferring SNP-based genotypes of grapevine accessions suitable for assessing the genetic identity and ancestry of international cultivars and also useful for corroborating some hypotheses regarding the origin of local varieties, suggesting several issues of misidentification (synonymy/homonymy). PMID:24298902
Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi

PubMed Central

Schoch, Conrad L.; Seifert, Keith A.; Huhndorf, Sabine; Robert, Vincent; Spouge, John L.; Levesque, C. André; Chen, Wen; Bolchacova, Elena; Voigt, Kerstin; Crous, Pedro W.; Miller, Andrew N.; Wingfield, Michael J.; Aime, M. Catherine; An, Kwang-Deuk; Bai, Feng-Yan; Barreto, Robert W.; Begerow, Dominik; Bergeron, Marie-Josée; Blackwell, Meredith; Boekhout, Teun; Bogale, Mesfin; Boonyuen, Nattawut; Burgaz, Ana R.; Buyck, Bart; Cai, Lei; Cai, Qing; Cardinali, G.; Chaverri, Priscila; Coppins, Brian J.; Crespo, Ana; Cubas, Paloma; Cummings, Craig; Damm, Ulrike; de Beer, Z. Wilhelm; de Hoog, G. Sybren; Del-Prado, Ruth; Dentinger, Bryn; Diéguez-Uribeondo, Javier; Divakar, Pradeep K.; Douglas, Brian; Dueñas, Margarita; Duong, Tuan A.; Eberhardt, Ursula; Edwards, Joan E.; Elshahed, Mostafa S.; Fliegerova, Katerina; Furtado, Manohar; García, Miguel A.; Ge, Zai-Wei; Griffith, Gareth W.; Griffiths, K.; Groenewald, Johannes Z.; Groenewald, Marizeth; Grube, Martin; Gryzenhout, Marieka; Guo, Liang-Dong; Hagen, Ferry; Hambleton, Sarah; Hamelin, Richard C.; Hansen, Karen; Harrold, Paul; Heller, Gregory; Herrera, Cesar; Hirayama, Kazuyuki; Hirooka, Yuuri; Ho, Hsiao-Man; Hoffmann, Kerstin; Hofstetter, Valérie; Högnabba, Filip; Hollingsworth, Peter M.; Hong, Seung-Beom; Hosaka, Kentaro; Houbraken, Jos; Hughes, Karen; Huhtinen, Seppo; Hyde, Kevin D.; James, Timothy; Johnson, Eric M.; Johnson, Joan E.; Johnston, Peter R.; Jones, E.B. Gareth; Kelly, Laura J.; Kirk, Paul M.; Knapp, Dániel G.; Kõljalg, Urmas; Kovács, Gábor M.; Kurtzman, Cletus P.; Landvik, Sara; Leavitt, Steven D.; Liggenstoffer, Audra S.; Liimatainen, Kare; Lombard, Lorenzo; Luangsa-ard, J. Jennifer; Lumbsch, H. Thorsten; Maganti, Harinad; Maharachchikumbura, Sajeewa S. N.; Martin, María P.; May, Tom W.; McTaggart, Alistair R.; Methven, Andrew S.; Meyer, Wieland; Moncalvo, Jean-Marc; Mongkolsamrit, Suchada; Nagy, László G.; Nilsson, R. Henrik; Niskanen, Tuula; Nyilasi, Ildikó; Okada, Gen; Okane, Izumi; Olariaga, Ibai; Otte, Jürgen; Papp, Tamás; Park, Duckchul; Petkovits, Tamás; Pino-Bodas, Raquel; Quaedvlieg, William; Raja, Huzefa A.; Redecker, Dirk; Rintoul, Tara L.; Ruibal, Constantino; Sarmiento-Ramírez, Jullie M.; Schmitt, Imke; Schüßler, Arthur; Shearer, Carol; Sotome, Kozue; Stefani, Franck O.P.; Stenroos, Soili; Stielow, Benjamin; Stockinger, Herbert; Suetrong, Satinee; Suh, Sung-Oui; Sung, Gi-Ho; Suzuki, Motofumi; Tanaka, Kazuaki; Tedersoo, Leho; Telleria, M. Teresa; Tretter, Eric; Untereiner, Wendy A.; Urbina, Hector; Vágvölgyi, Csaba; Vialle, Agathe; Vu, Thuy Duong; Walther, Grit; Wang, Qi-Ming; Wang, Yan; Weir, Bevan S.; Weiß, Michael; White, Merlin M.; Xu, Jianping; Yahr, Rebecca; Yang, Zhu L.; Yurkov, Andrey; Zamora, Juan-Carlos; Zhang, Ning; Zhuang, Wen-Ying; Schindel, David

2012-01-01

Six DNA regions were evaluated as potential DNA barcodes for Fungi, the second largest kingdom of eukaryotic life, by a multinational, multilaboratory consortium. The region of the mitochondrial cytochrome c oxidase subunit 1 used as the animal barcode was excluded as a potential marker, because it is difficult to amplify in fungi, often includes large introns, and can be insufficiently variable. Three subunits from the nuclear ribosomal RNA cistron were compared together with regions of three representative protein-coding genes (largest subunit of RNA polymerase II, second largest subunit of RNA polymerase II, and minichromosome maintenance protein). Although the protein-coding gene regions often had a higher percent of correct identification compared with ribosomal markers, low PCR amplification and sequencing success eliminated them as candidates for a universal fungal barcode. Among the regions of the ribosomal cistron, the internal transcribed spacer (ITS) region has the highest probability of successful identification for the broadest range of fungi, with the most clearly defined barcode gap between inter- and intraspecific variation. The nuclear ribosomal large subunit, a popular phylogenetic marker in certain groups, had superior species resolution in some taxonomic groups, such as the early diverging lineages and the ascomycete yeasts, but was otherwise slightly inferior to the ITS. The nuclear ribosomal small subunit has poor species-level resolution in fungi. ITS will be formally proposed for adoption as the primary fungal barcode marker to the Consortium for the Barcode of Life, with the possibility that supplementary barcodes may be developed for particular narrowly circumscribed taxonomic groups. PMID:22454494
Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi.

PubMed

Schoch, Conrad L; Seifert, Keith A; Huhndorf, Sabine; Robert, Vincent; Spouge, John L; Levesque, C André; Chen, Wen

2012-04-17

Six DNA regions were evaluated as potential DNA barcodes for Fungi, the second largest kingdom of eukaryotic life, by a multinational, multilaboratory consortium. The region of the mitochondrial cytochrome c oxidase subunit 1 used as the animal barcode was excluded as a potential marker, because it is difficult to amplify in fungi, often includes large introns, and can be insufficiently variable. Three subunits from the nuclear ribosomal RNA cistron were compared together with regions of three representative protein-coding genes (largest subunit of RNA polymerase II, second largest subunit of RNA polymerase II, and minichromosome maintenance protein). Although the protein-coding gene regions often had a higher percent of correct identification compared with ribosomal markers, low PCR amplification and sequencing success eliminated them as candidates for a universal fungal barcode. Among the regions of the ribosomal cistron, the internal transcribed spacer (ITS) region has the highest probability of successful identification for the broadest range of fungi, with the most clearly defined barcode gap between inter- and intraspecific variation. The nuclear ribosomal large subunit, a popular phylogenetic marker in certain groups, had superior species resolution in some taxonomic groups, such as the early diverging lineages and the ascomycete yeasts, but was otherwise slightly inferior to the ITS. The nuclear ribosomal small subunit has poor species-level resolution in fungi. ITS will be formally proposed for adoption as the primary fungal barcode marker to the Consortium for the Barcode of Life, with the possibility that supplementary barcodes may be developed for particular narrowly circumscribed taxonomic groups.
Dynamic association of epigenetic H3K4me3 and DNA 5hmC marks in the dorsal hippocampus and anterior cingulate cortex following reactivation of a fear memory.

PubMed

Webb, William M; Sanchez, Richard G; Perez, Gabriella; Butler, Anderson A; Hauser, Rebecca M; Rich, Megan C; O'Bierne, Aidan L; Jarome, Timothy J; Lubin, Farah D

2017-07-01

Epigenetic mechanisms such as DNA methylation and histone methylation are critical regulators of gene transcription changes during memory consolidation. However, it is unknown how these epigenetic modifications coordinate control of gene expression following reactivation of a previously consolidated memory. Here, we found that retrieval of a recent contextual fear conditioned memory increased global levels of H3 lysine 4-trimethylation (H3K4me3) and DNA 5-hydroxymethylation (5hmC) in area CA1 of the dorsal hippocampus. Further experiments revealed increased levels of H3K4me3 and DNA 5hmC within a CpG-enriched coding region of the Npas4, but not c-fos, gene. Intriguingly, retrieval of a 30-day old memory increased H3K4me3 and DNA 5hmC levels at a CpG-enriched coding region of c-fos, but not Npas4, in the anterior cingulate cortex, suggesting that while these two epigenetic mechanisms co-occur following the retrieval of a recent or remote memory, their gene targets differ depending on the brain region. Additionally, we found that in vivo siRNA-mediated knockdown of the H3K4me3 methyltransferase Mll1 in CA1 abolished retrieval-induced increases in DNA 5hmC levels at the Npas4 gene, suggesting that H3K4me3 couples to DNA 5hmC mechanisms. Consistent with this, loss of Mll1 prevented retrieval-induced increases in Npas4 mRNA levels in CA1 and impaired fear memory. Collectively, these findings suggest an important link between histone methylation and DNA hydroxymethylation mechanisms in the epigenetic control of de novo gene transcription triggered by memory retrieval. Copyright © 2017 Elsevier Inc. All rights reserved.

Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller.

PubMed

Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun

2017-01-03

Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.
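The idea behind molecular barcoding, that reads sharing a barcode derive from one original molecule and so can be collapsed to suppress amplification and sequencing errors, can be sketched in Python as below. This is a simple majority-vote illustration, not smCounter's Bayesian model; the function names and the strict-majority rule are assumptions.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads):
    """Collapse (barcode, base) observations at one position to one base per barcode."""
    groups = defaultdict(list)
    for barcode, base in reads:
        groups[barcode].append(base)
    consensus = {}
    for barcode, bases in groups.items():
        base, count = Counter(bases).most_common(1)[0]
        # Keep the barcode only when one base holds a strict majority of its reads.
        if count * 2 > len(bases):
            consensus[barcode] = base
    return consensus


def allele_fraction(consensus, alt):
    """Fraction of barcodes (original molecules) whose consensus base is alt."""
    if not consensus:
        return 0.0
    return sum(1 for b in consensus.values() if b == alt) / len(consensus)


if __name__ == "__main__":
    reads = [("m1", "A"), ("m1", "A"), ("m1", "T"),
             ("m2", "T"), ("m2", "T"), ("m3", "A")]
    calls = consensus_by_barcode(reads)
    print(calls, allele_fraction(calls, "T"))
```

Counting molecules rather than reads is what lets a lone erroneous read inside a barcode family (the `T` in `m1` above) be outvoted instead of being counted as variant evidence.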
Sequence Polishing Library (SPL) v10.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oberortner, Ernst

The Sequence Polishing Library (SPL) is a suite of software tools that automate "Design for Synthesis and Assembly" workflows. Specifically: The SPL "Converter" tool converts files among the following sequence data exchange formats: CSV, FASTA, GenBank, and Synthetic Biology Open Language (SBOL). The SPL "Juggler" tool optimizes the codon usage of DNA coding sequences according to an optimization strategy, a user-specific codon usage table and genetic code. In addition, the SPL "Juggler" can translate amino acid sequences into DNA sequences. The SPL "Polisher" verifies NA sequences against DNA synthesis constraints, such as GC content, repeating k-mers, and restriction sites. In case of violations, the "Polisher" reports the violations in a comprehensive manner. The "Polisher" tool can also modify the violating regions according to an optimization strategy, a user-specific codon usage table and genetic code. The SPL "Partitioner" decomposes large DNA sequences into smaller building blocks with partial overlaps that enable an efficient assembly. The "Partitioner" enables the user to configure the characteristics of the overlaps, such as length, GC content, or melting temperature, which are mostly determined by the utilized assembly protocol.
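The three kinds of synthesis constraint named above (GC content, repeating k-mers, restriction sites) can each be checked with a few lines of Python. The sketch below is not SPL's code or API; all function names are hypothetical, and the EcoRI recognition sequence GAATTC is used only as a familiar example site.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def repeated_kmers(seq: str, k: int):
    """Return the sorted k-mers that occur more than once in the sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sorted(kmer for kmer, n in counts.items() if n > 1)


def restriction_sites(seq: str, sites: dict) -> dict:
    """Map each site name to the 0-based start positions where its motif occurs."""
    seq = seq.upper()
    found = {}
    for name, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)  # allow overlapping occurrences
        if positions:
            found[name] = positions
    return found


if __name__ == "__main__":
    design = "AAGAATTCTTATGATG"
    print(gc_content(design))
    print(repeated_kmers(design, 3))
    print(restriction_sites(design, {"EcoRI": "GAATTC"}))
```

Acceptable GC ranges and repeat lengths depend on the synthesis vendor, so any thresholds applied to these values would have to come from the user.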
Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.

PubMed

Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W

2018-05-31

In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. Developments in high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.
A deep learning method for lincRNA detection using auto-encoder algorithm.

PubMed

Yu, Ning; Yu, Zeng; Pan, Yi

2017-12-06

The RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by the new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, the relevance exists along the DNA sequences; (2) lincRNAs contain some conserved patterns/motifs tethered together by non-conserved regions. These two lines of evidence provide the rationale for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in the biological process as regulatory units instead of generating proteins. Identifying these transcriptional regions from non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of the predictive deep neural network on the lincRNA data sets compared with support vector machine and traditional neural network. In addition, it is validated through the newly discovered lincRNA data set, and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that the deep learning method has extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for the lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data. Driven by those newly annotated lincRNA data, deep learning methods based on the auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. To our knowledge, this is the first application to adopt deep learning techniques for identifying lincRNA transcription sequences.
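The abstract mentions that different encoding schemes were tried for feeding DNA sequence to the network but does not name them. As one common possibility, and purely as an assumption, the Python sketch below shows one-hot encoding, in which each base becomes a four-element indicator vector; the name `one_hot` and the all-zero treatment of ambiguous bases are illustrative choices.

```python
ALPHABET = "ACGT"


def one_hot(seq: str):
    """Flatten a DNA string into a list of 0/1 values, four per base."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    vector = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        if base in index:  # ambiguous bases such as N stay all-zero
            row[index[base]] = 1
        vector.extend(row)
    return vector


if __name__ == "__main__":
    print(one_hot("ACGN"))
```

A fixed-length window of sequence encoded this way yields an input vector of four times the window length, which is the form a fully connected auto-encoder layer expects.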
Multiple copies of a bile acid-inducible gene in Eubacterium sp. strain VPI 12708.

PubMed Central

Gopal-Srivastava, R; Mallonee, D H; White, W B; Hylemon, P B

1990-01-01

Eubacterium sp. strain VPI 12708 is an anaerobic intestinal bacterium which possesses inducible bile acid 7-dehydroxylation activity. Several new polypeptides are produced in this strain following induction with cholic acid. Genes coding for two copies of a bile acid-inducible 27,000-dalton polypeptide (baiA1 and baiA2) have been previously cloned and sequenced. We now report on a gene coding for a third copy of this 27,000-dalton polypeptide (baiA3). The baiA3 gene has been cloned in lambda DASH on an 11.2-kilobase DNA fragment from a partial Sau3A digest of the Eubacterium DNA. DNA sequence analysis of the baiA3 gene revealed 100% homology with the baiA1 gene within the coding region of the 27,000-dalton polypeptides. The baiA2 gene shares 81% sequence identity with the other two genes at the nucleotide level. The flanking nucleotide sequences associated with the baiA1 and baiA3 genes are identical for 930 bases in the 5' direction from the initiation codon and for at least 325 bases in the 3' direction from the stop codon, including the putative promoter regions for the genes. An additional open reading frame (occupying from 621 to 648 bases, depending on the correct start codon) was found in the identical 5' regions associated with the baiA1 and baiA3 clones. The 5' sequence 930 bases upstream from the baiA1 and baiA3 genes was totally divergent. The baiA2 gene, which is part of a large bile acid-inducible operon, showed no homology with the other two genes either in the 5' or 3' direction from the polypeptide coding region, except for a 15-base-pair presumed ribosome-binding site in the 5' region. These studies strongly suggest that a gene duplication (baiA1 and baiA3) has occurred and is stably maintained in this bacterium. PMID:2376563
Computational DNA hole spectroscopy: A new tool to predict mutation hotspots, critical base pairs, and disease ‘driver’ mutations

PubMed Central

Villagrán, Martha Y. Suárez; Miller, John H.

2015-01-01

We report on a new technique, computational DNA hole spectroscopy, which creates spectra of electron hole probabilities vs. nucleotide position. A hole is a site of positive charge created when an electron is removed. Peaks in the hole spectrum depict sites where holes tend to localize and potentially trigger a base pair mismatch during replication. Our studies of mitochondrial DNA reveal a correlation between L-strand hole spectrum peaks and spikes in the human mutation spectrum. Importantly, we also find that hole peak positions that do not coincide with large variant frequencies often coincide with disease-implicated mutations and/or (for coding DNA) encoded conserved amino acids. This enables combining hole spectra with variant data to identify critical base pairs and potential disease ‘driver’ mutations. Such integration of DNA hole and variance spectra could ultimately prove invaluable for pinpointing critical regions of the vast non-protein-coding genome. An observed asymmetry in correlations, between the spectrum of human mtDNA variations and the L- and H-strand hole spectra, is attributed to asymmetric DNA replication processes that occur for the leading and lagging strands. PMID:26310834
Twisting Right to Left: A…A Mismatch in a CAG Trinucleotide Repeat Overexpansion Provokes Left-Handed Z-DNA Conformation

PubMed Central

2015-01-01

Conformational polymorphism of DNA is a major causative factor behind several incurable trinucleotide repeat expansion disorders that arise from overexpansion of trinucleotide repeats located in coding/non-coding regions of specific genes. Hairpin DNA structures that are formed due to overexpansion of CAG repeat lead to Huntington’s disorder and spinocerebellar ataxias. Nonetheless, DNA hairpin stem structure that generally embraces B-form with canonical base pairs is poorly understood in the context of periodic noncanonical A…A mismatch as found in CAG repeat overexpansion. Molecular dynamics simulations on DNA hairpin stems containing A…A mismatches in a CAG repeat overexpansion show that A…A dictates local Z-form irrespective of starting glycosyl conformation, in sharp contrast to canonical DNA duplex. Transition from B-to-Z is due to the mechanistic effect that originates from its pronounced nonisostericity with flanking canonical base pairs facilitated by base extrusion, backbone and/or base flipping. Based on these structural insights we envisage that such an unusual DNA structure of the CAG hairpin stem may have a role in disease pathogenesis. As this is the first study that delineates the influence of a single A…A mismatch in reversing DNA helicity, it would further have an impact on understanding DNA mismatch repair. PMID:25876062
A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

PubMed

Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong

2012-01-01

Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.
The full mitochondrial genome sequence of Raillietina tetragona from chicken (Cestoda: Davaineidae).

PubMed

Liang, Jian-Ying; Lin, Rui-Qing

2016-11-01

In the present study, the complete mitochondrial DNA (mtDNA) sequence of Raillietina tetragona was determined, and its gene content and genome organization were compared with those of other tapeworms. The complete mt genome sequence of R. tetragona is 14,444 bp in length. It contains 12 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and two non-coding regions. All genes are transcribed in the same direction and have a nucleotide composition high in A and T. The A + T content of the complete mt genome is 71.4% for R. tetragona. The R. tetragona mt genome sequence provides a novel mtDNA marker for studying the molecular epidemiology and population genetics of Raillietina and has implications for the molecular diagnosis of chicken cestodosis caused by Raillietina.
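The A + T content quoted above is a simple base-composition ratio. As a minimal sketch, assuming the sequence is available as a plain string, the calculation could look like the following. The function name `at_content` and the toy sequence are illustrative only and are not taken from the study.

```python
def at_content(seq: str) -> float:
    """Return the fraction of A and T among the unambiguous bases of seq."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return at / total


# Toy sequence for illustration; not part of the R. tetragona genome.
toy = "ATTAGCATAT"
print(f"A+T content: {at_content(toy):.1%}")
```

Ignoring ambiguous symbols is a design choice of this sketch; a published figure may have been computed over the full sequence length instead.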
Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Xiao-Yong; MacArthur, Stewart; Bourgon, Richard

2008-01-10

Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. Here, we use whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior-posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over forty well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal-ventral patterning genes, whose expression we show to be quantitatively modulated by anterior-posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. Together these observations suggest that many of these poorly-bound regions are not involved in early-embryonic transcriptional regulation, and a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, their recognition sites are not unambiguously more constrained evolutionarily than the immediate flanking DNA, even in more highly bound and presumably functional regions, indicating that comparative DNA sequence analysis is limited in its ability to identify functional transcription factor targets.
Characterization of Non-coding DNA Satellites Associated with Sweepoviruses (Genus Begomovirus, Geminiviridae) – Definition of a Distinct Class of Begomovirus-Associated Satellites

PubMed Central

Lozano, Gloria; Trenado, Helena P.; Fiallo-Olivé, Elvira; Chirinos, Dorys; Geraud-Pouey, Francis; Briddon, Rob W.; Navas-Castillo, Jesús

2016-01-01

Begomoviruses (family Geminiviridae) are whitefly-transmitted, plant-infecting single-stranded DNA viruses that cause crop losses throughout the warmer parts of the World. Sweepoviruses are a phylogenetically distinct group of begomoviruses that infect plants of the family Convolvulaceae, including sweet potato (Ipomoea batatas). Two classes of subviral molecules are often associated with begomoviruses, particularly in the Old World; the betasatellites and the alphasatellites. An analysis of sweet potato and Ipomoea indica samples from Spain and Merremia dissecta samples from Venezuela identified small non-coding subviral molecules in association with several distinct sweepoviruses. The sequences of 18 clones were obtained and found to be structurally similar to tomato leaf curl virus-satellite (ToLCV-sat, the first DNA satellite identified in association with a begomovirus), with a region with significant sequence identity to the conserved region of betasatellites, an A-rich sequence, a predicted stem–loop structure containing the nonanucleotide TAATATTAC, and a second predicted stem–loop. These sweepovirus-associated satellites join an increasing number of ToLCV-sat-like non-coding satellites identified recently. Although sharing some features with betasatellites, evidence is provided to suggest that the ToLCV-sat-like satellites are distinct from betasatellites and should be considered a separate class of satellites, for which the collective name deltasatellites is proposed. PMID:26925037
The C terminus of Ku80 activates the DNA-dependent protein kinase catalytic subunit.

PubMed

Singleton, B K; Torres-Arzayus, M I; Rottinghaus, S T; Taccioli, G E; Jeggo, P A

1999-05-01

Ku is a heterodimeric protein with double-stranded DNA end-binding activity that operates in the process of nonhomologous end joining. Ku is thought to target the DNA-dependent protein kinase (DNA-PK) complex to the DNA and, when DNA bound, can interact and activate the DNA-PK catalytic subunit (DNA-PKcs). We have carried out a 3' deletion analysis of Ku80, the larger subunit of Ku, and shown that the C-terminal 178 amino acid residues are dispensable for DNA end-binding activity but are required for efficient interaction of Ku with DNA-PKcs. Cells expressing Ku80 proteins that lack the terminal 178 residues have low DNA-PK activity, are radiation sensitive, and can recombine the signal junctions but not the coding junctions during V(D)J recombination. These cells have therefore acquired the phenotype of mouse SCID cells despite expressing DNA-PKcs protein, suggesting that an interaction between DNA-PKcs and Ku, involving the C-terminal region of Ku80, is required for DNA double-strand break rejoining and coding but not signal joint formation. To gain further insight into important domains in Ku80, we report a point mutational change in Ku80 in the defective xrs-2 cell line. This residue is conserved among species and lies outside of the previously reported Ku70-Ku80 interaction domain. The mutational change nonetheless abrogates the Ku70-Ku80 interaction and DNA end-binding activity.
Complete mitochondrial DNA sequence of the Eastern keelback mullet Liza affinis.

PubMed

Gong, Xiaoling; Zhu, Wenjia; Bao, Baolong

2016-05-01

Eastern keelback mullet (Liza affinis) inhabits inlet waters and estuaries of rivers. In this paper, we initially determined the complete mitochondrial genome of Liza affinis. The entire mtDNA sequence is 16,831 bp in length, including 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes and 1 putative control region. Its order and numbers of genes are similar to most bony fishes.
Structure of the coding region and mRNA variants of the apyrase gene from pea (Pisum sativum)

NASA Technical Reports Server (NTRS)

Shibata, K.; Abe, S.; Davies, E.

2001-01-01

Partial amino acid sequences of a 49 kDa apyrase (ATP diphosphohydrolase, EC 3.6.1.5) from the cytoskeletal fraction of etiolated pea stems were used to derive oligonucleotide DNA primers to generate a cDNA fragment of pea apyrase mRNA by RT-PCR and these primers were used to screen a pea stem cDNA library. Two almost identical cDNAs differing in just 6 nucleotides within the coding regions were found, and these cDNA sequences were used to clone genomic fragments by PCR. Two nearly identical gene fragments containing 8 exons and 7 introns were obtained. One of them (H-type) encoded the mRNA sequence described by Hsieh et al. (1996) (DDBJ/EMBL/GenBank Z32743), while the other (S-type) differed by the same 6 nucleotides as the mRNAs, suggesting that these genes may be alleles. The six nucleotide differences between these two alleles were found solely in the first exon, and these mutation sites had two types of consensus sequences. These mRNAs were found with varying lengths of 3' untranslated regions (3'-UTR). There are some similarities between the 3'-UTR of these mRNAs and those of actin and actin binding proteins in plants. The putative roles of the 3'-UTR and alternative polyadenylation sites are discussed in relation to their possible role in targeting the mRNAs to different subcellular compartments.
DNA as a Binary Code: How the Physical Structure of Nucleotide Bases Carries Information

ERIC Educational Resources Information Center

McCallister, Gary

2005-01-01

The DNA triplet code also functions as a binary code. Because double-ring compounds cannot bind to double-ring compounds in the DNA code, the sequence of bases classified simply as purines or pyrimidines can encode for smaller groups of possible amino acids. This is an intuitive approach to teaching the DNA code. (Contains 6 figures.)
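The purine/pyrimidine view described above can be illustrated with a short sketch. It assumes an arbitrary convention of 1 for purines (double-ring A and G) and 0 for pyrimidines (single-ring C and T); the function names and this bit assignment are hypothetical and are not taken from the article.

```python
# Assumed convention for illustration: purine -> "1", pyrimidine -> "0".
RING_BIT = {"A": "1", "G": "1", "C": "0", "T": "0"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def to_binary(seq: str) -> str:
    """Encode a DNA string as purine (1) / pyrimidine (0) bits."""
    return "".join(RING_BIT[base] for base in seq.upper())


def complement(seq: str) -> str:
    """Return the base-by-base complement (not reversed)."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


codon = "ATG"
print(to_binary(codon))              # 101
print(to_binary(complement(codon)))  # 010
```

Because a purine always pairs with a pyrimidine, the complementary strand encodes the bitwise inverse of the original, which is the pairing constraint the abstract refers to.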
New progress in snake mitochondrial gene rearrangement.

PubMed

Chen, Nian; Zhao, Shujin

2009-08-01

To further understand the evolution of snake mitochondrial genomes, the complete mitochondrial DNA (mtDNA) sequences were determined for representative species from two snake families: the Many-banded krait, the Banded krait, the Chinese cobra, the King cobra, the Hundred-pace viper, the Short-tailed mamushi, and the Chain viper. Thirteen protein-coding genes, 22-23 tRNA genes, 2 rRNA genes, and 2 control regions were identified in these mtDNAs. Duplication of the control region and translocation of the tRNAPro gene were two notable features of the snake mtDNAs. These results from the gene rearrangement comparisons confirm the correctness of traditional classification schemes and validate the utility of comparing complete mtDNA sequences for snake phylogeny reconstruction.
Identification of the razor clam species Ensis arcuatus, E. siliqua, E. directus, E. macha, and Solen marginatus using PCR-RFLP analysis of the 5S rDNA region.

PubMed

Fernandez-Tajes, Juan; Méndez, Josefina

2007-09-05

Polymerase chain reaction (PCR) and restriction fragment length polymorphism (RFLP) analysis of the 5S ribosomal DNA region has been applied to the establishment of DNA-based molecular markers for the identification of five razor clam species: Ensis arcuatus, E. siliqua, E. directus, E. macha, and Solen marginatus. PCR amplifications were carried out using a pair of universal primers from the coding region of 5S rDNA. S. marginatus was simply distinguished by the different size of the amplicons obtained. Species-specific restriction endonuclease patterns were found with the enzymes Hae III for E. arcuatus, E. siliqua, and E. directus, and Acs I for E. macha, and when two enzymes were combined, the four species were also identified. Thus, this work provides a simple, reliable, and rapid protocol for the accurate identification of Ensis and Solen species in fresh and canned products, which is very useful for traceability and to enforce labeling regulations.
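The logic of an RFLP assay can be sketched in silico: cut an amplicon at every occurrence of an enzyme's recognition site and compare the fragment lengths between species. The sketch below uses Hae III, which recognizes GGCC and cuts between the central G and C. The function name `digest` and the two toy amplicons are invented for illustration and are not 5S rDNA sequences from the study.

```python
def digest(seq: str, site: str = "GGCC", cut_offset: int = 2) -> list[int]:
    """Return fragment lengths after cutting seq at each occurrence of site.

    cut_offset is the cut position within the site (2 for Hae III, GG^CC).
    Only the given strand is scanned, which suffices for palindromic sites.
    """
    seq = seq.upper()
    lengths = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        lengths.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    lengths.append(len(seq) - start)
    return lengths


# Toy amplicons (hypothetical): one carries a Hae III site, one does not.
amplicon_a = "AAAAGGCCTTTTTT"
amplicon_b = "AAAAGGACTTTTTT"
print(digest(amplicon_a))  # [6, 8]
print(digest(amplicon_b))  # [14]
```

Differing fragment-length lists correspond to the differing banding patterns that distinguish species on a gel.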
Epigenetic variants of a transgenic petunia line show hypermethylation in transgene DNA: an indication for specific recognition of foreign DNA in transgenic plants.

PubMed

Meyer, P; Heidmann, I

1994-05-25

We analysed de novo DNA methylation occurring in plants obtained from the transgenic petunia line R101-17. This line contains one copy of the maize A1 gene that leads to the production of brick-red pelargonidin pigment in the flowers. Due to its integration into an unmethylated genomic region the A1 transgene is hypomethylated and transcriptionally active. Several epigenetic variants of line 17 were selected that exhibit characteristic and somatically stable pigmentation patterns, displaying fully coloured, marbled or colourless flowers. Analysis of the DNA methylation patterns revealed that the decrease in pigmentation among the epigenetic variants was correlated with an increase in methylation, specifically of the transgene DNA. No change in methylation of the hypomethylated integration region could be detected. A similar increase in methylation, specifically in the transgene region, was also observed among progeny of R101-17del, a deletion derivative of R101-17 that no longer produces pelargonidin pigments due to a deletion in the A1 coding region. Again de novo methylation is specifically directed to the transgene, while the hypomethylated character of neighbouring regions is not affected. Possible mechanisms for transgene-specific methylation and its consequences for long-term use of transgenic material are discussed.
A series of vectors to construct lacZ fusions for the study of gene expression in Schizosaccharomyces pombe.

PubMed

Lafuente, M J; Petit, T; Gancedo, C

1997-12-22

We have constructed a series of plasmids to facilitate the fusion of promoters with or without coding regions of genes of Schizosaccharomyces pombe to the lacZ gene of Escherichia coli. These vectors carry a multiple cloning region in which fission yeast DNA may be inserted in three different reading frames with respect to the coding region of lacZ. The plasmids were constructed with the ura4+ or the his3+ marker of S. pombe. Functionality of the plasmids was tested measuring in parallel the expression of fructose 1,6-bisphosphatase and beta-galactosidase under the control of the fbp1+ promoter in different conditions.

A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

PubMed

Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

2015-01-01

The upstream region of coding genes is important for several reasons, for instance for locating transcription factor binding sites and start sites in genomic DNA. Motivated by a recently conducted study, where a multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for the classification of true upstream prokaryotic sequences from background upstream sequences. The upstream sequences of conserved coding genes across genomes were considered in the analysis, where conserved coding genes were found by using the pan-genomics concept for each considered prokaryotic species. PLS uses a position-specific scoring matrix (PSSM) to study the characteristics of the upstream region. Results obtained by the PLS-based method were compared with the Gini importance of random forest (RF) and with support vector machine (SVM), which is a much-used method for sequence classification. The upstream sequence classification performance was evaluated by cross-validation, and the suggested approach identifies prokaryotic upstream regions significantly better than RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.
Phenotypic characterization of an Arabidopsis T-DNA insertion line SALK_063500.

PubMed

Sng, Natasha J; Paul, Anna-Lisa; Ferl, Robert J

2018-06-01

In this article we report the identification of a homozygous-lethal T-DNA (transfer DNA) insertion within the coding region of the AT1G05290 gene in the Arabidopsis thaliana (Arabidopsis) line SALK_063500. The T-DNA insertion is found within exon one of the AT1G05290 gene; however, a homozygous T-DNA allele is unattainable. In plants heterozygous for the T-DNA allele, the expression levels of AT1G05290 were compared to those of wild-type Arabidopsis (Col-0, Columbia). Further analyses revealed an aberrant silique phenotype in the heterozygous SALK_063500 plants that is attributed to a reduced rate of pollen tube germination. These data are original and have not been published elsewhere.
Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions

DOE Office of Scientific and Technical Information (OSTI.GOV)

MacArthur, Stewart; Li, Xiao-Yong; Li, Jingyi

2009-05-15

BACKGROUND: We previously established that six sequence-specific transcription factors that initiate anterior/posterior patterning in Drosophila bind to overlapping sets of thousands of genomic regions in blastoderm embryos. While regions bound at high levels include known and probable functional targets, more poorly bound regions are preferentially associated with housekeeping genes and/or genes not transcribed in the blastoderm, and are frequently found in protein coding sequences or in less conserved non-coding DNA, suggesting that many are likely non-functional. RESULTS: Here we show that an additional 15 transcription factors that regulate other aspects of embryo patterning show a similar quantitative continuum of function and binding to thousands of genomic regions in vivo. Collectively, the 21 regulators show a surprisingly high overlap in the regions they bind given that they belong to 11 DNA binding domain families, specify distinct developmental fates, and can act via different cis-regulatory modules. We demonstrate, however, that quantitative differences in relative levels of binding to shared targets correlate with the known biological and transcriptional regulatory specificities of these factors. CONCLUSIONS: It is likely that the overlap in binding of biochemically and functionally unrelated transcription factors arises from the high concentrations of these proteins in nuclei, which, coupled with their broad DNA binding specificities, directs them to regions of open chromatin. We suggest that most animal transcription factors will be found to show a similar broad overlapping pattern of binding in vivo, with specificity achieved by modulating the amount, rather than the identity, of bound factor.
Beta-keratins of differentiating epidermis of snake comprise glycine-proline-serine-rich proteins with an avian-like gene organization.

PubMed

Dalla Valle, Luisa; Nardi, Alessia; Belvedere, Paola; Toni, Mattia; Alibardi, Lorenzo

2007-07-01

Beta-keratins of reptilian scales have been recently cloned and characterized in some lizards. Here we report for the first time the sequence of some beta-keratins from the snake Elaphe guttata. Five different cDNAs were obtained using 5'- and 3'-RACE analyses. Four sequences differ by only a few nucleotides in the coding region, whereas the last cDNA shows only 84% identity in this region. The gene corresponding to one of the cDNA sequences has a single intron present in the 5'-untranslated region. This genomic organization is similar to that of birds' beta-keratins. Cloning and Southern blotting analysis suggest that snake beta-keratins belong to a family of highly related genes, as in geckos. PCR analysis suggests a head-to-tail orientation of genes in the same chromosome. In situ hybridization detected beta-keratin transcripts almost exclusively in differentiating oberhautchen and beta-cells of the snake epidermis in renewal phase. This is confirmed by Northern blotting, which showed a high expression of two different transcripts in this phase, whereas only the longer transcript is expressed, at a much lower level, in resting skin. The cDNA coding sequences encoded putative glycine-proline-serine rich proteins containing 137-139 amino acids, with apparent isoelectric points of 7.5 and 8.2. A central region, rich in proline, shows over 50% homology with avian scale, claw, and feather keratins. The prediction of secondary structure shows mainly a random coil conformation and a few beta-strand regions in the central region, likely involved in the formation of a fibrous framework of beta-keratins. This region was possibly present in the basal reptiles from which reptiles and birds originated. Copyright 2007 Wiley-Liss, Inc.
Partial sequence homogenization in the 5S multigene families may generate sequence chimeras and spurious results in phylogenetic reconstructions.

PubMed

Galián, José A; Rosato, Marcela; Rosselló, Josep A

2014-03-01

Multigene families have provided opportunities for evolutionary biologists to assess molecular evolution processes and phylogenetic reconstructions at deep and shallow systematic levels. However, the use of these markers is not free of technical and analytical challenges. Many evolutionary studies that used the nuclear 5S rDNA gene family rarely used contiguous 5S coding sequences due to the routine use of head-to-tail polymerase chain reaction primers that are anchored to the coding region. Moreover, the 5S coding sequences have been concatenated with independent, adjacent gene units in many studies, creating simulated chimeric genes as the raw data for evolutionary analysis. This practice is based on the tacitly assumed, but rarely tested, hypothesis that strict intra-locus concerted evolution processes are operating in 5S rDNA genes, without any empirical evidence as to whether it holds for the recovered data. The potential pitfalls of analysing the patterns of molecular evolution and reconstructing phylogenies based on these chimeric genes have not been assessed to date. Here, we compared the sequence integrity and phylogenetic behavior of entire versus concatenated 5S coding regions from a real data set obtained from closely related plant species (Medicago, Fabaceae). Our results suggest that within arrays sequence homogenization is partially operating in the 5S coding region, which is traditionally assumed to be highly conserved. Consequently, concatenating 5S genes increases haplotype diversity, generating novel chimeric genotypes that most likely do not exist within the genome. In addition, the patterns of gene evolution are distorted, leading to incorrect haplotype relationships in some evolutionary reconstructions.
Determination of the melon chloroplast and mitochondrial genome sequences reveals that the largest reported mitochondrial genome in plants contains a significant amount of DNA having a nuclear origin

PubMed Central

2011-01-01

Background The melon belongs to the Cucurbitaceae family, whose economic importance among vegetable crops is second only to Solanaceae. The melon has a small genome size (454 Mb), which makes it suitable for molecular and genetic studies. Despite similar nuclear and chloroplast genome sizes, cucurbits show great variation when their mitochondrial genomes are compared. The melon possesses the largest plant mitochondrial genome, as much as eight times larger than that of other cucurbits. Results The nucleotide sequences of the melon chloroplast and mitochondrial genomes were determined. The chloroplast genome (156,017 bp) included 132 genes, with 98 single-copy genes dispersed between the small (SSC) and large (LSC) single-copy regions and 17 duplicated genes in the inverted repeat regions (IRa and IRb). A comparison of the cucumber and melon chloroplast genomes showed differences in only approximately 5% of nucleotides, mainly due to short indels and SNPs. Additionally, 2.74 Mb of mitochondrial sequence, accounting for 95% of the estimated mitochondrial genome size, were assembled into five scaffolds and four additional unscaffolded contigs. Of the mitochondrial genome, 84% is contained in a single scaffold. The gene-coding region accounted for 1.7% (45,926 bp) of the total sequence, including 51 protein-coding genes, 4 conserved ORFs, 3 rRNA genes and 24 tRNA genes. Despite the differences observed in the mitochondrial genome sizes of cucurbit species, Citrullus lanatus (379 kb), Cucurbita pepo (983 kb) and Cucumis melo (2,740 kb) share 120 kb of sequence, including the predicted protein-coding regions. Nevertheless, melon contained a high number of repetitive sequences and a high content of DNA of nuclear origin, which represented 42% and 47% of the total sequence, respectively.
Conclusions Whereas the size and gene organisation of chloroplast genomes are similar among the cucurbit species, mitochondrial genomes show a wide variety of sizes, with a non-conserved structure both in gene number and organisation, as well as in the features of the noncoding DNA. The transfer of nuclear DNA to the melon mitochondrial genome and the high proportion of repetitive DNA appear to explain the size of the largest mitochondrial genome reported so far. PMID:21854637
Genes and Junk in Plant Mitochondria—Repair Mechanisms and Selection

PubMed Central

Christensen, Alan C.

2014-01-01

Plant mitochondrial genomes have very low mutation rates. In contrast, they also rearrange and expand frequently. This is easily understood if DNA repair in genes is accomplished by accurate mechanisms, whereas less accurate mechanisms including nonhomologous end joining or break-induced replication are used in nongenes. An important question is how different mechanisms of repair predominate in coding and noncoding DNA, although one possible mechanism is transcription-coupled repair (TCR). This work tests the predictions of TCR and finds no support for it. Examination of the mutation spectra and rates in genes and junk reveals what DNA repair mechanisms are available to plant mitochondria, and what selective forces act on the repair products. A model is proposed that mismatches and other DNA damages are repaired by converting them into double-strand breaks (DSBs). These can then be repaired by any of the DSB repair mechanisms, both accurate and inaccurate. Natural selection will eliminate coding regions repaired by inaccurate mechanisms, accounting for the low mutation rates in genes, whereas mutations, rearrangements, and expansions generated by inaccurate repair in noncoding regions will persist. Support for this model includes the structure of the mitochondrial mutS homolog in plants, which is fused to a double-strand endonuclease. The model proposes that plant mitochondria do not distinguish a damaged or mismatched DNA strand from the undamaged strand, they simply cut both strands and perform homology-based DSB repair. This plant-specific strategy for protecting future generations from mitochondrial DNA damage has the side effect of genome expansions and rearrangements. PMID:24904012
Chloroplast DNA Structural Variation, Phylogeny, and Age of Divergence among Diploid Cotton Species.

PubMed

Chen, Zhiwen; Feng, Kun; Grover, Corrinne E; Li, Pengbo; Liu, Fang; Wang, Yumei; Xu, Qin; Shang, Mingzhao; Zhou, Zhongli; Cai, Xiaoyan; Wang, Xingxing; Wendel, Jonathan F; Wang, Kunbo; Hua, Jinping

2016-01-01

The cotton genus (Gossypium spp.) contains 8 monophyletic diploid genome groups (A, B, C, D, E, F, G, K) and a single allotetraploid clade (AD). To gain insight into the phylogeny of Gossypium and molecular evolution of the chloroplast genome in this group, we performed a comparative analysis of 19 Gossypium chloroplast genomes, six reported here for the first time. Nucleotide distance in non-coding regions was about three times that of coding regions. As expected, distances were smaller within than among genome groups. Phylogenetic topologies based on nucleotide and indel data support the resolution of the 8 genome groups into 6 clades. Phylogenetic analysis of indel distribution among the 19 genomes demonstrates contrasting evolutionary dynamics in different clades, with a parallel genome downsizing in two genome groups and a biased accumulation of insertions in the clade containing the cultivated cottons leading to large (for Gossypium) chloroplast genomes. Divergence time estimates derived from the cpDNA sequence suggest that the major diploid clades had diverged approximately 10 to 11 million years ago. The complete nucleotide sequences of 6 cpDNA genomes are provided, offering a resource for cytonuclear studies in Gossypium.
Organizational heterogeneity of vertebrate genomes.

PubMed

Frenkel, Svetlana; Kirzhner, Valery; Korol, Abraham

2012-01-01

Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
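The k-mer scoring idea behind Compositional Spectra analysis can be illustrated with a minimal sketch. This is a simplified exact-match k-mer frequency vector, not the authors' CS procedure; the function names `kmer_spectrum` and `spectrum_distance`, the default `k=3`, and the L1 distance are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_spectrum(seq, k=3):
    """Relative frequency of every DNA k-mer in seq, using overlapping windows.

    Hypothetical helper: windows containing characters other than A, C, G, T
    (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def spectrum_distance(a, b):
    """L1 distance between two spectra: 0 for identical, 2 for disjoint."""
    return sum(abs(a[m] - b[m]) for m in a)


if __name__ == "__main__":
    s1 = kmer_spectrum("ACGTACGTACGT", k=2)
    s2 = kmer_spectrum("AAAAACCCCCGG", k=2)
    print(round(spectrum_distance(s1, s2), 3))
```

Comparing spectra of genome segments (or of a natural sequence against its reshuffled version) in this way gives one possible, assumed, measure of the regional heterogeneity the abstract describes.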
Chloroplast DNA Structural Variation, Phylogeny, and Age of Divergence among Diploid Cotton Species

PubMed Central

Li, Pengbo; Liu, Fang; Wang, Yumei; Xu, Qin; Shang, Mingzhao; Zhou, Zhongli; Cai, Xiaoyan; Wang, Xingxing; Wendel, Jonathan F.; Wang, Kunbo

2016-01-01

The cotton genus (Gossypium spp.) contains 8 monophyletic diploid genome groups (A, B, C, D, E, F, G, K) and a single allotetraploid clade (AD). To gain insight into the phylogeny of Gossypium and molecular evolution of the chloroplast genome in this group, we performed a comparative analysis of 19 Gossypium chloroplast genomes, six reported here for the first time. Nucleotide distance in non-coding regions was about three times that of coding regions. As expected, distances were smaller within than among genome groups. Phylogenetic topologies based on nucleotide and indel data support the resolution of the 8 genome groups into 6 clades. Phylogenetic analysis of indel distribution among the 19 genomes demonstrates contrasting evolutionary dynamics in different clades, with a parallel genome downsizing in two genome groups and a biased accumulation of insertions in the clade containing the cultivated cottons leading to large (for Gossypium) chloroplast genomes. Divergence time estimates derived from the cpDNA sequence suggest that the major diploid clades had diverged approximately 10 to 11 million years ago. The complete nucleotide sequences of 6 cpDNA genomes are provided, offering a resource for cytonuclear studies in Gossypium. PMID:27309527
Sequence of a cDNA encoding pancreatic preprosomatostatin-22.

PubMed Central

Magazin, M; Minth, C D; Funckes, C L; Deschenes, R; Tavianini, M A; Dixon, J E

1982-01-01

We report the nucleotide sequence of a precursor to somatostatin that upon proteolytic processing may give rise to a hormone of 22 amino acids. The nucleotide sequence of a cDNA from the channel catfish (Ictalurus punctatus) encodes a precursor to somatostatin that is 105 amino acids (Mr, 11,500). The cDNA coding for somatostatin-22 consists of 36 nucleotides in the 5' untranslated region, 315 nucleotides that code for the precursor to somatostatin-22, 269 nucleotides at the 3' untranslated region, and a variable length of poly(A). The putative preprohormone contains a sequence of hydrophobic amino acids at the amino terminus that has the properties of a "signal" peptide. A connecting sequence of approximately 57 amino acids is followed by a single Arg-Arg sequence, which immediately precedes the hormone. Somatostatin-22 is homologous to somatostatin-14 in 7 of the 14 amino acids, including the Phe-Trp-Lys sequence. Hybridization selection of mRNA, followed by its translation in a wheat germ cell-free system, resulted in the synthesis of a single polypeptide having a molecular weight of approximately 10,000 as estimated on Na-DodSO4/polyacrylamide gels. PMID:6127673
Many human accelerated regions are developmental enhancers

PubMed Central

Capra, John A.; Erwin, Genevieve D.; McKinsey, Gabriel; Rubenstein, John L. R.; Pollard, Katherine S.

2013-01-01

The genetic changes underlying the dramatic differences in form and function between humans and other primates are largely unknown, although it is clear that gene regulatory changes play an important role. To identify regulatory sequences with potentially human-specific functions, we and others used comparative genomics to find non-coding regions conserved across mammals that have acquired many sequence changes in humans since divergence from chimpanzees. These regions are good candidates for performing human-specific regulatory functions. Here, we analysed the DNA sequence, evolutionary history, histone modifications, chromatin state and transcription factor (TF) binding sites of a combined set of 2649 non-coding human accelerated regions (ncHARs) and predicted that at least 30% of them function as developmental enhancers. We prioritized the predicted ncHAR enhancers using analysis of TF binding site gain and loss, along with the functional annotations and expression patterns of nearby genes. We then tested both the human and chimpanzee sequence for 29 ncHARs in transgenic mice, and found 24 novel developmental enhancers active in both species, 17 of which had very consistent patterns of activity in specific embryonic tissues. Of these ncHAR enhancers, five drove expression patterns suggestive of different activity for the human and chimpanzee sequence at embryonic day 11.5. The changes to human non-coding DNA in these ncHAR enhancers may modify the complex patterns of gene expression necessary for proper development in a human-specific manner and are thus promising candidates for understanding the genetic basis of human-specific biology. PMID:24218637
Superimposed Code Theoretic Analysis of DNA Codes and DNA Computing

DTIC Science & Technology

2008-01-01

complements of one another and the DNA duplex formed is a Watson-Crick (WC) duplex. However, there are many instances when the formation of non-WC...that the user’s requirements for probe selection are met based on the Watson-Crick probe locality within a target. The second type, called...AFRL-RI-RS-TR-2007-288 Final Technical Report January 2008 SUPERIMPOSED CODE THEORETIC ANALYSIS OF DNA CODES AND DNA COMPUTING
New insights into mitogenomic phylogeny and copy number in eight indigenous sheep populations based on the ATP synthase and cytochrome c oxidase genes.

PubMed

Xiao, P; Niu, L L; Zhao, Q J; Chen, X Y; Wang, L J; Li, L; Zhang, H P; Guo, J Z; Xu, H Y; Zhong, T

2017-11-16

The origins and phylogeny of different sheep breeds have been widely studied using polymorphisms within the mitochondrial hypervariable region. However, little is known about the mitochondrial DNA (mtDNA) content and phylogeny based on mtDNA protein-coding genes. In this study, we assessed the phylogeny and copy number of the mtDNA in eight indigenous (population size, n=184) and three introduced (n=66) sheep breeds in China based on five mitochondrial coding genes (COX1, COX2, ATP8, ATP6 and COX3). The mean haplotype and nucleotide diversities were 0.944 and 0.00322, respectively. We identified a correlation between the lineage distribution and the genetic distance, whereby Valley-type Tibetan sheep had a closer genetic relationship with introduced breeds (Dorper, Poll Dorset and Suffolk) than with other indigenous breeds. Similarly, the median-joining profile of haplotypes revealed the distribution of clusters according to genetic differences. Moreover, copy number analysis based on the five mitochondrial coding genes was affected by the genetic distance combined with genetic phylogeny; we also identified obvious non-synonymous mutations in ATP6 between the different levels of copy number expressions. These results imply that differences in mitogenomic compositions resulting from geographical separation lead to differences in mitochondrial function.
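For readers unfamiliar with the haplotype diversity statistic quoted above, the following is a minimal sketch of one standard estimator, Nei's h = n/(n-1) · (1 − Σ pᵢ²). The abstract does not state which estimator or software the authors used, so the choice of formula, the function name `haplotype_diversity`, and the toy haplotype labels are assumptions.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity for a list of haplotype labels.

    Assumed estimator: h = n / (n - 1) * (1 - sum(p_i ** 2)),
    where p_i is the sample frequency of haplotype i.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled sequences are required")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


if __name__ == "__main__":
    # Hypothetical sample: labels stand in for distinct concatenated sequences.
    sample = ["H1", "H1", "H2", "H3", "H3", "H3", "H4"]
    print(round(haplotype_diversity(sample), 3))
```

A value near 1, such as the 0.944 reported in the abstract, means that two randomly drawn individuals almost always carry different haplotypes.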
New Population and Phylogenetic Features of the Internal Variation within Mitochondrial DNA Macro-Haplogroup R0

PubMed Central

Cerezo, Maria; Quintáns, Beatriz; Zarrabeitia, Maria Teresa; Cuscó, Ivon; Lareu, Maria Victoria; García, Óscar; Pérez-Jurado, Luis; Carracedo, Ángel; Salas, Antonio

2009-01-01

Background: R0 embraces the most common mitochondrial DNA (mtDNA) lineage in West Eurasia, namely, haplogroup H (∼40%). R0 sub-lineages are badly defined in the control region and therefore, the analysis of diagnostic coding region polymorphisms is needed in order to gain resolution in population and medical studies. Methodology/Principal Findings: We sequenced the first hypervariable segment (HVS-I) of 518 individuals from different North Iberian regions. The mtDNAs belonging to R0 (∼57%) were further genotyped for a set of 71 coding region SNPs characterizing major and minor branches of R0. We found that the North Iberian Peninsula shows moderate levels of population stratification; for instance, haplogroup V reaches the highest frequency in Cantabria (north-central Iberia), but lower in Galicia (northwest Iberia) and Catalonia (northeast Iberia). When compared to other European and Middle East populations, haplogroups H1, H3 and H5a show frequency peaks in the Franco-Cantabrian region, declining from West towards the East and South Europe. In addition, we have characterized, by way of complete genome sequencing, a new autochthonous clade of haplogroup H in the Basque country, named H2a5. Its coalescence age, 15.6±8 thousand years ago (kya), dates to the period immediately after the Last Glacial Maximum (LGM). Conclusions/Significance: In contrast to other H lineages that experienced re-expansion outside the Franco-Cantabrian refuge after the LGM (e.g. H1 and H3), H2a5 most likely remained confined to this area till present days. PMID:19340307
Gene and genon concept: coding versus regulation

PubMed Central

2007-01-01

We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. 
Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon. PMID:18087760
Alteration of gene expression in human hepatocellular carcinoma with integrated hepatitis B virus DNA.

PubMed

Tamori, Akihiro; Yamanishi, Yoshihiro; Kawashima, Shuichi; Kanehisa, Minoru; Enomoto, Masaru; Tanaka, Hiromu; Kubo, Shoji; Shiomi, Susumu; Nishiguchi, Shuhei

2005-08-15

Integration of hepatitis B virus (HBV) DNA into the human genome is one of the most important steps in HBV-related carcinogenesis. This study attempted to find the link between HBV DNA, the adjoining cellular sequence, and altered gene expression in hepatocellular carcinoma (HCC) with integrated HBV DNA. We examined 15 cases of HCC infected with HBV by cassette ligation-mediated PCR. The human DNA adjacent to the integrated HBV DNA was sequenced. Protein coding sequences were searched for in the human sequence. In five cases with HBV DNA integration, from which good quality RNA was extracted, gene expression was examined by cDNA microarray analysis. The human DNA sequence successive to integrated HBV DNA was determined in the 15 HCCs. Eight protein-coding regions were involved: ras-responsive element binding protein 1, calmodulin 1, mixed lineage leukemia 2 (MLL2), FLJ333655, LOC220272, LOC255345, LOC220220, and LOC168991. The MLL2 gene was expressed in three cases with HBV DNA integrated into exon 3 of MLL2 and in one case with HBV DNA integrated into intron 3 of MLL2. Gene expression analysis suggested that two HCCs with HBV integrated into MLL2 had similar patterns of gene expression compared with three HCCs with HBV integrated into other loci of human chromosomes. HBV DNA was integrated at random sites of human DNA, and the MLL2 gene was one of the targets for integration. Our results suggest that HBV DNA might modulate human genes near integration sites, followed by integration site-specific expression of such genes during hepatocarcinogenesis.
Molecular characterization of Banana streak virus isolate from Musa Acuminata in China.

PubMed

Zhuang, Jun; Wang, Jian-Hua; Zhang, Xin; Liu, Zhi-Xin

2011-12-01

Banana streak virus (BSV), a member of genus Badnavirus, is a causal agent of banana streak disease throughout the world. The genetic diversity of BSVs from different regions of banana plantations has previously been investigated, but there are relatively few reports of the genetic characteristics of episomal (non-integrated) BSV genomes isolated from China. Here, the complete genome, a total of 7722 bp (GenBank accession number DQ092436), of an isolate of Banana streak virus (BSV) on cultivar Cavendish (BSAcYNV) in Yunnan, China was determined. The genome is organized in the typical manner of badnaviruses. The intergenic region of genomic DNA contains a large stem-loop, which may contribute to the ribosome shift into the following open reading frames (ORFs). The coding region of BSAcYNV consists of three overlapping ORFs: ORF1, with a non-AUG start codon, and ORF2 encode two small proteins that are individually involved in viral movement, and ORF3 encodes a polyprotein. Besides the complete genome, a defective genome lacking the whole RNA leader region and a majority of ORF1 and which encompasses 6525 bp was also isolated and sequenced from this BSV DNA reservoir in infected banana plants. Sequence analyses showed that BSAcYNV has closest similarity in terms of genome organization and the coding assignments with a BSV isolate from Vietnam (BSAcVNV). The corresponding coding regions shared identities of 88% and ~95% at nucleotide and amino acid levels, respectively. Phylogenetic analysis also indicated BSAcYNV shared the closest geographical evolutionary relationship to BSAcVNV among sequenced banana streak badnaviruses.
Ovine mitochondrial DNA sequence variation and its association with production and reproduction traits within an Afec-Assaf flock.

PubMed

Reicher, S; Seroussi, E; Weller, J I; Rosov, A; Gootwine, E

2012-07-01

Polymorphisms in mitochondrial DNA (mtDNA) protein- and tRNA-coding genes were shown to be associated with various diseases in humans as well as with production and reproduction traits in livestock. Alignment of full length mitochondria sequences from the 5 known ovine haplogroups: HA (n = 3), HB (n = 5), HC (n = 3), HD (n = 2), and HE (n = 2; GenBank accession nos. HE577847-50 and 11 published complete ovine mitochondria sequences) revealed sequence variation in 10 out of the 13 protein coding mtDNA sequences. Twenty-six of the 245 variable sites found in the protein coding sequences represent non-synonymous mutations. Sequence variation was observed also in 8 out of the 22 tRNA mtDNA sequences. On the basis of the mtDNA control region and cytochrome b partial sequences along with information on maternal lineages within an Afec-Assaf flock, 1,126 Afec-Assaf ewes were assigned to mitochondrial haplogroups HA, HB, and HC, with frequencies of 0.43, 0.43, and 0.14, respectively. Analysis of birth weight and growth rate records of lambs (n = 1,286) and productivity from 4,993 lambing records revealed no association between mitochondrial haplogroup affiliation and female longevity, lamb perinatal survival rate, birth weight, and daily growth rate of lambs up to 150 d that averaged 1,664 d, 88.3%, 4.5 kg, and 320 g/d, respectively. However, significant (P < 0.0001) differences among the haplogroups were found for prolificacy of ewes, with prolificacies (mean ± SE) of 2.14 ± 0.04, 2.25 ± 0.04, and 2.30 ± 0.06 lambs born/ewe lambing for the HA, HB, and the HC haplogroups, respectively. Our results highlight the ovine mitogenome genetic variation in protein- and tRNA-coding genes and suggest that sequence variation in ovine mtDNA is associated with variation in ewe prolificacy.
Molecular cloning of the mouse gene coding for α2-macroglobulin and targeting of the gene in embryonic stem cells

DOE Office of Scientific and Technical Information (OSTI.GOV)

Umans, L.; Serneels, L.; Hilliker, C.

1994-08-01

The authors have cloned the mouse gene coding for α2-macroglobulin in overlapping λ clones and have analyzed its structure. The gene contains 36 exons, coding for the 4.8-kb cDNA that we cloned previously. Including putative control elements in the 5′ flanking region, the gene covers about 45 kb. A region of 3.8 kb, stretching from 835 bases upstream of the cDNA start site to exon 4, including all intervening sequences, was sequenced completely. The analysis demonstrated that the putative promoter region of the mouse A2M gene differed considerably from the known promoter sequences of the human A2M gene and of the rat acute-phase A2M gene. Comparison of the exon-intron structure of all known genes of the A2M family confirmed that the rat acute-phase A2M gene is more closely related to the human gene than to the mouse A2M gene. To generate mice with the A2M gene inactivated, an insertion type of construct containing 7.5 kb of genomic DNA of the mouse strain 129/J, encompassing exons 16 to 19, was synthesized. A hygromycin marker gene was embedded in intron 17. After electroporation, 198 hygromycin-resistant ES cell lines were isolated and analyzed by Southern blotting. Five ES cell lines were obtained with one allele of the mouse A2M gene targeted by this insertion construct, demonstrating that the position and the characteristics of the vector served the intended goal.

Polymorphism at the defensin gene in the Anopheles gambiae complex: testing different selection hypotheses

PubMed Central

Simard, Frédéric; Licht, Monica; Besansky, Nora J.; Lehmann, Tovi

2007-01-01

Genetic variation in defensin, a gene encoding a major effector molecule of the insect immune response, was analyzed within and between populations of three members of the Anopheles gambiae complex. The species selected included the two anthropophilic species, An. gambiae and An. arabiensis, and the most zoophilic species of the complex, An. quadriannulatus. The first species was represented by four populations spanning its extreme genetic and geographical ranges, whereas each of the other two species was represented by a single population. We found (i) reduced overall polymorphism in the mature peptide region and in the total coding region, together with specific reductions in rare and moderately frequent mutations (sites) in the coding region compared with noncoding regions, (ii) markedly reduced rate of nonsynonymous diversity compared with synonymous variation in the mature peptide and virtually identical mature peptide across the three species, and (iii) increased divergence between species in the mature peptide together with reduced differentiation between populations of An. gambiae in the same DNA region. These patterns suggest a strong purifying selection on the mature peptide and probably the whole coding region. Because An. quadriannulatus is not exposed to human pathogens, identical mature peptide and similar pattern of polymorphism across species implies that human pathogens played no role as selective agents on this peptide. PMID:17161659
Stable chromosome condensation revealed by chromosome conformation capture

PubMed Central

Eagen, Kyle P.; Hartl, Tom A.; Kornberg, Roger D.

2015-01-01

SUMMARY Chemical cross-linking and DNA sequencing have revealed regions of intra-chromosomal interaction, referred to as topologically associating domains (TADs), interspersed with regions of little or no interaction, in interphase nuclei. We find that TADs and the regions between them correspond with the bands and interbands of polytene chromosomes of Drosophila. We further establish the conservation of TADs between polytene and diploid cells of Drosophila. From direct measurements on light micrographs of polytene chromosomes, we then deduce the states of chromatin folding in the diploid cell nucleus. Two states of folding, fully extended fibers containing regulatory regions and promoters, and fibers condensed up to ten-fold containing coding regions of active genes, constitute the euchromatin of the nuclear interior. Chromatin fibers condensed up to 30-fold, containing coding regions of inactive genes, represent the heterochromatin of the nuclear periphery. A convergence of molecular analysis with direct observation thus reveals the architecture of interphase chromosomes. PMID:26544940
Identification of a high-efficiency baculovirus DNA replication origin that functions in insect and mammalian cells.

PubMed

Wu, Yueh-Lung; Wu, Carol-P; Huang, Yu-Hui; Huang, Sheng-Ping; Lo, Huei-Ru; Chang, Hao-Shuo; Lin, Pi-Hsiu; Wu, Ming-Cheng; Chang, Chia-Jung; Chao, Yu-Chan

2014-11-01

The p143 gene from Autographa californica multinucleocapsid nucleopolyhedrovirus (AcMNPV) has been found to increase the expression of luciferase, which is driven by the polyhedrin gene promoter, in a plasmid with virus coinfection. Further study indicated that this is due to the presence of a replication origin (ori) in the coding region of this gene. Transient DNA replication assays showed that a specific fragment of the p143 coding sequence, p143-3, underwent virus-dependent DNA replication in Spodoptera frugiperda IPLB-Sf-21 (Sf-21) cells. Deletion analysis of the p143-3 fragment showed that subfragment p143-3.2a contained the essential sequence of this putative ori. Sequence analysis of this region revealed a unique distribution of imperfect palindromes with high AT contents. No sequence homology or similarity between p143-3.2a and any other known ori was detected, suggesting that it is a novel baculovirus ori. Further study showed that the p143-3.2a ori can replicate more efficiently in infected Sf-21 cells than baculovirus homologous regions (hrs), the major baculovirus ori, or non-hr oris during virus replication. Previously, hr on its own was unable to replicate in mammalian cells, and for mammalian viral oris, viral proteins are generally required for their proper replication in host cells. However, the p143-3.2a ori was, surprisingly, found to function as an efficient ori in mammalian cells without the need for any viral proteins. We conclude that p143 contains a unique sequence that can function as an ori to enhance gene expression in not only insect cells but also mammalian cells. Baculovirus DNA replication relies on both hr and non-hr oris; however, so far very little is known about the latter oris. Here we have identified a new non-hr ori, the p143 ori, which resides in the coding region of p143. By developing a novel DNA replication-enhanced reporter system, we have identified and located the core region required for the p143 ori. 
This ori contains a large number of imperfect inverted repeats and is the most active ori in the viral genome during virus infection in insect cells. We also found that it is a unique ori that can replicate in mammalian cells without the assistance of baculovirus gene products. The identification of this ori should contribute to a better understanding of baculovirus DNA replication. Also, this ori is very useful in assisting with gene expression in mammalian cells.
TRX-LOGOS - a graphical tool to demonstrate DNA information content dependent upon backbone dynamics in addition to base sequence.

PubMed

Fortin, Connor H; Schulze, Katharina V; Babbitt, Gregory A

2015-01-01

It is now widely accepted that DNA sequences defining DNA-protein interactions functionally depend upon local biophysical features of DNA backbone that are important in defining sites of binding interaction in the genome (e.g. DNA shape, charge and intrinsic dynamics). However, these physical features of DNA polymer are not directly apparent when analyzing and viewing Shannon information content calculated at single nucleobases in a traditional sequence logo plot. Thus, sequence logo plots are severely limited in that they convey no explicit information regarding the structural dynamics of DNA backbone, a feature often critical to binding specificity. We present TRX-LOGOS, an R software package and Perl wrapper code that interfaces the JASPAR database for computational regulatory genomics. TRX-LOGOS extends the traditional sequence logo plot to include Shannon information content calculated with regard to the dinucleotide-based BI-BII conformation shifts in phosphate linkages on the DNA backbone, thereby adding a visual measure of intrinsic DNA flexibility that can be critical for many DNA-protein interactions. TRX-LOGOS is available as an R graphics module offered at both SourceForge and as a download supplement at this journal. To demonstrate the general utility of TRX logo plots, we first calculated the information content for 416 Saccharomyces cerevisiae transcription factor binding sites functionally confirmed in the Yeastract database and matched to previously published yeast genomic alignments. We discovered that flanking regions contain significantly higher information content at phosphate linkages than can be observed at nucleobases. We also examined broader transcription factor classifications defined by the JASPAR database, and discovered that many general signatures of transcription factor binding are locally more information rich at the level of DNA backbone dynamics than nucleobase sequence. 
We used TRX-logos in combination with MEGA 6.0 software for molecular evolutionary genetics analysis to visually compare the human Forkhead box/FOX protein evolution to its binding site evolution. We also compared the DNA binding signatures of human TP53 tumor suppressor determined by two different laboratory methods (SELEX and ChIP-seq). Further analysis of the entire yeast genome, center aligned at the start codon, also revealed a distinct sequence-independent 3 bp periodic pattern in information content, present only in coding regions, and perhaps indicative of the non-random organization of the genetic code. TRX-LOGOS is useful in any situation in which important information content in DNA can be better visualized at the positions of phosphate linkages (i.e. dinucleotides) where the dynamic properties of the DNA backbone function to facilitate DNA-protein interaction.
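The per-position Shannon information content that a traditional sequence logo displays can be sketched as follows. This covers only the nucleobase calculation (log2(4) minus the column entropy), omits any small-sample correction, and does not reproduce the dinucleotide BI-BII backbone scoring that TRX-LOGOS adds; the function name `information_content` and the toy binding sites are assumptions.

```python
import math
from collections import Counter


def information_content(sites):
    """Bits of information per alignment column for equal-length DNA sites.

    Each column scores 2 - H, where H is the Shannon entropy (in bits) of the
    observed base frequencies; no small-sample correction is applied.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    bits = []
    for column in zip(*sites):
        n = len(column)
        entropy = -sum(
            (count / n) * math.log2(count / n)
            for count in Counter(column).values()
        )
        bits.append(2.0 - entropy)
    return bits


if __name__ == "__main__":
    # Hypothetical aligned binding sites, not taken from JASPAR or Yeastract.
    sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
    print([round(b, 2) for b in information_content(sites)])
```

A fully conserved column scores 2 bits and a column with all four bases at equal frequency scores 0, which is the scale on which logo letter heights are drawn.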
The complete chloroplast genome sequence of Dianthus superbus var. longicalycinus.

PubMed

Gurusamy, Raman; Lee, Do-Hyung; Park, SeonJoo

2016-05-01

The complete chloroplast genome (cpDNA) sequence of Dianthus superbus var. longicalycinus, an economically important traditional Chinese medicine, was reported and characterized. The cpDNA of Dianthus superbus var. longicalycinus is 149,539 bp, with 36.3% GC content. A pair of inverted repeats (IRs) of 24,803 bp is separated by a large single-copy region (LSC, 82,805 bp) and a small single-copy region (SSC, 17,128 bp). It encodes 85 protein-coding genes, 36 tRNA genes and 8 rRNA genes. Of the 129 individual genes, 13 contain one intron and three contain two introns.
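The figures in this record are internally consistent, which a short arithmetic check makes explicit: a quadripartite chloroplast genome is the LSC plus the SSC plus two copies of the IR. The sketch below uses only numbers from the abstract; the function name `quadripartite_length` is a hypothetical helper.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a chloroplast genome with two inverted-repeat copies."""
    return lsc + ssc + 2 * ir


if __name__ == "__main__":
    # Region sizes in bp, as given in the abstract.
    total = quadripartite_length(lsc=82_805, ssc=17_128, ir=24_803)
    print(total)  # 149539, matching the reported genome size

    # Gene counts as given in the abstract: protein-coding + tRNA + rRNA.
    print(85 + 36 + 8)  # 129, matching the reported number of genes
```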
Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair.

PubMed

Day, Felix R; Ruth, Katherine S; Thompson, Deborah J; Lunetta, Kathryn L; Pervjakova, Natalia; Chasman, Daniel I; Stolk, Lisette; Finucane, Hilary K; Sulem, Patrick; Bulik-Sullivan, Brendan; Esko, Tõnu; Johnson, Andrew D; Elks, Cathy E; Franceschini, Nora; He, Chunyan; Altmaier, Elisabeth; Brody, Jennifer A; Franke, Lude L; Huffman, Jennifer E; Keller, Margaux F; McArdle, Patrick F; Nutile, Teresa; Porcu, Eleonora; Robino, Antonietta; Rose, Lynda M; Schick, Ursula M; Smith, Jennifer A; Teumer, Alexander; Traglia, Michela; Vuckovic, Dragana; Yao, Jie; Zhao, Wei; Albrecht, Eva; Amin, Najaf; Corre, Tanguy; Hottenga, Jouke-Jan; Mangino, Massimo; Smith, Albert V; Tanaka, Toshiko; Abecasis, Goncalo; Andrulis, Irene L; Anton-Culver, Hoda; Antoniou, Antonis C; Arndt, Volker; Arnold, Alice M; Barbieri, Caterina; Beckmann, Matthias W; Beeghly-Fadiel, Alicia; Benitez, Javier; Bernstein, Leslie; Bielinski, Suzette J; Blomqvist, Carl; Boerwinkle, Eric; Bogdanova, Natalia V; Bojesen, Stig E; Bolla, Manjeet K; Borresen-Dale, Anne-Lise; Boutin, Thibaud S; Brauch, Hiltrud; Brenner, Hermann; Brüning, Thomas; Burwinkel, Barbara; Campbell, Archie; Campbell, Harry; Chanock, Stephen J; Chapman, J Ross; Chen, Yii-Der Ida; Chenevix-Trench, Georgia; Couch, Fergus J; Coviello, Andrea D; Cox, Angela; Czene, Kamila; Darabi, Hatef; De Vivo, Immaculata; Demerath, Ellen W; Dennis, Joe; Devilee, Peter; Dörk, Thilo; Dos-Santos-Silva, Isabel; Dunning, Alison M; Eicher, John D; Fasching, Peter A; Faul, Jessica D; Figueroa, Jonine; Flesch-Janys, Dieter; Gandin, Ilaria; Garcia, Melissa E; García-Closas, Montserrat; Giles, Graham G; Girotto, Giorgia G; Goldberg, Mark S; González-Neira, Anna; Goodarzi, Mark O; Grove, Megan L; Gudbjartsson, Daniel F; Guénel, Pascal; Guo, Xiuqing; Haiman, Christopher A; Hall, Per; Hamann, Ute; Henderson, Brian E; Hocking, Lynne J; Hofman, Albert; Homuth, Georg; Hooning, Maartje J; Hopper, John L; Hu, Frank B; Huang, Jinyan; Humphreys, Keith; Hunter, David J; Jakubowska, Anna; Jones, Samuel E; Kabisch, Maria; Karasik, David; Knight, Julia A; Kolcic, Ivana; Kooperberg, Charles; Kosma, Veli-Matti; Kriebel, Jennifer; Kristensen, Vessela; Lambrechts, Diether; Langenberg, Claudia; Li, Jingmei; Li, Xin; Lindström, Sara; Liu, Yongmei; Luan, Jian'an; Lubinski, Jan; Mägi, Reedik; Mannermaa, Arto; Manz, Judith; Margolin, Sara; Marten, Jonathan; Martin, Nicholas G; Masciullo, Corrado; Meindl, Alfons; Michailidou, Kyriaki; Mihailov, Evelin; Milani, Lili; Milne, Roger L; Müller-Nurasyid, Martina; Nalls, Michael; Neale, Ben M; Nevanlinna, Heli; Neven, Patrick; Newman, Anne B; Nordestgaard, Børge G; Olson, Janet E; Padmanabhan, Sandosh; Peterlongo, Paolo; Peters, Ulrike; Petersmann, Astrid; Peto, Julian; Pharoah, Paul D P; Pirastu, Nicola N; Pirie, Ailith; Pistis, Giorgio; Polasek, Ozren; Porteous, David; Psaty, Bruce M; Pylkäs, Katri; Radice, Paolo; Raffel, Leslie J; Rivadeneira, Fernando; Rudan, Igor; Rudolph, Anja; Ruggiero, Daniela; Sala, Cinzia F; Sanna, Serena; Sawyer, Elinor J; Schlessinger, David; Schmidt, Marjanka K; Schmidt, Frank; Schmutzler, Rita K; Schoemaker, Minouk J; Scott, Robert A; Seynaeve, Caroline M; Simard, Jacques; Sorice, Rossella; Southey, Melissa C; Stöckl, Doris; Strauch, Konstantin; Swerdlow, Anthony; Taylor, Kent D; Thorsteinsdottir, Unnur; Toland, Amanda E; Tomlinson, Ian; Truong, Thérèse; Tryggvadottir, Laufey; Turner, Stephen T; Vozzi, Diego; Wang, Qin; Wellons, Melissa; Willemsen, Gonneke; Wilson, James F; Winqvist, Robert; Wolffenbuttel, Bruce B H R; Wright, Alan F; Yannoukakos, Drakoulis; Zemunik, Tatijana; Zheng, Wei; Zygmunt, Marek; Bergmann, Sven; Boomsma, Dorret I; Buring, Julie E; Ferrucci, Luigi; Montgomery, Grant W; Gudnason, Vilmundur; Spector, Tim D; van Duijn, Cornelia M; Alizadeh, Behrooz Z; Ciullo, Marina; Crisponi, Laura; Easton, Douglas F; Gasparini, Paolo P; Gieger, Christian; Harris, Tamara B; Hayward, Caroline; Kardia, Sharon L R; Kraft, Peter; McKnight, Barbara; Metspalu, Andres; Morrison, Alanna C; Reiner, Alex P; Ridker, Paul M; Rotter, Jerome I; Toniolo, Daniela; Uitterlinden, André G; Ulivi, Sheila; Völzke, Henry; Wareham, Nicholas J; Weir, David R; Yerges-Armstrong, Laura M; Price, Alkes L; Stefansson, Kari; Visser, Jenny A; Ong, Ken K; Chang-Claude, Jenny; Murabito, Joanne M; Perry, John R B; Murray, Anna

2015-11-01

Menopause timing has a substantial impact on infertility and risk of disease, including breast cancer, but the underlying mechanisms are poorly understood. We report a dual strategy in ∼70,000 women to identify common and low-frequency protein-coding variation associated with age at natural menopause (ANM). We identified 44 regions with common variants, including two regions harboring additional rare missense alleles of large effect. We found enrichment of signals in or near genes involved in delayed puberty, highlighting the first molecular links between the onset and end of reproductive lifespan. Pathway analyses identified major association with DNA damage response (DDR) genes, including the first common coding variant in BRCA1 associated with any complex trait. Mendelian randomization analyses supported a causal effect of later ANM on breast cancer risk (∼6% increase in risk per year; P = 3 × 10^-14), likely mediated by prolonged sex hormone exposure rather than DDR mechanisms.
Strain diversity and host specificity in bee gut symbionts revealed by deep sampling of single copy protein-coding sequences

PubMed Central

Powell, J. Elijah; Ratnayeke, Nalin; Moran, Nancy A.

2017-01-01

High throughput rRNA amplicon surveys of bacterial communities provide a rapid snapshot of taxonomic composition. But strains with nearly identical rRNA sequences often differ in gene repertoires and metabolic capabilities. To assess strain-level variation within Snodgrassella alvi, a gut symbiont of corbiculate bees, we performed deep sequencing on amplicons of a single copy coding gene (minD) as well as the 16S rDNA V4 region. We surveyed honey bees (Apis mellifera) sampled globally and 12 bumble bee species (Bombus) sampled from two regions of the USA. The minD analyses reveal that S. alvi contains far more strain diversity than is evident from 16S rDNA analysis. Many taxa inferred on the basis of 16S rDNA are shared between A. mellifera and Bombus species, but taxa inferred on the basis of minD are never shared and often are restricted to particular Bombus species. Clustering based on minD revealed that gut communities often reflect host species and geographic location. Both minD and 16S rDNA analyses indicate that strain diversity is higher in A. mellifera than in Bombus species. The minD locus flanks a 16S gene, enabling development of strain-specific 16S fluorescent probes to illuminate the spatial relationship of strains within the bee gut. PMID:27482856
A T-DNA gene required for agropine biosynthesis by transformed plants is functionally and evolutionarily related to a Ti plasmid gene required for catabolism of agropine by Agrobacterium strains.

PubMed Central

Hong, S B; Hwang, I; Dessaux, Y; Guyon, P; Kim, K S; Farrand, S K

1997-01-01

The mechanisms that ensure that Ti plasmid T-DNA genes encoding proteins involved in the biosynthesis of opines in crown gall tumors are always matched by Ti plasmid genes conferring the ability to catabolize that set of opines on the inducing Agrobacterium strains are unknown. The pathway for the biosynthesis of the opine agropine is thought to require an enzyme, mannopine cyclase, coded for by the ags gene located in the T(R) region of octopine-type Ti plasmids. Extracts prepared from agropine-type tumors contained an activity that cyclized mannopine to agropine. Tumor cells containing a T region in which ags was mutated lacked this activity and did not contain agropine. Expression of ags from the lac promoter conferred mannopine-lactonizing activity on Escherichia coli. Agrobacterium tumefaciens strains harboring an octopine-type Ti plasmid exhibit a similar activity which is not coded for by ags. Analysis of the DNA sequence of the gene encoding this activity, called agcA, showed it to be about 60% identical to T-DNA ags genes. Relatedness decreased abruptly in the 5' and 3' untranslated regions of the genes. ags is preceded by a promoter that functions only in the plant. Expression analysis showed that agcA also is preceded by its own promoter, which is active in the bacterium. Translation of agcA yielded a protein of about 45 kDa, consistent with the size predicted from the DNA sequence. Antibodies raised against the agcA product cross-reacted with the anabolic enzyme. These results indicate that the agropine system arose by a duplication of a progenitor gene, one copy of which became associated with the T-DNA and the other copy of which remained associated with the bacterium. PMID:9244272
Ancient DNA sequence revealed by error-correcting codes.

PubMed

Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

2015-07-10

A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.
Ancient DNA sequence revealed by error-correcting codes

PubMed Central

Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

2015-01-01

A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228
AP1 Keeps Chromatin Poised for Action | Center for Cancer Research

Cancer.gov

The human genome harbors gene-encoding DNA, the blueprint for building proteins that regulate cellular function. Embedded across the genome, in non-coding regions, are DNA elements to which regulatory factors bind. The interaction of regulatory factors with DNA at these sites modifies gene expression to modulate cell activity. In cells, DNA exists in a complex with proteins called chromatin that compacts the DNA in the nucleus, strongly restricting access to DNA sequences. As a result, regulatory factors only interact with a small subset of their potential binding elements in a given cell to regulate genes. How factors recognize and select sites in chromatin across the genome is not well understood -- but several discoveries in CCR’s Laboratory of Receptor Biology and Gene Expression (LRBGE) have shed light on the mechanisms that direct factors to DNA.
Genome defense against exogenous nucleic acids in eukaryotes by non-coding DNA occurs through CRISPR-like mechanisms in the cytosol and the bodyguard protection in the nucleus.

PubMed

Qiu, Guo-Hua

2016-01-01

In this review, the protective function of the abundant non-coding DNA in the eukaryotic genome is discussed from the perspective of genome defense against exogenous nucleic acids. Peripheral non-coding DNA has been proposed to act as a bodyguard that protects the genome and the central protein-coding sequences from ionizing radiation-induced DNA damage. In the proposed mechanism of protection, the radicals generated by water radiolysis in the cytosol and IR energy are absorbed, blocked and/or reduced by peripheral heterochromatin; then, the DNA damage sites in the heterochromatin are removed and expelled from the nucleus to the cytoplasm through nuclear pore complexes, most likely through the formation of extrachromosomal circular DNA. To strengthen this hypothesis, this review summarizes the experimental evidence supporting the protective function of non-coding DNA against exogenous nucleic acids. Based on these data, I hypothesize herein about the presence of an additional line of defense formed by small RNAs in the cytosol in addition to their bodyguard protection mechanism in the nucleus. Therefore, exogenous nucleic acids may be initially inactivated in the cytosol by small RNAs generated from non-coding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. Exogenous nucleic acids may enter the nucleus, where some are absorbed and/or blocked by heterochromatin and others integrate into chromosomes. The integrated fragments and the sites of DNA damage are removed by repetitive non-coding DNA elements in the heterochromatin and excluded from the nucleus. Therefore, the normal eukaryotic genome and the central protein-coding sequences are triply protected by non-coding DNA against invasion by exogenous nucleic acids. This review provides evidence supporting the protective role of non-coding DNA in genome defense. Copyright © 2016 Elsevier B.V. All rights reserved.
Mitochondrial sequence analysis for forensic identification using pyrosequencing technology.

PubMed

Andréasson, H; Asp, A; Alderborn, A; Gyllensten, U; Allen, M

2002-01-01

Over recent years, requests for mtDNA analysis in the field of forensic medicine have notably increased, and the results of such analyses have proved to be very useful in forensic cases where nuclear DNA analysis cannot be performed. Traditionally, mtDNA has been analyzed by DNA sequencing of the two hypervariable regions, HVI and HVII, in the D-loop. DNA sequence analysis using conventional Sanger sequencing is very robust but time consuming and labor intensive. By contrast, mtDNA analysis based on the pyrosequencing technology provides fast and accurate results from the human mtDNA present in many types of evidence materials in forensic casework. The assay has been developed to determine polymorphic sites in the mitochondrial D-loop as well as the coding region to further increase the discrimination power of mtDNA analysis. The pyrosequencing technology for analysis of mtDNA polymorphisms has been tested with regard to sensitivity, reproducibility, and success rate when applied to control samples and actual casework materials. The results show that the method is very accurate and sensitive; the results are easily interpreted and provide a high success rate on casework samples. The panel of pyrosequencing reactions for the mtDNA polymorphisms was chosen to result in an optimal discrimination power in relation to the number of bases determined.
Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis.

PubMed

Buldyrev, S V; Goldberger, A L; Havlin, S; Mantegna, R N; Matsa, M E; Peng, C K; Simons, M; Stanley, H E

1995-05-01

An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
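
The detrended fluctuation analysis named in this abstract can be illustrated with a short sketch. The Python below is not the authors' code: it assumes a purine/pyrimidine "DNA walk" (the sign convention is an assumption and does not affect the exponent), linear detrending in non-overlapping boxes, and hypothetical function names and box sizes chosen only for illustration. Under those assumptions, an uncorrelated sequence should give a scaling exponent near 0.5, and long-range correlations would push it higher.

```python
import math


def dna_walk(seq):
    """Cumulative sum of mean-subtracted steps (+1 purine, -1 pyrimidine; assumed rule)."""
    steps = [1.0 if base in "AG" else -1.0 for base in seq.upper()]
    mean = sum(steps) / len(steps)
    profile, total = [], 0.0
    for step in steps:
        total += step - mean
        profile.append(total)
    return profile


def linear_residual_ms(segment):
    """Mean squared residual of a segment around its own least-squares line."""
    n = len(segment)
    mx = (n - 1) / 2
    my = sum(segment) / n
    sxx = sum((i - mx) ** 2 for i in range(n))
    sxy = sum((i - mx) * (y - my) for i, y in enumerate(segment))
    slope = sxy / sxx
    return sum((y - my - slope * (i - mx)) ** 2 for i, y in enumerate(segment)) / n


def dfa_fluctuation(profile, box):
    """Root-mean-square detrended fluctuation F(box) over non-overlapping boxes."""
    n_boxes = len(profile) // box
    ms = [linear_residual_ms(profile[k * box:(k + 1) * box]) for k in range(n_boxes)]
    return math.sqrt(sum(ms) / n_boxes)


def dfa_exponent(seq, boxes=(8, 16, 32, 64, 128)):
    """Slope of log F(box) against log box; the box sizes are illustrative defaults."""
    profile = dna_walk(seq)
    xs = [math.log(b) for b in boxes]
    ys = [math.log(dfa_fluctuation(profile, b)) for b in boxes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
```

Calling `dfa_exponent(sequence)` on a nucleotide string returns the fitted exponent. The sequence should be several times longer than the largest box so that each box size averages over many segments.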
Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

NASA Technical Reports Server (NTRS)

Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

1995-01-01

An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
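
The spectral exponent beta quoted in this abstract is the slope of the power spectrum on a log-log plot, S(f) ~ 1/f^beta, fitted over periods of 10 bp to 100 bp. The sketch below is a minimal illustration, not the authors' pipeline. It assumes a purine/pyrimidine numerical mapping, a hypothetical window of 500 bp, and averaging of periodograms over non-overlapping windows. To stay dependency-free it evaluates a direct discrete Fourier transform at the needed frequencies instead of calling an FFT routine; the values obtained are the same, only slower to compute.

```python
import cmath
import math


def power_spectrum(values, ks):
    """Periodogram |X_k|^2 / N of mean-subtracted values at integer frequency indices ks."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    spectrum = []
    for k in ks:
        w = -2j * math.pi * k / n
        x_k = sum(c * cmath.exp(w * t) for t, c in enumerate(centred))
        spectrum.append(abs(x_k) ** 2 / n)
    return spectrum


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


def spectral_exponent(seq, window=500, min_period=10, max_period=100):
    """Estimate beta in S(f) ~ 1/f**beta from window-averaged periodograms.

    The window length and the +1/-1 purine/pyrimidine mapping are assumptions
    made for this sketch; only the 10-100 bp fitting range comes from the abstract.
    """
    steps = [1.0 if base in "AG" else -1.0 for base in seq.upper()]
    n_windows = len(steps) // window
    if n_windows == 0:
        raise ValueError("sequence is shorter than one window")
    # Frequency index k corresponds to a period of window / k base pairs.
    ks = list(range(window // max_period, window // min_period + 1))
    mean_power = [0.0] * len(ks)
    for w in range(n_windows):
        chunk = steps[w * window:(w + 1) * window]
        for i, p in enumerate(power_spectrum(chunk, ks)):
            mean_power[i] += p / n_windows
    log_f = [math.log(k / window) for k in ks]
    log_s = [math.log(p) for p in mean_power]
    return -least_squares_slope(log_f, log_s)
```

For an uncorrelated sequence the averaged spectrum is flat and the estimate should scatter around zero, which is the behaviour the abstract reports for coding sequences. A positive value indicates power concentrated at low frequencies, the signature of long-range correlation.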
Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications.

PubMed

Kim, Young-Kyu; Park, Chong-wook; Kim, Ki-Joong

2009-03-31

The chloroplast DNA sequences of Megaleranthis saniculifolia, an endemic and monotypic endangered plant species, were completed in this study (GenBank FJ597983). The genome is 159,924 bp in length. It harbors a pair of IR regions consisting of 26,608 bp each. The lengths of the LSC and SSC regions are 88,326 bp and 18,382 bp, respectively. The structural organizations, gene and intron contents, gene orders, AT contents, codon usages, and transcription units of the Megaleranthis chloroplast genome are similar to those of typical land plant cp DNAs. However, the detailed features of the Megaleranthis chloroplast genome are substantially different from those of Ranunculus, which belongs to the same family, the Ranunculaceae. First, the Megaleranthis cp DNA was 4,797 bp longer than that of Ranunculus due to an expanded IR region into the SSC region and duplicated sequence elements in several spacer regions of the Megaleranthis cp genome. Second, the chloroplast genomes of Megaleranthis and Ranunculus evidence 5.6% sequence divergence in the coding regions, 8.9% sequence divergence in the intron regions, and 18.7% sequence divergence in the intergenic spacer regions, respectively. In both the coding and noncoding regions, average nucleotide substitution rates differed markedly, depending on the genome position. Our data strongly implicate the positional effects of the evolutionary modes of chloroplast genes. The genes evidencing higher levels of base substitutions also have higher incidences of indel mutations and low Ka/Ks ratios. A total of 54 simple sequence repeat loci were identified from the Megaleranthis cp genome. The existence of rich cp SSR loci in the Megaleranthis cp genome provides a rare opportunity to study the population genetic structures of this endangered species. Our phylogenetic trees based on the two independent markers, the nuclear ITS and chloroplast matK sequences, strongly support the inclusion of Megaleranthis in Trollius. Therefore, our molecular trees support Ohwi's original treatment of Megaleranthis saniculifolia as Trollius chosenensis Ohwi.
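
The region lengths quoted in this abstract can be cross-checked with a little arithmetic, since a quadripartite chloroplast genome consists of one LSC, one SSC, and two IR copies. The Python sketch below only restates numbers from the abstract; the function name is hypothetical, and the Ranunculus length is derived from the stated 4,797 bp difference rather than quoted from the source.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a chloroplast genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) quoted in the abstract for Megaleranthis saniculifolia.
megaleranthis = quadripartite_length(lsc=88_326, ssc=18_382, ir=26_608)
assert megaleranthis == 159_924  # agrees with the reported genome length

# The abstract reports Megaleranthis as 4,797 bp longer than Ranunculus,
# which implies the following length (derived here, not quoted in the abstract).
ranunculus_implied = megaleranthis - 4_797
print(megaleranthis, ranunculus_implied)
```

The check confirms that the four region lengths account for the full 159,924 bp with no residue.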
Polymerization of non-complementary RNA: systematic symmetric nucleotide exchanges mainly involving uracil produce mitochondrial RNA transcripts coding for cryptic overlapping genes.

PubMed

Seligmann, Hervé

2013-03-01

Usual DNA→RNA transcription exchanges T→U. Assuming different systematic symmetric nucleotide exchanges during translation, some GenBank RNAs match exactly human mitochondrial sequences (exchange rules listed in decreasing transcript frequencies): C↔U, A↔U, A↔U+C↔G (two nucleotide pairs exchanged), G↔U, A↔G, C↔G, none for A↔C, A↔G+C↔U, and A↔C+G↔U. Most unusual transcripts involve exchanging uracil. Independent measures of rates of rare replicational enzymatic DNA nucleotide misinsertions predict frequencies of RNA transcripts systematically exchanging the corresponding misinserted nucleotides. Exchange transcripts self-hybridize less than other gene regions, self-hybridization increases with length, suggesting endoribonuclease-limited elongation. Blast detects stop codon depleted putative protein coding overlapping genes within exchange-transcribed mitochondrial genes. These align with existing GenBank proteins (mainly metazoan origins, prokaryotic and viral origins underrepresented). These GenBank proteins frequently interact with RNA/DNA, are membrane transporters, or are typical of mitochondrial metabolism. Nucleotide exchange transcript frequencies increase with overlapping gene densities and stop densities, indicating finely tuned counterbalancing regulation of expression of systematic symmetric nucleotide exchange-encrypted proteins. Such expression necessitates combined activities of suppressor tRNAs matching stops, and nucleotide exchange transcription. Two independent properties confirm predicted exchanged overlap coding genes: discrepancy of third codon nucleotide contents from replicational deamination gradients, and codon usage according to circular code predictions. Predictions from both properties converge, especially for frequent nucleotide exchange types. Nucleotide exchanging transcription apparently increases coding densities of protein coding genes without lengthening genomes, revealing unsuspected functional DNA coding potential. 
Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
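
The exchange rules listed in this abstract are easy to state in code. The sketch below is only an illustration with hypothetical function names: it treats the input as the coding strand, applies the usual T→U substitution, and then swaps the named nucleotide pairs symmetrically. It also enumerates the nine symmetric rules on the RNA alphabet (six single-pair and three double-pair exchanges), which matches the number of rules the abstract lists.

```python
from itertools import combinations


def exchange_transcribe(dna, swaps=()):
    """Transcribe a coding-strand DNA string (T->U), then apply symmetric exchanges.

    `swaps` is an iterable of nucleotide pairs such as [("C", "U")]; each pair
    is exchanged in both directions.
    """
    rna = dna.upper().replace("T", "U")
    table = {}
    for a, b in swaps:
        table[a] = b
        table[b] = a
    return rna.translate(str.maketrans(table))


def symmetric_exchange_rules(alphabet="ACGU"):
    """All single-pair exchanges plus all exchanges of two disjoint pairs."""
    pairs = list(combinations(alphabet, 2))  # 6 single-pair rules
    doubles = [(p, q) for p, q in combinations(pairs, 2) if not set(p) & set(q)]  # 3 rules
    return [(p,) for p in pairs] + doubles


if __name__ == "__main__":
    for rule in symmetric_exchange_rules():
        label = "+".join(f"{a}<->{b}" for a, b in rule)
        print(label, exchange_transcribe("ATGCCTGA", rule))
```

Because every rule is its own inverse, applying the same exchange twice to an RNA string returns the original string. That makes it straightforward to map a candidate exchanged transcript back onto the reference sequence.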
Characterization of the cod (Gadus morhua) steroidogenic acute regulatory protein (StAR) sheds light on StAR gene structure in fish.

PubMed

Goetz, Frederick W; Norberg, Birgitta; McCauley, Linda A R; Iliev, Dimitar B

2004-03-01

The full-length cDNA for the cod (Gadus morhua) StAR was cloned by RT-PCR and library screening using ovarian RNA. From the library screening, 2 size classes of cDNA were obtained: a 1577 bp cDNA (cStAR1) and a 2851 bp cDNA (cStAR2). The cStAR1 cDNA presumably encodes a protein of 286 amino acids. The cStAR2 cDNA was composed of 6 separated sequences that contained all of the coding regions of cStAR1 when added together, but also contained 5 noncoding regions not observed in cStAR1. Polymerase chain reactions of cod genomic DNA produced products slightly larger than cStAR2. The sequence of these products was the same as cStAR2 but revealed one additional noncoding region (intron). Thus, the fish StAR gene contains the same number of exons (7) and introns (6) as observed in mammals, but is approximately half the size of the mammalian gene. Using Northern analysis and RT-PCR, cStAR1 expression was observed only in testes, ovaries and head kidneys. Polymerase chain reaction products were also observed using cDNA from steroidogenic tissues and primers designed to regions specific for cStAR2, indicating that cStAR2 is expressed in tissues and may account for the presence of larger transcripts observed on Northern blots.
Pathogenesis of Chagas' Disease: Parasite Persistence and Autoimmunity

PubMed Central

Teixeira, Antonio R. L.; Hecht, Mariana M.; Guimaro, Maria C.; Sousa, Alessandro O.; Nitz, Nadjar

2011-01-01

Summary: Acute Trypanosoma cruzi infections can be asymptomatic, but chronically infected individuals can die of Chagas' disease. The transfer of the parasite mitochondrial kinetoplast DNA (kDNA) minicircle to the genome of chagasic patients can explain the pathogenesis of the disease; in cases of Chagas' disease with evident cardiomyopathy, the kDNA minicircles integrate mainly into retrotransposons at several chromosomes, but the minicircles are also detected in coding regions of genes that regulate cell growth, differentiation, and immune responses. An accurate evaluation of the role played by the genotype alterations in the autoimmune rejection of self-tissues in Chagas' disease is achieved with the cross-kingdom chicken model system, which is refractory to T. cruzi infections. The inoculation of T. cruzi into embryonated eggs prior to incubation generates parasite-free chicks, which retain the kDNA minicircle sequence mainly in the macrochromosome coding genes. Crossbreeding transfers the kDNA mutations to the chicken progeny. The kDNA-mutated chickens develop severe cardiomyopathy in adult life and die of heart failure. The phenotyping of the lesions revealed that cytotoxic CD45, CD8+ γδ, and CD8α+ T lymphocytes carry out the rejection of the chicken heart. These results suggest that the inflammatory cardiomyopathy of Chagas' disease is a genetically driven autoimmune disease. PMID:21734249
ANN modeling of DNA sequences: new strategies using DNA shape code.

PubMed

Parbhane, R V; Tambe, S S; Kulkarni, B D

2000-09-01

Two new encoding strategies, namely, wedge and twist codes, which are based on the DNA helical parameters, are introduced to represent DNA sequences in artificial neural network (ANN)-based modeling of biological systems. The performance of the new coding strategies has been evaluated by conducting three case studies involving mapping (modeling) and classification applications of ANNs. The proposed coding schemes have been compared rigorously and shown to outperform the existing coding strategies especially in situations wherein limited data are available for building the ANN models.

Satellite DNA Modulates Gene Expression in the Beetle Tribolium castaneum after Heat Stress

PubMed Central

Feliciello, Isidoro; Akrap, Ivana; Ugarković, Đurđica

2015-01-01

Non-coding repetitive DNAs have been proposed to perform a gene regulatory role; however, for tandemly repeated satellite DNA, no such role has been defined until now. Here we provide the first evidence for a role of satellite DNA in the modulation of gene expression under specific environmental conditions. The major satellite DNA TCAST1 in the beetle Tribolium castaneum is preferentially located within pericentromeric heterochromatin but is also dispersed as single repeats or short arrays in the vicinity of protein-coding genes within euchromatin. Our results show enhanced suppression of activity of TCAST1-associated genes and slower recovery of their activity after long-term heat stress relative to the same genes without associated TCAST1 satellite DNA elements. The level of gene suppression is not influenced by the distance of TCAST1 elements from the associated genes up to 40 kb from the genes’ transcription start sites, but it does depend on the copy number of TCAST1 repeats within an element, being stronger for the higher number of copies. The enhanced gene suppression correlates with the enrichment of the repressive histone marks H3K9me2/3 at dispersed TCAST1 elements and their flanking regions as well as with increased expression of TCAST1 satellite DNA. The results reveal transient, RNAi based heterochromatin formation at dispersed TCAST1 repeats and their proximal regions as a mechanism responsible for enhanced silencing of TCAST1-associated genes. Differences in the pattern of distribution of TCAST1 elements contribute to gene expression diversity among T. castaneum strains after long-term heat stress and might have an impact on adaptation to different environmental conditions. PMID:26275223
N6-methyladenine: a conserved and dynamic DNA mark

PubMed Central

O’Brown, Zach Klapholz; Greer, Eric Lieberman

2017-01-01

Chromatin, consisting of deoxyribonucleic acid (DNA) wrapped around histone proteins, facilitates DNA compaction and allows identical DNA code to confer many different cellular phenotypes. This biological versatility is accomplished in large part by post-translational modifications to histones and chemical modifications to DNA. These modifications direct the cellular machinery to expand or compact specific chromatin regions, and mark regions of the DNA as important for cellular functions. While each of the four bases that make up DNA can be modified (Iyer et al. 2011), this chapter will focus on methylation of the 6th position on adenines (6mA), as this modification has been poorly characterized in recently evolved eukaryotes but shows promise as a new conserved layer of epigenetic regulation. 6mA was previously thought to be restricted to unicellular organisms, but recent work has revealed its presence in more recently evolved metazoa. Here, we will briefly describe the history of 6mA, examine its evolutionary conservation, and evaluate the current methods for detecting 6mA. We will discuss the enzymes that bind and regulate this mark and finally examine known and potential functions of 6mA in eukaryotes. PMID:27826841
The complete chloroplast DNA sequence of Eleutherococcus senticosus (Araliaceae); comparative evolutionary analyses with other three asterids.

PubMed

Yi, Dong-Keun; Lee, Hae-Lim; Sun, Byung-Yun; Chung, Mi Yoon; Kim, Ki-Joong

2012-05-01

This study reports the complete chloroplast (cp) DNA sequence of Eleutherococcus senticosus (GenBank: JN 637765), an endangered endemic species. The genome is 156,768 bp in length, and contains a pair of inverted repeat (IR) regions of 25,930 bp each, a large single copy (LSC) region of 86,755 bp and a small single copy (SSC) region of 18,153 bp. The structural organization, gene and intron contents, gene order, AT content, codon usage, and transcription units of the E. senticosus chloroplast genome are similar to that of typical land plant cp DNA. We aligned and analyzed the sequences of 86 coding genes, 19 introns and 113 intergenic spacers (IGS) in three different taxonomic hierarchies; Eleutherococcus vs. Panax, Eleutherococcus vs. Daucus, and Eleutherococcus vs. Nicotiana. The distribution of indels, the number of polymorphic sites and nucleotide diversity indicate that positional constraint is more important than functional constraint for the evolution of cp genome sequences in Asterids. For example, the intron sequences in the LSC region exhibited base substitution rates 5-11-times higher than that of the IR regions, while the intron sequences in the SSC region evolved 7-14-times faster than those in the IR region. Furthermore, the Ka/Ks ratio of the gene coding sequences supports a stronger evolutionary constraint in the IR region than in the LSC or SSC regions. Therefore, our data suggest that selective sweeps by base collection mechanisms more frequently eliminate polymorphisms in the IR region than in other regions. Chloroplast genome regions that have high levels of base substitutions also show higher incidences of indels. Thirty-five simple sequence repeat (SSR) loci were identified in the Eleutherococcus chloroplast genome. Of these, 27 are homopolymers, while six are di-polymers and two are tri-polymers. 
In addition to the SSR loci, we also identified 18 medium size repeat units ranging from 22 to 79 bp, 11 of which are distributed in the IGS or intron regions. These medium size repeats may contribute to developing a cp genome-specific gene introduction vector because these regions may serve as specific recombination sites.
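
The abstract reports 35 SSR loci (27 homopolymers, six di-polymers, and two tri-polymers) but does not state the repeat-count thresholds used to call them. The sketch below shows one common way such loci can be located with regular expressions. The thresholds (at least 10, 5, and 4 repeats for mono-, di-, and tri-nucleotide units) and the function name are assumptions made for illustration, not values taken from the study.

```python
import re

# Assumed minimum repeat counts per unit length; not stated in the abstract.
DEFAULT_MIN_REPEATS = ((1, 10), (2, 5), (3, 4))


def find_ssrs(seq, min_repeats=DEFAULT_MIN_REPEATS):
    """Return sorted (start, unit, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats:
        # A unit of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # "AA" or "TTT" runs are homopolymers; leave them to the 1-bp pass.
            if unit_len > 1 and len(set(unit)) == 1:
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GC" + "A" * 12 + "GCGG" + "AT" * 6 + "CCG" + "TTC" * 4 + "G"
    print(find_ssrs(demo))  # [(2, 'A', 12), (18, 'AT', 6), (33, 'TTC', 4)]
```

The reported unit depends on where the left-to-right scan first matches, so the same locus may be labelled `AT` or `TA` depending on the flanking base. Locus counts are unaffected, but they do change with the thresholds chosen.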
Primary structure of prostaglandin G/H synthase from sheep vesicular gland determined from the complementary DNA sequence.

PubMed Central

DeWitt, D L; Smith, W L

1988-01-01

Prostaglandin G/H synthase (8,11,14-icosatrienoate, hydrogen-donor:oxygen oxidoreductase, EC 1.14.99.1) catalyzes the first step in the formation of prostaglandins and thromboxanes, the conversion of arachidonic acid to prostaglandin endoperoxides G and H. This enzyme is the site of action of nonsteroidal anti-inflammatory drugs. We have isolated a 2.7-kilobase complementary DNA (cDNA) encompassing the entire coding region of prostaglandin G/H synthase from sheep vesicular glands. This cDNA, cloned from a lambda gt 10 library prepared from poly(A)+ RNA of vesicular glands, hybridizes with a single 2.75-kilobase mRNA species. The cDNA clone was selected using oligonucleotide probes modeled from amino acid sequences of tryptic peptides prepared from the purified enzyme. The full-length cDNA encodes a protein of 600 amino acids, including a signal sequence of 24 amino acids. Identification of the cDNA as coding for prostaglandin G/H synthase is based on comparison of amino acid sequences of seven peptides comprising 103 amino acids with the amino acid sequence deduced from the nucleotide sequence of the cDNA. The molecular weight of the unglycosylated enzyme lacking the signal peptide is 65,621. The synthase is a glycoprotein, and there are three potential sites for N-glycosylation, two of them in the amino-terminal half of the molecule. The serine reported to be acetylated by aspirin is at position 530, near the carboxyl terminus. There is no significant similarity between the sequence of the synthase and that of any other protein in amino acid or nucleotide sequence libraries, and a heme binding site(s) is not apparent from the amino acid sequence. The availability of a full-length cDNA clone coding for prostaglandin G/H synthase should facilitate studies of the regulation of expression of this enzyme and the structural features important for catalysis and for interaction with anti-inflammatory drugs. PMID:3125548
Identification of G-quadruplex forming sequences in three manatee papillomaviruses

PubMed Central

Zahin, Maryam; Dean, William L.; Ghim, Shin-je; Joh, Joongho; Gray, Robert D.; Khanal, Sujita; Bossart, Gregory D.; Mignucci-Giannoni, Antonio A.; Rouchka, Eric C.; Jenson, Alfred B.; Trent, John O.; Chaires, Jonathan B.

2018-01-01

The Florida manatee (Trichechus manatus latirostris) is a threatened aquatic mammal in United States coastal waters. Over the past decade, the appearance of papillomavirus-induced lesions and viral papillomatosis in manatees has been a concern for those involved in the management and rehabilitation of this species. To date, three manatee papillomaviruses (TmPVs) have been identified in Florida manatees, one forming cutaneous lesions (TmPV1) and two forming genital lesions (TmPV3 and TmPV4). We identified DNA sequences with the potential to form G-quadruplex structures (G4) across the three genomes. G4 were located on both DNA strands and across coding and non-coding regions on all TmPVs, offering multiple targets for viral control. Although G4 have been identified in several viral genomes, including human PVs, most research has focused on canonical structures comprised of three G-tetrads. In contrast, the vast majority of sequences we identified would allow the formation of non-canonical structures with only two G-tetrads. Our biophysical analysis confirmed the formation of G4 with parallel topology in three such sequences from the E2 region. Two of the structures appear comprised of multiple stacked two G-tetrad structures, perhaps serving to increase structural stability. Computational analysis demonstrated enrichment of G4 sequences on all TmPVs on the reverse strand in the E2/E4 region and on both strands in the L2 region. Several G4 sequences occurred at similar regional locations on all PVs, most notably on the reverse strand in the E2 region. In other cases, G4 were identified at similar regional locations only on PVs forming genital lesions. On all TmPVs, G4 sequences were located in the non-coding region near putative E2 binding sites. Together, these findings suggest that G4 are possible regulatory elements in TmPVs. PMID:29630682
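A scan for putative two-tetrad G4 motifs on both strands, of the kind the abstract above describes, might be sketched as follows. This is not the authors' pipeline: the motif definition (four runs of at least two guanines separated by loops of 1 to 7 nucleotides) is a common convention assumed here, and the names `G4_TWO_TETRAD` and `find_g4` are hypothetical.

```python
import re

# Assumed motif: four G-runs of >= 2 guanines, separated by 1-7 nt loops.
G4_TWO_TETRAD = re.compile(r"G{2,}(?:[ACGT]{1,7}G{2,}){3}")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_g4(seq: str) -> list[tuple[str, int, int, str]]:
    """List non-overlapping motif hits as (strand, start, end, motif).

    Coordinates are 0-based, half-open, and always given on the forward
    strand; reverse-strand hits report the motif as read 5'->3' on that strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for m in G4_TWO_TETRAD.finditer(seq):
        hits.append(("+", m.start(), m.end(), m.group()))
    for m in G4_TWO_TETRAD.finditer(reverse_complement(seq)):
        # Map positions on the reverse complement back to forward coordinates.
        hits.append(("-", n - m.end(), n - m.start(), m.group()))
    return hits


print(find_g4("TTGGAGGTGGAGGTT"))
print(find_g4("AACCTCCACCTCCAA"))
```

A regular expression only flags sequence potential; as the abstract notes, actual G4 formation and topology were confirmed by biophysical analysis.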
Few mitochondrial DNA sequences are inserted into the turkey (Meleagris gallopavo) nuclear genome: evolutionary analyses and informativity in the domestic lineage.

PubMed

Schiavo, G; Strillacci, M G; Ribani, A; Bovo, S; Roman-Ponce, S I; Cerolini, S; Bertolini, F; Bagnato, A; Fontanesi, L

2018-06-01

Mitochondrial DNA (mtDNA) insertions have been detected in the nuclear genome of many eukaryotes. These sequences are pseudogenes that originated by horizontal transfer of mtDNA fragments into the nuclear genome, producing nuclear DNA sequences of mitochondrial origin (numt). In this study we determined the frequency and distribution of mtDNA-originated pseudogenes in the turkey (Meleagris gallopavo) nuclear genome. The turkey reference genome (Turkey_2.01) was aligned with the reference linearized mtDNA sequence using LAST. A total of 32 numt sequences (corresponding to 18 numt regions derived by unique insertional events) were identified in the turkey nuclear genome (size ranging from 66 to 1415 bp; identity against the modern turkey mtDNA corresponding region ranging from 62% to 100%). Numts were distributed in nine chromosomes and in one scaffold. They derived from parts of 10 mtDNA protein-coding genes, ribosomal genes, the control region and 10 tRNA genes. Seven numt regions reported in the turkey genome were identified in orthologous positions in the Gallus gallus genome and therefore were present in the ancestral genome that, in the Cretaceous, gave rise to the lineages of the modern crown Galliformes. Five recently integrated turkey numts were validated by PCR in 168 turkeys of six different domestic populations. None of the analysed numts were polymorphic (i.e. absence of the inserted sequence, as reported in numts of recent integration in other species), suggesting that the reticulate speciation model is not useful for explaining the origin of the domesticated turkey lineage. © 2018 Stichting International Foundation for Animal Genetics.
Is “Junk” DNA Mostly Intron DNA?

PubMed Central

Wong, Gane Ka-Shu; Passey, Douglas A.; Huang, Ying-zong; Yang, Zhiyong; Yu, Jun

2000-01-01

Among higher eukaryotes, very little of the genome codes for protein. What is in the rest of the genome, or the “junk” DNA, that, in Homo sapiens, is estimated to be almost 97% of the genome? Is it possible that much of this “junk” is intron DNA? This is not a question that can be answered just by looking at the published data, even from the finished genomes. One cannot assume that there are no genes in a sequenced region, just because no genes were annotated. We introduce another approach to this problem, based on an analysis of the cDNA-to-genomic alignments, in all of the complete or nearly-complete genomes from the multicellular organisms. Our conclusion is that, in animals but not in plants, most of the “junk” is intron DNA. PMID:11076852
The LINE-1 DNA sequences in four mammalian orders predict proteins that conserve homologies to retrovirus proteins.

PubMed Central

Fanning, T; Singer, M

1987-01-01

Recent work suggests that one or more members of the highly repeated LINE-1 (L1) DNA family found in all mammals may encode one or more proteins. Here we report the sequence of a portion of an L1 cloned from the domestic cat (Felis catus). These data permit comparison of the L1 sequences in four mammalian orders (Carnivore, Lagomorph, Rodent and Primate) and the comparison supports the suggested coding potential. In two separate, noncontiguous regions in the carboxy terminal half of the proteins predicted from the DNA sequences, there are several strongly conserved segments. In one region, these share homology with known or suspected reverse transcriptases, as described by others in rodents and primates. In the second region, closer to the carboxy terminus, the strongly conserved segments are over 90% homologous among the four orders. One of the latter segments is cysteine rich and resembles the putative metal binding domains of nucleic acid binding proteins, including those of TFIIIA and retroviruses. PMID:3562227
Multiple components in restriction enzyme digests of mammalian (insectivore), avian and reptilian genomic DNA hybridize with murine immunoglobulin VH probes.

PubMed

Litman, G W; Berger, L; Jahn, C L

1982-06-11

High molecular weight genomic DNAs isolated from an insectivore, Tupaia, and a representative reptilian, Caiman, and avian, Gallus, were digested with restriction endonucleases, transferred to nitrocellulose and hybridized with nick-translated probes of murine VH genes. The derivations of the probes designated S107V (1) and mu 107V (2,3) have been described previously. Under conditions of reduced stringency, multiple hybridizing components were observed with Tupaia and Caiman; only mu 107V exhibited significant hybridization with the separated fragments of Gallus DNA. The nick-translated S107V probe was digested with Fnu4H1 and subinserts corresponding to the 5' and 3' regions both detected multiple hybridizing components in Tupaia and Caiman DNA. A 5' probe lacking the leader sequence identified the same components as the intact 5' probe, suggesting that VH coding regions distant as the reptilians may possess multiple genetic components which exhibit significant homology with murine immunoglobulin in VH regions. PMID:6285298
DNA rearrangements directed by non-coding RNAs in ciliates

PubMed Central

Mochizuki, Kazufumi

2013-01-01

Extensive programmed rearrangement of DNA, including DNA elimination, chromosome fragmentation, and DNA descrambling, takes place in the newly developed macronucleus during the sexual reproduction of ciliated protozoa. Recent studies have revealed that two distant classes of ciliates use distinct types of non-coding RNAs to regulate such DNA rearrangement events. DNA elimination in Tetrahymena is regulated by small non-coding RNAs that are produced and utilized in an RNAi-related process. It has been proposed that the small RNAs produced from the micronuclear genome are used to identify eliminated DNA sequences by whole-genome comparison between the parental macronucleus and the micronucleus. In contrast, DNA descrambling in Oxytricha is guided by long non-coding RNAs that are produced from the parental macronuclear genome. These long RNAs are proposed to act as templates for the direct descrambling events that occur in the developing macronucleus. Both cases provide useful examples to study epigenetic chromatin regulation by non-coding RNAs. PMID:21956937
[Analysis of mitochondrial SNPs in addition to conventional STR-typing in a case of aggravated theft].

PubMed

Röper, Andrea; Reichert, Walter; Mattern, Rainer

2007-01-01

In the field of forensic DNA typing, the analysis of Short Tandem Repeats (STRs) can fail in cases of degraded DNA. The typing of coding region Single Nucleotide Polymorphisms (SNPs) of the mitochondrial genome provides an approach to acquire additional information. In the examined case of aggravated theft, SNP typing excluded both suspects as the source of the analyzed hair left at the crime scene. This conclusion was not possible on the basis of STR typing. SNP typing of the trace on the torch left at the crime scene increased the likelihood that suspect no. 2 was the origin of this trace, a finding already indicated by STR analysis. Suspect no. 1 was excluded as the origin of this trace by SNP typing, which was also indicated by STR analysis. A limiting factor for the analysis of SNPs is the maternal inheritance of mitochondrial DNA: individualisation is not possible. In conclusion, for traces that cause problems with conventional STR typing, the supplementary analysis of coding region SNPs from the mitochondrial genome is very reasonable and greatly contributes to the refinement of analysis methods in the field of forensic genetics.
Cloning and expression of 130-kd mosquito-larvicidal delta-endotoxin gene of Bacillus thuringiensis var. Israelensis in Escherichia coli.

PubMed

Angsuthanasombat, C; Chungjatupornchai, W; Kertbundit, S; Luxananil, P; Settasatian, C; Wilairat, P; Panyim, S

1987-07-01

Five recombinant E. coli clones exhibiting toxicity to Aedes aegypti larvae were obtained from a library of 800 clones containing XbaI DNA fragments of the 110-kb plasmid from B. thuringiensis var. israelensis. All five clones (pMU 14/258/303/388/679) had the same 3.8-kb insert and encoded a major protein of 130 kDa which was highly toxic to A. aegypti larvae. Three clones (pMU 258/303/388) transcribed the 130-kDa gene in the same direction as that of the lacZ promoter of the pUC12 vector, whereas the transcription of the other two (pMU 14/679) was in the opposite direction. A 1.9-kb fragment of the 3.8-kb insert coded for a protein of 65 kDa. The partial DNA sequence of the 3.8-kb insert, corresponding to the 5' terminus of the 130-kDa gene, revealed a continuous reading frame, a Shine-Dalgarno sequence and a tentative 5'-regulatory region. These results demonstrated that the 3.8-kb insert is a minimal DNA fragment containing a regulatory region plus the coding sequence of the 130-kDa protein that is highly toxic to mosquito larvae.
Transcription profiling suggests that mitochondrial topoisomerase IB acts as a topological barrier and regulator of mitochondrial DNA transcription.

PubMed

Dalla Rosa, Ilaria; Zhang, Hongliang; Khiati, Salim; Wu, Xiaolin; Pommier, Yves

2017-12-08

Mitochondrial DNA (mtDNA) is essential for cell viability because it encodes subunits of the respiratory chain complexes. Mitochondrial topoisomerase IB (TOP1MT) facilitates mtDNA replication by removing DNA topological tensions produced during mtDNA transcription, but it appears to be dispensable. To test whether cells lacking TOP1MT have aberrant mtDNA transcription, we performed mitochondrial transcriptome profiling. To that end, we designed and implemented a customized tiling array, which enabled genome-wide, strand-specific, and simultaneous detection of all mitochondrial transcripts. Our technique revealed that Top1mt KO mouse cells process the mitochondrial transcripts normally but that protein-coding mitochondrial transcripts are elevated. Moreover, we found discrete long noncoding RNAs produced by H-strand transcription and encompassing the noncoding regulatory region of mtDNA in human and murine cells and tissues. Of note, these noncoding RNAs were strongly up-regulated in the absence of TOP1MT. In contrast, 7S DNA, produced by mtDNA replication, was reduced in the Top1mt KO cells. We propose that the long noncoding RNA species in the D-loop region are generated by the extension of H-strand transcripts beyond their canonical stop site and that TOP1MT acts as a topological barrier and regulator for mtDNA transcription and D-loop formation.
A genetic investigation of Korean mummies from the Joseon Dynasty.

PubMed

Kim, Na Young; Lee, Hwan Young; Park, Myung Jin; Yang, Woo Ick; Shin, Kyoung-Jin

2011-01-01

Two Korean mummies (Danwoong-mirra and Yoon-mirra) found in medieval tombs in the central region of the Korean peninsula were genetically investigated by analysis of mitochondrial DNA (mtDNA), Y-chromosomal short tandem repeat (Y-STR) and the ABO gene. Danwoong-mirra is a male child mummy and Yoon-mirra is a pregnant female mummy, dating back about 550 and 450 years, respectively. DNA was extracted from soft tissues or bones. mtDNA, Y-STR and the ABO gene were amplified using a small size amplicon strategy and were analyzed according to the criteria of ancient DNA analysis to ensure that authentic DNA typing results were obtained from these ancient samples. Analysis of mtDNA hypervariable region sequence and coding region single nucleotide polymorphism (SNP) information revealed that Danwoong-mirra and Yoon-mirra belong to the East Asian mtDNA haplogroups D4 and M7c, respectively. The Y-STRs were analyzed in the male child mummy (Danwoong-mirra) using the AmpFlSTR® Yfiler PCR Amplification Kit and an in-house Y-miniplex plus system, and could be characterized in 4 loci with small amplicon size. The analysis of ABO gene SNPs using multiplex single base extension methods revealed that the ABO blood types of Danwoong-mirra and Yoon-mirra are AO01 and AB, respectively. The small size amplicon strategy and the authentication process in the present study will be effectively applicable to future genetic analyses of various forensic and ancient samples.
Analysis of protein-coding genetic variation in 60,706 humans.

PubMed

Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G

2016-08-18

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
Sequences in the intergenic spacer influence RNA Pol I transcription from the human rRNA promoter

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, W.M.; Sylvester, J.E.

1994-09-01

In most eucaryotic species, ribosomal genes are tandemly repeated about 100-5000 times per haploid genome. The 43 kb human rDNA repeat consists of a 13 kb coding region for the 18S, 5.8S, 28S ribosomal RNAs (rRNAs) and transcribed spacers separated by a 30 kb intergenic spacer. For species such as frog, mouse and rat, sequences in the intergenic spacer other than the gene promoter have been shown to modulate transcription of the ribosomal gene. These sequences are spacer promoters, enhancers and the terminator for spacer transcription. We are addressing whether the human ribosomal gene promoter is similarly influenced. In-vitro transcription run-off assays have revealed that the 4.5 kb region (CBE), directly upstream of the gene promoter, has cis-stimulation and trans-competition properties. This suggests that the CBE fragment contains an enhancer(s) for ribosomal gene transcription. Further experiments have shown that a fragment (approximately 1.6 kb) within the CBE fragment also has trans-competition function. Deletion subclones of this region are being tested to delineate the exact sequences responsible for these modulating activities. Previous sequence analysis and functional studies have revealed that CBE contains regions of DNA capable of adopting alternative structures such as bent DNA, Z-DNA, and triple-stranded DNA. Whether these structures are required for modulating transcription remains to be determined as does the specific DNA-protein interaction involved.
α satellite DNA variation and function of the human centromere

PubMed Central

Sullivan, Lori L.; Chew, Kimberline

2017-01-01

Genomic variation is a source of functional diversity that is typically studied in genic and non-coding regulatory regions. However, the extent of variation within noncoding portions of the human genome, particularly highly repetitive regions, and the functional consequences are not well understood. Satellite DNA, including α satellite DNA found at human centromeres, comprises up to 10% of the genome, but is difficult to study because its repetitive nature hinders contiguous sequence assemblies. We recently described variation within α satellite DNA that affects centromere function. On human chromosome 17 (HSA17), we showed that size and sequence polymorphisms within primary array D17Z1 are associated with chromosome aneuploidy and defective centromere architecture. However, HSA17 can counteract this instability by assembling the centromere at a second, “backup” array lacking variation. Here, we discuss our findings in a broader context of human centromere assembly, and highlight areas of future study to uncover links between genomic and epigenetic features of human centromeres. PMID:28406740
Long interspersed repeated DNA (LINE) causes polymorphism at the rat insulin 1 locus.

PubMed

Lakshmikumaran, M S; D'Ambrosio, E; Laimins, L A; Lin, D T; Furano, A V

1985-09-01

The insulin 1, but not the insulin 2, locus is polymorphic (i.e., exhibits allelic variation) in rats. Restriction enzyme analysis and hybridization studies showed that the polymorphic region is 2.2 kilobases upstream of the insulin 1 coding region and is due to the presence or absence of an approximately 2.7-kilobase repeated DNA element. DNA sequence determination showed that this DNA element is a member of a long interspersed repeated DNA family (LINE) that is highly repeated (greater than 50,000 copies) and highly transcribed in the rat. Although the presence or absence of LINE sequences at the insulin 1 locus occurs in both the homozygous and heterozygous states, LINE-containing insulin 1 alleles are more prevalent in the rat population than are alleles without LINEs. Restriction enzyme analysis of the LINE-containing alleles indicated that at least two versions of the LINE sequence may be present at the insulin 1 locus in different rats. Either repeated transposition of LINE sequences or gene conversion between the resident insulin 1 LINE and other sequences in the genome are possible explanations for this.
Fluctuations in the DNA double helix

NASA Astrophysics Data System (ADS)

Peyrard, M.; López, S. C.; Angelov, D.

2007-08-01

DNA is not the static entity suggested by the famous double helix structure. It shows large fluctuational openings, in which the bases, which contain the genetic code, are temporarily open. Therefore it is an interesting system to study the effect of nonlinearity on the physical properties of a system. A simple model for DNA, at a mesoscopic scale, can be investigated by computer simulation, in the same spirit as the original work of Fermi, Pasta and Ulam. These calculations raise fundamental questions in statistical physics because they show a temporary breaking of equipartition of energy, regions with large amplitude fluctuations being able to coexist with regions where the fluctuations are very small, even when the model is studied in the canonical ensemble. This phenomenon can be related to nonlinear excitations in the model. The ability of the model to describe the actual properties of DNA is discussed by comparing theoretical and experimental results for the probability that base pairs open at a given temperature in specific DNA sequences. These studies give us indications on the proper description of the effect of the sequence in the mesoscopic model.

Tau mRNA 3'UTR-to-CDS ratio is increased in Alzheimer disease.

PubMed

García-Escudero, Vega; Gargini, Ricardo; Martín-Maestro, Patricia; García, Esther; García-Escudero, Ramón; Avila, Jesús

2017-08-10

Neurons frequently show an imbalance in expression of the 3' untranslated region (3'UTR) relative to the coding DNA sequence (CDS) region of mature messenger RNAs (mRNA). The ratio varies among different cells or parts of the brain. The Map2 protein levels per cell depend on the 3'UTR-to-CDS ratio rather than the total mRNA amount, which suggests powerful regulation of protein expression by 3'UTR sequences. Here we found that MAPT (the microtubule-associated protein tau gene) 3'UTR levels are particularly high with respect to other genes; indeed, the 3'UTR-to-CDS ratio of MAPT is balanced in healthy brain in mouse and human. The tau protein accumulates in Alzheimer diseased brain. We nonetheless observed that the levels of RNA encoding MAPT/tau were diminished in these patients' brains. To explain this apparently contradictory result, we studied MAPT mRNA stoichiometry in coding and non-coding regions, and found that the 3'UTR-to-CDS ratio was higher in the hippocampus of Alzheimer disease patients, with higher tau protein but lower total mRNA levels. Our data indicate that changes in the 3'UTR-to-CDS ratio have a regulatory role in the disease. Future research should thus consider not only mRNA levels, but also the ratios between coding and non-coding regions. Copyright © 2017 Elsevier B.V. All rights reserved.
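The quantity discussed in the abstract above is a simple ratio of two expression measurements. The sketch below shows how both signals can fall while the ratio rises. The function name and all numeric values are hypothetical illustrations, not data from the study, and the abstract does not state how the signals were measured or normalized.

```python
import math


def utr_to_cds_ratio(utr_signal: float, cds_signal: float) -> float:
    """Ratio of 3'UTR expression signal to CDS expression signal."""
    if cds_signal <= 0:
        raise ValueError("CDS signal must be positive")
    return utr_signal / cds_signal


# Hypothetical, made-up expression values in arbitrary units.
control = {"utr": 100.0, "cds": 100.0}
patient = {"utr": 80.0, "cds": 50.0}

control_ratio = utr_to_cds_ratio(control["utr"], control["cds"])
patient_ratio = utr_to_cds_ratio(patient["utr"], patient["cds"])

# Both signals are lower in the hypothetical patient sample,
# yet the 3'UTR-to-CDS ratio is higher.
print(control_ratio, patient_ratio)
```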
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

PubMed Central

Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio

2004-01-01

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394
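The counts above split coding-region SNPs into nonsynonymous and nonsense classes. As a minimal sketch of how one single-base change can be classified within a codon, assuming the standard genetic code, something like the following could be used. The helper name `classify_coding_snp` is hypothetical and is not taken from H-InvDB.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base substitution inside one codon.

    Hypothetical helper: `position` is 0, 1 or 2 within the codon.
    Returns "synonymous", "nonsense" or "nonsynonymous".
    A change that removes a stop codon is reported as "nonsynonymous".
    """
    codon = codon.upper()
    alt_base = alt_base.upper()
    if len(codon) != 3 or any(b not in BASES for b in codon):
        raise ValueError("codon must be three bases from A, C, G, T")
    if position not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("position must be 0-2 and alt_base one of A, C, G, T")

    mutated = codon[:position] + alt_base + codon[position + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]

    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "nonsynonymous"
```

For example, changing GAA to GAG leaves glutamate unchanged (synonymous), while changing GAA to TAA introduces a stop codon (nonsense).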
Single-Nucleosome Mapping of Histone Modifications in S. cerevisiae

PubMed Central

Kim, Minkyu; Buratowski, Stephen; Schreiber, Stuart L; Friedman, Nir

2005-01-01

Covalent modification of histone proteins plays a role in virtually every process on eukaryotic DNA, from transcription to DNA repair. Many different residues can be covalently modified, and it has been suggested that these modifications occur in a great number of independent, meaningful combinations. Published low-resolution microarray studies on the combinatorial complexity of histone modification patterns suffer from confounding effects caused by the averaging of modification levels over multiple nucleosomes. To overcome this problem, we used a high-resolution tiled microarray with single-nucleosome resolution to investigate the occurrence of combinations of 12 histone modifications on thousands of nucleosomes in actively growing S. cerevisiae. We found that histone modifications do not occur independently; there are roughly two groups of co-occurring modifications. One group of lysine acetylations shows a sharply defined domain of two hypo-acetylated nucleosomes, adjacent to the transcriptional start site, whose occurrence does not correlate with transcription levels. The other group consists of modifications occurring in gradients through the coding regions of genes in a pattern associated with transcription. We found no evidence for a deterministic code of many discrete states, but instead we saw blended, continuous patterns that distinguish nucleosomes at one location (e.g., promoter nucleosomes) from those at another location (e.g., over the 3′ ends of coding regions). These results are consistent with the idea of a simple, redundant histone code, in which multiple modifications share the same role. PMID:16122352
The Complete Mitochondrial Genomes of Two Octopods Cistopus chinensis and Cistopus taiwanicus: Revealing the Phylogenetic Position of the Genus Cistopus within the Order Octopoda

PubMed Central

Cheng, Rubin; Zheng, Xiaodong; Ma, Yuanyuan; Li, Qi

2013-01-01

In the present study, we determined the complete mitochondrial DNA (mtDNA) sequences of two species of Cistopus, namely C. chinensis and C. taiwanicus, and conducted a comparative mt genome analysis across the class Cephalopoda. The mtDNA lengths of C. chinensis and C. taiwanicus are 15706 and 15793 nucleotides, with AT contents of 76.21% and 76.5%, respectively. The sequence identity of mtDNA between C. chinensis and C. taiwanicus was 88%, suggesting a close relationship. Compared with C. taiwanicus and other octopods, C. chinensis encoded two additional tRNA genes, showing a novel gene arrangement. In addition, an unusual 23 poly (A) signal structure is found in the ATP8 coding region of C. chinensis. The entire genome and each protein coding gene of the two Cistopus species displayed notable levels of AT and GC skews. Based on sliding window analysis among Octopodiformes, ND1 and ND5 were considered to be more reliable molecular beacons. Phylogenetic analyses based on the 13 protein-coding genes revealed that C. chinensis and C. taiwanicus form a monophyletic group with high statistical support, consistent with previous studies based on morphological characteristics. Our results also indicated that the phylogenetic position of the genus Cistopus is closer to Octopus than to Amphioctopus and Callistoctopus. The complete mtDNA sequences of C. chinensis and C. taiwanicus represent the first whole mt genomes in the genus Cistopus. These novel mtDNA data will be important in refining the phylogenetic relationships within Octopodiformes and enriching the resource of markers for systematic, population genetic and evolutionary biological studies of Cephalopoda. PMID:24358345
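The AT content and the AT and GC skews mentioned above are simple base-count statistics. The sketch below uses the conventional definitions, AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C). That the authors used exactly these formulas is an assumption, and the function name `base_composition` is hypothetical.

```python
def base_composition(seq):
    """Return AT content (percent) and AT/GC skews for a DNA string.

    Hypothetical helper using the conventional skew definitions:
    AT skew = (A - T) / (A + T), GC skew = (G - C) / (G + C).
    Characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")

    at = counts["A"] + counts["T"]
    gc = counts["G"] + counts["C"]
    return {
        "at_content": 100.0 * at / total,
        # A skew is undefined when its denominator is zero; report 0.0 then.
        "at_skew": (counts["A"] - counts["T"]) / at if at else 0.0,
        "gc_skew": (counts["G"] - counts["C"]) / gc if gc else 0.0,
    }
```

Applied to a whole genome, or to each protein-coding gene in turn, this gives the kind of per-sequence values that the abstract summarizes.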
The PARTRAC code: Status and recent developments

NASA Astrophysics Data System (ADS)

Friedland, Werner; Kundrat, Pavel

Biophysical modeling is of particular value for predictions of radiation effects due to manned space missions. PARTRAC is an established tool for Monte Carlo-based simulations of radiation track structures, damage induction in cellular DNA and its repair [1]. Dedicated modules describe interactions of ionizing particles with the traversed medium, the production and reactions of reactive species, and score DNA damage determined by overlapping track structures with multi-scale chromatin models. The DNA repair module describes the repair of DNA double-strand breaks (DSB) via the non-homologous end-joining pathway; the code explicitly simulates the spatial mobility of individual DNA ends in parallel with their processing by major repair enzymes [2]. To simulate the yields and kinetics of radiation-induced chromosome aberrations, the repair module has been extended by tracking the information on the chromosome origin of ligated fragments as well as the presence of centromeres [3]. PARTRAC calculations have been benchmarked against experimental data on various biological endpoints induced by photon and ion irradiation. The calculated DNA fragment distributions after photon and ion irradiation reproduce corresponding experimental data and their dose- and LET-dependence. However, in particular for high-LET radiation many short DNA fragments are predicted below the detection limits of the measurements, so that the experiments significantly underestimate DSB yields by high-LET radiation [4]. The DNA repair module correctly describes the LET-dependent repair kinetics after 60Co gamma-rays and different N-ion radiation qualities [2]. First calculations on the induction of chromosome aberrations have overestimated the absolute yields of dicentrics, but correctly reproduced their relative dose-dependence and the difference between gamma- and alpha particle irradiation [3]. 
Recent developments of the PARTRAC code include a model of hetero- vs euchromatin structures to enable accounting for variations in DNA damage yields, complexity and repair between these regions. Second, the applicability of the code to low-energy ions has been extended to full stopping by using a modified Barkas scaling of proton cross sections for ions heavier than helium. Third, ongoing studies aim at hitherto unprecedented benchmarking of the code against experiments with sub-µm focused bunches of low-LET ions mimicking single high-LET ion tracks [5] which separate effects of damage clustering on a sub-µm scale from DNA damage complexity on a nanometer scale. Fourth, motivated by implications for the involvement of mitochondria in intercellular signaling and radiation-induced bystander effects, ongoing work extends the range of PARTRAC DNA models to radiation effects on mitochondrial DNA. The contribution will discuss the PARTRAC modules, benchmarks to experimental data, recent and ongoing developments of the code, with special attention to its implications and potential applications in radiation protection and space research. Acknowledgement. This work was partially funded by the EU (Contract FP7-249689 ‘DoReMi’). References 1. Friedland et al., Mutat. Res. 711, 28 (2011) 2. Friedland et al., Int. J. Radiat. Biol. 88, 129 (2012) 3. Friedland et al., Mutat. Res. 756, 213 (2013) 4. Alloni et al., Radiat. Res. 179, 690 (2013) 5. Schmid et al., Phys. Med. Biol. 57, 5889 (2012)
Quantitation of heteroplasmy of mtDNA sequence variants identified in a population of AD patients and controls by array-based resequencing.

PubMed

Coon, Keith D; Valla, Jon; Szelinger, Szabolics; Schneider, Lonnie E; Niedzielko, Tracy L; Brown, Kevin M; Pearson, John V; Halperin, Rebecca; Dunckley, Travis; Papassotiropoulos, Andreas; Caselli, Richard J; Reiman, Eric M; Stephan, Dietrich A

2006-08-01

The role of mitochondrial dysfunction in the pathogenesis of Alzheimer's disease (AD) has been well documented. Though evidence for the role of mitochondria in AD seems incontrovertible, the impact of mitochondrial DNA (mtDNA) mutations in AD etiology remains controversial. Though mutations in mitochondrially encoded genes have repeatedly been implicated in the pathogenesis of AD, many of these studies have been plagued by lack of replication as well as potential contamination of nuclear-encoded mitochondrial pseudogenes. To assess the role of mtDNA mutations in the pathogenesis of AD, while avoiding the pitfalls of nuclear-encoded mitochondrial pseudogenes encountered in previous investigations and showcasing the benefits of a novel resequencing technology, we sequenced the entire coding region (15,452 bp) of mtDNA from 19 extremely well-characterized AD patients and 18 age-matched, unaffected controls utilizing a new, reliable, high-throughput array-based resequencing technique, the Human MitoChip. High-throughput, array-based DNA resequencing of the entire mtDNA coding region from platelets of 37 subjects revealed the presence of 208 loci displaying a total of 917 sequence variants. There were no statistically significant differences in overall mutational burden between cases and controls; however, 265 independent sites of statistically significant change between cases and controls were identified. Changed sites were found in genes associated with complexes I (30.2%), III (3.0%), IV (33.2%), and V (9.1%) as well as tRNA (10.6%) and rRNA (14.0%). Despite their statistical significance, the subtle nature of the observed changes makes it difficult to determine whether they represent true functional variants involved in AD etiology or merely naturally occurring dissimilarity. 
Regardless, this study demonstrates the tremendous value of this novel mtDNA resequencing platform, which avoids the pitfalls of erroneously amplifying nuclear-encoded mtDNA pseudogenes, and our proposed analysis paradigm, which utilizes the availability of raw signal intensity values for each of the four potential alleles to facilitate quantitative estimates of mtDNA heteroplasmy. This information provides a potential new target for burgeoning diagnostics and therapeutics that could truly assist those suffering from this devastating disorder.
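The abstract does not spell out how the four per-allele signal intensities are turned into a heteroplasmy estimate. One simple possibility, shown here purely as an illustrative assumption and not as the authors' analysis paradigm, is to normalize the raw intensities at a site and read off the second-largest fraction. The function names are hypothetical, and background correction is ignored.

```python
def allele_fractions(intensities):
    """Normalize raw per-allele signal intensities at one site.

    `intensities` maps each base (A, C, G, T) to a non-negative raw signal.
    Hypothetical helper: no background subtraction is attempted.
    """
    if any(value < 0 for value in intensities.values()):
        raise ValueError("intensities must be non-negative")
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total signal must be positive")
    return {base: value / total for base, value in intensities.items()}


def minor_allele_fraction(intensities):
    """Fraction of the second-strongest allele, a crude heteroplasmy proxy."""
    fractions = sorted(allele_fractions(intensities).values(), reverse=True)
    if len(fractions) < 2:
        raise ValueError("need signals for at least two alleles")
    return fractions[1]
```

With signals of 900 for A and 100 for G, and none for C or T, this sketch would report a minor-allele fraction of about 0.1.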
Genetic heterogeneity of the dnaK gene locus including transcription terminator region (TTR) in Campylobacter lari.

PubMed

Shitara, M; Tsuboi, Y; Sekizuka, T; Tazumi, A; Moore, J E; Millar, B C; Taneike, I; Matsuda, M

2008-01-01

Nucleotide sequences of approximately 3.1 kbp consisting of the full-length open reading frame (ORF) for grpE, a non-coding (NC) region and a putative ORF for the full-length dnaK gene (1860 bp) were identified from a urease-positive thermophilic Campylobacter (UPTC) CF89-12 isolate. Then, following the construction of a new degenerate polymerase chain reaction (PCR) primer pair for amplification of the dnaK structural gene, including the transcription terminator region of C. lari isolates, the dnaK region was amplified successfully, TA-cloned and sequenced in nine C. lari isolates. The dnaK gene sequences commenced with an ATG and terminated with a TAA in all 10 isolates, including CF89-12. In addition, the putative ORFs for the dnaK gene locus from the seven UPTC isolates consisted of 1860 bases, whereas those from the four urease-negative (UN) C. lari isolates, including the C. lari RM2100 reference strain, consisted of 1866 bases. Interestingly, different probable ribosome binding sites and hypothetical intrinsic ρ-independent terminator structures were identified between the seven UPTC and four UN C. lari isolates, respectively. Moreover, 20 of the 28 polymorphic sites found among the amino acid sequences of the dnaK ORF from the 11 C. lari isolates were either UPTC-specific or UN C. lari-specific. In the neighbour-joining tree based on the nucleotide sequence information of the dnaK gene, C. lari forms two major distinct clusters consisting of UPTC and UN C. lari isolates, respectively, with UN C. lari being more closely related to other thermophilic campylobacters than to UPTC.
A novel class of plant-specific zinc-dependent DNA-binding protein that binds to A/T-rich DNA sequences

PubMed Central

Nagano, Yukio; Furuhashi, Hirofumi; Inaba, Takehito; Sasaki, Yukiko

2001-01-01

Complementary DNA encoding a DNA-binding protein, designated PLATZ1 (plant AT-rich sequence- and zinc-binding protein 1), was isolated from peas. The amino acid sequence of the protein is similar to those of other uncharacterized proteins predicted from the genome sequences of higher plants. However, no paralogous sequences have been found outside the plant kingdom. Multiple alignments among these paralogous proteins show that several cysteine and histidine residues are invariant, suggesting that these proteins are a novel class of zinc-dependent DNA-binding proteins with two distantly located regions, C-x2-H-x11-C-x2-C-x(4–5)-C-x2-C-x(3–7)-H-x2-H and C-x2-C-x(10–11)-C-x3-C. In an electrophoretic mobility shift assay, the zinc chelator 1,10-o-phenanthroline inhibited DNA binding, and two distant zinc-binding regions were required for DNA binding. A protein blot with 65ZnCl2 showed that both regions are required for zinc-binding activity. The PLATZ1 protein non-specifically binds to A/T-rich sequences, including the upstream region of the pea GTPase pra2 and plastocyanin petE genes. Expression of the PLATZ1 repressed those of the reporter constructs containing the coding sequence of luciferase gene driven by the cauliflower mosaic virus (CaMV) 35S90 promoter fused to the tandem repeat of the A/T-rich sequences. These results indicate that PLATZ1 is a novel class of plant-specific zinc-dependent DNA-binding protein responsible for A/T-rich sequence-mediated transcriptional repression. PMID:11600698
The changing epitome of species identification – DNA barcoding

PubMed Central

Ajmal Ali, M.; Gyulai, Gábor; Hidvégi, Norbert; Kerti, Balázs; Al Hemaid, Fahad M.A.; Pandey, Arun K.; Lee, Joongku

2014-01-01

The discipline of taxonomy (the science of naming and classifying organisms, the original bioinformatics and a basis for all biology) is fundamentally important in ensuring the quality of life of future human generations on the earth; yet over the past few decades, teaching and research funding in taxonomy have declined because its classical way of practice often made the discipline a subject of opinion. This ultimately gave rise to several problems and challenges, and the taxonomist became an endangered race in the era of genomics. Taxonomy has now become fashionable again owing to a revolutionary approach called DNA barcoding (a novel technology to provide rapid, accurate, and automated species identifications using short orthologous DNA sequences). In DNA barcoding, a complete data set can be obtained from a single specimen irrespective of morphological or life stage characters. The core idea of DNA barcoding is based on the fact that the highly conserved stretches of DNA, either coding or non-coding regions, vary only to a very minor degree during evolution within a species. Sequences suggested to be useful in DNA barcoding include cytoplasmic mitochondrial DNA (e.g. cox1), chloroplast DNA (e.g. rbcL, trnL-F, matK, ndhF, and atpB-rbcL), and nuclear DNA (ITS, and housekeeping genes e.g. gapdh). Plant DNA barcoding is now changing the epitome of species identification and thus ultimately helping in the molecularization of taxonomy, a need of the hour. The ‘DNA barcodes’ show promise in providing a practical, standardized, species-level identification tool that can be used for biodiversity assessment, life history and ecological studies, forensic analysis, and many more. PMID:24955007
DNA replication-timing analysis of human chromosome 22 at high resolution and different developmental states.

PubMed

White, Eric J; Emanuelsson, Olof; Scalzo, David; Royce, Thomas; Kosak, Steven; Oakeley, Edward J; Weissman, Sherman; Gerstein, Mark; Groudine, Mark; Snyder, Michael; Schübeler, Dirk

2004-12-21

Duplication of the genome during the S phase of the cell cycle does not occur simultaneously; rather, different sequences are replicated at different times. The replication timing of specific sequences can change during development; however, the determinants of this dynamic process are poorly understood. To gain insights into the contribution of developmental state, genomic sequence, and transcriptional activity to replication timing, we investigated the timing of DNA replication at high resolution along an entire human chromosome (chromosome 22) in two different cell types. The pattern of replication timing was correlated with respect to annotated genes, gene expression, novel transcribed regions of unknown function, sequence composition, and cytological features. We observed that chromosome 22 contains regions of early- and late-replicating domains of 100 kb to 2 Mb, many (but not all) of which are associated with previously described chromosomal bands. In both cell types, expressed sequences are replicated earlier than nontranscribed regions. However, several highly transcribed regions replicate late. Overall, the DNA replication-timing profiles of the two different cell types are remarkably similar, with only nine regions of difference observed. In one case, this difference reflects the differential expression of an annotated gene that resides in this region. Novel transcribed regions with low coding potential exhibit a strong propensity for early DNA replication. Although the cellular function of such transcripts is poorly understood, our results suggest that their activity is linked to the replication-timing program.
Intrinsic DNA curvature in trypanosomes.

PubMed

Smircich, Pablo; El-Sayed, Najib M; Garat, Beatriz

2017-11-09

Trypanosoma cruzi and Trypanosoma brucei are protozoan parasites causing Chagas disease and African sleeping sickness, displaying unique features of cellular and molecular biology. Remarkably, no canonical signals for RNA polymerase II promoters, which drive protein coding genes transcription, have been identified so far. The secondary structure of DNA has long been recognized as a signal in biological processes and more recently, its involvement in transcription initiation in Leishmania was proposed. In order to study whether this feature is conserved in trypanosomatids, we undertook a genome wide search for intrinsic DNA curvature in T. cruzi and T. brucei. Using a region integrated intrinsic curvature (RIIC) scoring that we previously developed, a non-random distribution of sequence-dependent curvature was observed. High RIIC scores were found to be significantly correlated with transcription start sites in T. cruzi, which have been mapped in divergent switch regions, whereas in T. brucei, the high RIIC scores correlated with sites that have been involved not only in RNA polymerase II initiation but also in termination. In addition, we observed regions with high RIIC score presenting in-phase tracts of Adenines, in the subtelomeric regions of the T. brucei chromosomes that harbor the variable surface glycoproteins genes. In both T. cruzi and T. brucei genomes, a link between DNA conformational signals and gene expression was found. High sequence dependent curvature is associated with transcriptional regulation regions. High intrinsic curvature also occurs at the T. brucei chromosome subtelomeric regions where the recombination processes involved in the evasion of the immune host system take place. These findings underscore the relevance of indirect DNA readout in these ancient eukaryotes.
Bioenergetics in human evolution and disease: implications for the origins of biological complexity and the missing genetic variation of common diseases.

PubMed

Wallace, Douglas C

2013-07-19

Two major inconsistencies exist in the current neo-Darwinian evolutionary theory that random chromosomal mutations acted on by natural selection generate new species. First, natural selection does not require the evolution of ever increasing complexity, yet this is the hallmark of biology. Second, human chromosomal DNA sequence variation is predominantly either neutral or deleterious and is insufficient to provide the variation required for speciation or for predilection to common diseases. Complexity is explained by the continuous flow of energy through the biosphere that drives the accumulation of nucleic acids and information. Information then encodes complex forms. In animals, energy flow is primarily mediated by mitochondria whose maternally inherited mitochondrial DNA (mtDNA) codes for key genes for energy metabolism. In mammals, the mtDNA has a very high mutation rate, but the deleterious mutations are removed by an ovarian selection system. Hence, new mutations that subtly alter energy metabolism are continuously introduced into the species, permitting adaptation to regional differences in energy environments. Therefore, the most phenotypically significant gene variants arise in the mtDNA, are regional, and permit animals to occupy peripheral energy environments where rarer nuclear DNA (nDNA) variants can accumulate, leading to speciation. The neutralist-selectionist debate is then a consequence of mammals having two different evolutionary strategies: a fast mtDNA strategy for intra-specific radiation and a slow nDNA strategy for speciation. Furthermore, the missing genetic variation for common human diseases is primarily mtDNA variation plus regional nDNA variants, both of which have been missed by large, inter-population association studies.
The analysis of the complete mitochondrial genome of Lecanicillium muscarium (synonym Verticillium lecanii) suggests a minimum common gene organization in mtDNAs of Sordariomycetes: phylogenetic implications.

PubMed

Kouvelis, Vassili N; Ghikas, Dimitri V; Typas, Milton A

2004-10-01

The mitochondrial genome (mtDNA) of the entomopathogenic fungus Lecanicillium muscarium (synonym Verticillium lecanii) with a total size of 24,499 bp has been analyzed. So far, it is the smallest known mitochondrial genome among Pezizomycotina, with an extremely compact gene organization and only one group-I intron in its large ribosomal RNA (rnl) gene. It contains the 14 typical genes coding for proteins related to oxidative phosphorylation, the two rRNA genes, one intronic ORF coding for a possible ribosomal protein (rps), and a set of 25 tRNA genes which recognize codons for all amino acids, except alanine and cysteine. All genes are transcribed from the same DNA strand. Gene order comparison with all available complete fungal mtDNAs (representatives of all four phyla are included) revealed some characteristic common features like uninterrupted gene pairs, overlapping genes, and extremely variable intergenic regions, that can all be exploited for the study of fungal mitochondrial genomes. Moreover, a minimum common mtDNA gene order could be detected, in two units, for all known Sordariomycetes namely nad1-nad4-atp8-atp6 and rns-cox3-rnl, which can be extended in Hypocreales, to nad4L-nad5-cob-cox1-nad1-nad4-atp8-atp6 and rns-cox3-rnl nad2-nad3, respectively. Phylogenetic analysis of all fungal mtDNA essential protein-coding genes as one unit, clearly demonstrated the superiority of small genome (mtDNA) over single gene comparisons.
Complete sequences of the highly rearranged molluscan mitochondrial genomes of the scaphopod Graptacme eborea and the bivalve Mytilus edulis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Boore, Jeffrey L.; Medina, Monica; Rosenberg, Lewis A.

2004-01-31

We have determined the complete sequence of the mitochondrial genome of the scaphopod mollusk Graptacme eborea (Conrad, 1846) (14,492 nts) and completed the sequence of the mitochondrial genome of the bivalve mollusk Mytilus edulis Linnaeus, 1758 (16,740 nts). (The name Graptacme eborea is a revision of the species formerly known as Dentalium eboreum.) G. eborea mtDNA contains the 37 genes that are typically found and has the genes divided about evenly between the two strands, but M. edulis contains an extra trnM and is missing atp8, and has all genes on the same strand. Each has a highly rearranged gene order relative to each other and to all other studied mtDNAs. G. eborea mtDNA has almost no strand skew, but the coding strand of M. edulis mtDNA is very rich in G and T. This is reflected in differential codon usage patterns and even in amino acid compositions. G. eborea mtDNA has fewer non-coding nucleotides than any other mtDNA studied to date, with the largest non-coding region being only 24 nt long. Phylogenetic analysis using 2,420 aligned amino acid positions of concatenated proteins weakly supports an association of the scaphopod with gastropods to the exclusion of Bivalvia, Cephalopoda, and Polyplacophora, but is generally unable to convincingly resolve the relationships among major groups of the Lophotrochozoa, in contrast to the good resolution seen for several other major metazoan groups.
Identification of a New Human Adenovirus Protein Encoded by a Novel Late l-Strand Transcription Unit

PubMed Central

Tollefson, Ann E.; Ying, Baoling; Doronin, Konstantin; Sidor, Peter D.; Wold, William S. M.

2007-01-01

A short open reading frame named the “U exon,” located on the adenovirus (Ad) l-strand (for leftward transcription) between the early E3 region and the fiber gene, is conserved in mastadenoviruses. We have observed that Ad5 mutants with large deletions in E3 that infringe on the U exon display a mild growth defect, as well as an aberrant Ad E2 DNA-binding protein (DBP) intranuclear localization pattern and an apparent failure to organize replication centers during late infection. Mutants in which the U exon DNA is reconstructed have a reversed phenotype. Chow et al. (L. T. Chow et al., J. Mol. Biol. 134:265-303, 1979) described mRNAs initiating in the region of the U exon and spliced to downstream sequences in the late DBP mRNA leader and the DBP-coding region. We have cloned this mRNA (as cDNA) from Ad5 late mRNA; the predicted protein is 217 amino acids, initiating in the U exon and continuing in frame in the DBP leader and in the DBP-coding region but in a different reading frame from DBP. Polyclonal and monoclonal antibodies generated against the predicted U exon protein (UXP) showed that UXP is ∼24K in size by immunoblot and is a late protein. At 18 to 24 h postinfection, UXP is strongly associated with nucleoli and is found throughout the nucleus; later, UXP is associated with the periphery of replication centers, suggesting a function relevant to Ad DNA replication or RNA transcription. UXP is expressed by all four species C Ads. When expressed in transient transfections, UXP complements the aberrant DBP localization pattern of UXP-negative Ad5 mutants. Our data indicate that UXP is a previously unrecognized protein derived from a novel late l-strand transcription unit. PMID:17881437
Expression of simian virus 40 T antigen in Escherichia coli: localization of T-antigen origin DNA-binding domain to within 129 amino acids.

PubMed Central

Arthur, A K; Höss, A; Fanning, E

1988-01-01

The genomic coding sequence of the large T antigen of simian virus 40 (SV40) was cloned into an Escherichia coli expression vector by joining new restriction sites, BglII and BamHI, introduced at the intron boundaries of the gene. Full-length large T antigen, as well as deletion and amino acid substitution mutants, were inducibly expressed from the lac promoter of pUC9, albeit with different efficiencies and protein stabilities. Specific interaction with SV40 origin DNA was detected for full-length T antigen and certain mutants. Deletion mutants lacking T-antigen residues 1 to 130 and 260 to 708 retained specific origin-binding activity, demonstrating that the region between residues 131 and 259 must carry the essential binding domain for DNA-binding sites I and II. A sequence between residues 302 and 320 homologous to a metal-binding "finger" motif is therefore not required for origin-specific binding. However, substitution of serine for either of two cysteine residues in this motif caused a dramatic decrease in origin DNA-binding activity. This region, as well as other regions of the full-length protein, may thus be involved in stabilizing the DNA-binding domain and altering its preference for binding to site I or site II DNA. PMID:2835505
Introduction to the Natural Anticipator and the Artificial Anticipator

NASA Astrophysics Data System (ADS)

Dubois, Daniel M.

2010-11-01

This short communication deals with the introduction of the concept of anticipator, which is one who anticipates, in the framework of computing anticipatory systems. The definition of anticipation deals with the concept of program. Indeed, the word program comes from "pro-gram", meaning "to write before" by anticipation, and means a plan for the programming of a mechanism, or a sequence of coded instructions that can be inserted into a mechanism, or a sequence of coded instructions, as genes or behavioural responses, that is part of an organism. Any natural or artificial programs are thus related to anticipatory rewriting systems, as shown in this paper. All the cells in the body, and the neurons in the brain, are programmed by the anticipatory genetic code, DNA, in a low-level language with four signs. The programs in computers are also computing anticipatory systems. It will be shown, on the one hand, that the genetic code DNA is a natural anticipator. As demonstrated by Nobel laureate McClintock [8], genomes are programmed. The fundamental program deals with the DNA genetic code. The properties of the DNA consist in self-replication and self-modification. The self-replicating process leads to reproduction of the species, while the self-modifying process leads to new species or evolution and adaptation in existing ones. The genetic code DNA keeps its instructions in memory in the DNA coding molecule. The genetic code DNA is a rewriting system, from DNA coding to DNA template molecule. The DNA template molecule is a rewriting system to the Messenger RNA molecule. The information is not destroyed during the execution of the rewriting program. On the other hand, it will be demonstrated that the Turing machine is an artificial anticipator. The Turing machine is a rewriting system. The head reads and writes, modifying the content of the tape. The information is destroyed during the execution of the program. This is an irreversible process. The input data are lost.
Complete genome of the cotton bacteria blight pathogen Xanthomonas citri pv. malvacearum strain MSCT

USDA-ARS's Scientific Manuscript database

Xanthomonas citri pv. malvacearum (Xcm) is a major pathogen of Gossypium hirsutum. In this study we report the complete genome of the Xcm strain MSCT assembled from long read DNA sequencing technology. The MSCT genome is the first Xcm genome that has complete coding regions for Xcm transcriptional a...
Palindromic Genes in the Linear Mitochondrial Genome of the Nonphotosynthetic Green Alga Polytomella magna

PubMed Central

Smith, David Roy; Hua, Jimeng; Archibald, John M.; Lee, Robert W.

2013-01-01

Organelle DNA is no stranger to palindromic repeats. But never has a mitochondrial or plastid genome been described in which every coding region is part of a distinct palindromic unit. While sequencing the mitochondrial DNA of the nonphotosynthetic green alga Polytomella magna, we uncovered precisely this type of genic arrangement. The P. magna mitochondrial genome is linear and made up entirely of palindromes, each containing 1–7 unique coding regions. Consequently, every gene in the genome is duplicated and in an inverted orientation relative to its partner. And when these palindromic genes are folded into putative stem-loops, their predicted translational start sites are often positioned in the apex of the loop. Gel electrophoresis results support the linear, 28-kb monomeric conformation of the P. magna mitochondrial genome. Analyses of other Polytomella taxa suggest that palindromic mitochondrial genes were present in the ancestor of the Polytomella lineage and lost or retained to various degrees in extant species. The possible origins and consequences of this bizarre genomic architecture are discussed. PMID:23940100
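In the sense used above, a palindromic unit is a stretch whose one arm is the reverse complement of the other, so that it can fold into a stem-loop. The following is a minimal sketch of how the arm length of such an inverted repeat could be measured for a candidate sequence. The function names are hypothetical and mismatches within the arms are not tolerated, which is a simplification.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def inverted_repeat_arm(seq):
    """Length of the longest prefix that pairs perfectly with the suffix.

    A sequence of the form arm + loop + reverse_complement(arm) shares its
    first len(arm) bases with its own reverse complement, so comparing the
    two strings from the left measures the arm. Hypothetical helper.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    arm = 0
    # Stop at the midpoint so the two arms never overlap.
    while arm < len(seq) // 2 and seq[arm] == rc[arm]:
        arm += 1
    return arm
```

For the toy sequence ATGCCC + TT + GGGCAT the sketch reports an arm of six bases, leaving a two-base loop at the apex.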
Coding of DNA samples and data in the pharmaceutical industry: current practices and future directions--perspective of the I-PWG.

PubMed

Franc, M A; Cohen, N; Warner, A W; Shaw, P M; Groenen, P; Snapir, A

2011-04-01

DNA samples collected in clinical trials and stored for future research are valuable to pharmaceutical drug development. Given the perceived higher risk associated with genetic research, industry has implemented complex coding methods for DNA. Following years of experience with these methods and with addressing questions from institutional review boards (IRBs), ethics committees (ECs) and health authorities, the industry has started reexamining the extent of the added value offered by these methods. With the goal of harmonization, the Industry Pharmacogenomics Working Group (I-PWG) conducted a survey to gain an understanding of company practices for DNA coding and to solicit opinions on their effectiveness at protecting privacy. The results of the survey and the limitations of the coding methods are described. The I-PWG recommends dialogue with key stakeholders regarding coding practices such that equal standards are applied to DNA and non-DNA samples. The I-PWG believes that industry standards for privacy protection should provide adequate safeguards for DNA and non-DNA samples/data and suggests a need for more universal standards for samples stored for future research.

HyDEn: A Hybrid Steganocryptographic Approach for Data Encryption Using Randomized Error-Correcting DNA Codes

PubMed Central

Regoui, Chaouki; Durand, Guillaume; Belliveau, Luc; Léger, Serge

2013-01-01

This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and the cyclic permutations applied on the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach. PMID:23984392
Differentiation of Populus species using chloroplast single nucleotide polymorphism (SNP) markers--essential for comprehensible and reliable poplar breeding.

PubMed

Schroeder, H; Hoeltken, A M; Fladung, M

2012-03-01

Within the genus Populus several species belonging to different sections are cross-compatible. Hence, high numbers of interspecies hybrids occur naturally and, additionally, have been artificially produced in huge breeding programmes during the last 100 years. Therefore, determination of a single poplar species, used for the production of 'multi-species hybrids' is often difficult, and represents a great challenge for the use of molecular markers in species identification. Within this study, over 20 chloroplast regions, both intergenic spacers and coding regions, have been tested for their ability to differentiate different poplar species using 23 already published barcoding primer combinations and 17 newly designed primer combinations. About half of the published barcoding primers yielded amplification products, whereas the new primers designed on the basis of the total sequenced cpDNA genome of Populus trichocarpa Torr. & Gray yielded much higher amplification success. Intergenic spacers were found to be more variable than coding regions within the genus Populus. The highest discrimination power of Populus species was found in the combination of two intergenic spacers (trnG-psbK, psbK-psbl) and the coding region rpoC. In barcoding projects, the coding regions matK and rbcL are often recommended, but within the genus Populus they only show moderate variability and are not efficient in species discrimination. © 2011 German Botanical Society and The Royal Botanical Society of the Netherlands.
Decoding DNA labels by melting curve analysis using real-time PCR.

PubMed

Balog, József A; Fehér, Liliána Z; Puskás, László G

2017-12-01

Synthetic DNA has been used as an authentication code for a diverse number of applications. However, existing decoding approaches are based on either DNA sequencing or the determination of DNA length variations. Here, we present a simple alternative protocol for labeling different objects using a small number of short DNA sequences that differ in their melting points. Code amplification and decoding can be done in two steps using quantitative PCR (qPCR). To obtain a DNA barcode with high complexity, we defined 8 template groups, each having 4 different DNA templates, yielding 15^8 (>2.5 billion) combinations of different individual melting temperature (Tm) values and corresponding ID codes. The reproducibility and specificity of the decoding was confirmed by using the most complex template mixture, which had 32 different products in 8 groups with different Tm values. The industrial applicability of our protocol was also demonstrated by labeling a drone with an oil-based paint containing a predefined DNA code, which was then successfully decoded. The method presented here consists of a simple code system based on a small number of synthetic DNA sequences and a cost-effective, rapid decoding protocol using a few qPCR reactions, enabling a wide range of authentication applications.
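The figure of more than 2.5 billion combinations can be reproduced with a short calculation. The sketch below assumes (this is an inference, not stated in the abstract) that each of the 8 groups contributes any non-empty subset of its 4 templates, which gives 2^4 - 1 = 15 patterns per group.

```python
# Illustrative arithmetic only. The "non-empty subset per group" reading is an
# assumption made here to explain the count; it is not spelled out in the abstract.
templates_per_group = 4
groups = 8

# Each template is either present or absent; the all-absent pattern is excluded.
patterns_per_group = 2 ** templates_per_group - 1

total_codes = patterns_per_group ** groups
print(patterns_per_group, total_codes)
```

Under this reading the most complex mixture uses all 4 templates in all 8 groups, that is 32 products, which agrees with the mixture described in the abstract.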
The phylogenetic position of the roughskin skate Dipturus trachyderma (Krefft & Stehmann, 1975) (Rajiformes, Rajidae) inferred from the mitochondrial genome.

PubMed

Vargas-Caro, Carolina; Bustamante, Carlos; Lamilla, Julio; Bennett, Michael B; Ovenden, Jennifer R

2016-07-01

The complete mitochondrial genome of the roughskin skate Dipturus trachyderma is described from 1 455 724 sequences obtained using Illumina NGS technology. The total length of the mitogenome was 16 909 base pairs, comprising 2 rRNAs, 13 protein-coding genes, 22 tRNAs and 2 non-coding regions. Phylogenetic analysis based on mtDNA revealed low genetic divergence among longnose skates, in particular those inhabiting the continental shelf and slope off the coasts of Chile and Argentina.
Converting Panax ginseng DNA and chemical fingerprints into two-dimensional barcode.

PubMed

Cai, Yong; Li, Peng; Li, Xi-Wen; Zhao, Jing; Chen, Hai; Yang, Qing; Hu, Hao

2017-07-01

In this study, we investigated how to convert the Panax ginseng DNA sequence code and chemical fingerprints into a two-dimensional code. In order to improve the compression efficiency, GATC2Bytes and digital merger compression algorithms are proposed. HPLC chemical fingerprint data of 10 groups of P. ginseng from Northeast China and the internal transcribed spacer 2 (ITS2) sequence code as the DNA sequence code were ready for conversion. In order to convert such data into a two-dimensional code, the following six steps were performed: First, the chemical fingerprint characteristic data sets were obtained through the inflection filtering algorithm. Second, precompression processing of such data sets is undertaken. Third, precompression processing was undertaken with the P. ginseng DNA (ITS2) sequence codes. Fourth, the precompressed chemical fingerprint data and the DNA (ITS2) sequence code were combined in accordance with the set data format. Such combined data can be compressed by Zlib, an open source data compression algorithm. Finally, the compressed data generated a two-dimensional code called a quick response code (QR code). Through the abovementioned converting process, it can be found that the number of bytes needed for storing P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can be greatly reduced. After GTCA2Bytes algorithm processing, the ITS2 compression rate reaches 75% and the chemical fingerprint compression rate exceeds 99.65% via filtration and digital merger compression algorithm processing. Therefore, the overall compression ratio even exceeds 99.36%. The capacity of the formed QR code is around 0.5k, which can easily and successfully be read and identified by any smartphone. P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can form a QR code after data processing, and therefore the QR code can be a perfect carrier of the authenticity and quality of P. ginseng information. 
This study provides a theoretical basis for the development of a quality traceability system of traditional Chinese medicine based on a two-dimensional code.
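A 75% reduction for a nucleotide sequence is what results from storing each base in 2 bits, so that four bases fit in one byte. The sketch below shows that packing followed by Zlib compression. It is not the algorithm from the paper, whose details are not given in the abstract; the function names, the bit assignment and the example sequence are all assumptions.

```python
import zlib

# Illustrative sketch only: a generic 2-bit packing, assumed here to explain
# the 75% figure. Names, bit codes and the sample sequence are hypothetical.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack four bases per byte, two bits per base, left-aligned."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | _CODE[base]
        byte <<= 2 * (4 - len(chunk))  # pad a short final chunk with zero bits
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n_bases: int) -> str:
    """Invert pack(); n_bases is needed to drop the padding of the last byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(_BASES[(byte >> shift) & 3])
    return "".join(bases[:n_bases])


sequence = "GATCGATCGGAT"
packed = pack(sequence)
saving = 1 - len(packed) / len(sequence)   # fraction of bytes saved by packing
payload = zlib.compress(packed)            # generic compression step
restored = unpack(zlib.decompress(payload), len(sequence))
print(len(sequence), len(packed), saving, restored == sequence)
```

The sequence length has to be stored alongside the packed bytes, since the last byte may contain padding. For very short inputs the Zlib step can add more overhead than it saves, so its benefit depends on the size of the combined data.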
Atypical epigenetic mark in an atypical location: cytosine methylation at asymmetric (CNN) sites within the body of a non-repetitive tomato gene.

PubMed

González, Rodrigo M; Ricardi, Martiniano M; Iusem, Norberto D

2011-05-20

Eukaryotic DNA methylation is one of the most studied epigenetic processes, as it results in a direct and heritable covalent modification triggered by external stimuli. In contrast to mammals, plant DNA methylation, which is stimulated by external cues exemplified by various abiotic types of stress, is often found not only at CG sites but also at CNG (N denoting A, C or T) and CNN (asymmetric) sites. A genome-wide analysis of DNA methylation in Arabidopsis has shown that CNN methylation is preferentially concentrated in transposon genes and non-coding repetitive elements. We are particularly interested in investigating the epigenetics of plant species with larger and more complex genomes than Arabidopsis, particularly with regards to the associated alterations elicited by abiotic stress. We describe the existence of CNN-methylated epialleles that span Asr1, a non-transposon, protein-coding gene from tomato plants that lacks an orthologous counterpart in Arabidopsis. In addition, to test the hypothesis of a link between epigenetics modifications and the adaptation of crop plants to abiotic stress, we exhaustively explored the cytosine methylation status in leaf Asr1 DNA, a model gene in our system, resulting from water-deficit stress conditions imposed on tomato plants. We found that drought conditions brought about removal of methyl marks at approximately 75 of the 110 asymmetric (CNN) sites analysed, concomitantly with a decrease of the repressive H3K27me3 epigenetic mark and a large induction of expression at the RNA level. When pinpointing those sites, we observed that demethylation occurred mostly in the intronic region. These results demonstrate a novel genomic distribution of CNN methylation, namely in the transcribed region of a protein-coding, non-repetitive gene, and the changes in those epigenetic marks that are caused by water stress. 
These findings may represent a general mechanism for the acquisition of new epialleles in somatic cells, which are pivotal for regulating gene expression in plants.
Chromatin accessibility prediction via a hybrid deep convolutional neural network.

PubMed

Liu, Qiao; Xia, Fei; Yin, Qijin; Jiang, Rui

2018-03-01

A majority of known genetic variants associated with human-inherited diseases lie in non-coding regions that lack adequate interpretation, making it indispensable to systematically discover functional sites at the whole genome level and precisely decipher their implications in a comprehensive manner. Although computational approaches have been complementing high-throughput biological experiments towards the annotation of the human genome, it still remains a big challenge to accurately annotate regulatory elements in the context of a specific cell type via automatic learning of the DNA sequence code from large-scale sequencing data. Indeed, the development of an accurate and interpretable model to learn the DNA sequence signature and further enable the identification of causative genetic variants has become essential in both genomic and genetic studies. We proposed Deopen, a hybrid framework mainly based on a deep convolutional neural network, to automatically learn the regulatory code of DNA sequences and predict chromatin accessibility. In a series of comparison with existing methods, we show the superior performance of our model in not only the classification of accessible regions against background sequences sampled at random, but also the regression of DNase-seq signals. Besides, we further visualize the convolutional kernels and show the match of identified sequence signatures and known motifs. We finally demonstrate the sensitivity of our model in finding causative noncoding variants in the analysis of a breast cancer dataset. We expect to see wide applications of Deopen with either public or in-house chromatin accessibility data in the annotation of the human genome and the identification of non-coding variants associated with diseases. Deopen is freely available at https://github.com/kimmo1019/Deopen. ruijiang@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. 
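The idea of a convolutional kernel that matches a sequence motif, mentioned in the abstract above, can be illustrated without any deep learning library. The sketch below is not Deopen's code or architecture; it only shows one-hot encoding of DNA and a single hand-written kernel slid along the sequence. All names and the toy motif are assumptions.

```python
# Illustrative sketch only: not Deopen's implementation. Function names, the
# "TATA" kernel and the input sequence are hypothetical.
BASES = "ACGT"


def one_hot(seq: str) -> list:
    """Encode a DNA string as a list of 4-element rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def scan(encoded: list, kernel: list) -> list:
    """Slide the kernel along the sequence and sum elementwise products."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        score = sum(
            x * w
            for row, kernel_row in zip(window, kernel)
            for x, w in zip(row, kernel_row)
        )
        scores.append(score)
    return scores


kernel = one_hot("TATA")                 # a kernel that matches the motif exactly
scores = scan(one_hot("GGTATAGG"), kernel)
best_position = scores.index(max(scores))
print(scores, best_position)
```

In a trained network the kernel weights are learned rather than written by hand, and visualizing them amounts to asking which short sequence gives each kernel its highest score.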
APOBEC3B cytidine deaminase targets the non-transcribed strand of tRNA genes in yeast.

PubMed

Saini, Natalie; Roberts, Steven A; Sterling, Joan F; Malc, Ewa P; Mieczkowski, Piotr A; Gordenin, Dmitry A

2017-05-01

Variations in mutation rates across the genome have been demonstrated both in model organisms and in cancers. This phenomenon is largely driven by the damage specificity of diverse mutagens and the differences in DNA repair efficiency in given genomic contexts. Here, we demonstrate that the single-strand DNA-specific cytidine deaminase APOBEC3B (A3B) damages tRNA genes at a 1000-fold higher efficiency than other non-tRNA genomic regions in budding yeast. We found that A3B-induced lesions in tRNA genes were predominantly located on the non-transcribed strand, while no transcriptional strand bias was observed in protein coding genes. Furthermore, tRNA gene mutations were exacerbated in cells where RNaseH expression was completely abolished (Δrnh1Δrnh35). These data suggest a transcription-dependent mechanism for A3B-induced tRNA gene hypermutation. Interestingly, in strains proficient in DNA repair, only 1% of the abasic sites formed upon excision of A3B-deaminated cytosines were not repaired leading to mutations in tRNA genes, while 18% of these lesions failed to be repaired in the remainder of the genome. A3B-induced mutagenesis in tRNA genes was found to be efficiently suppressed by the redundant activities of both base excision repair (BER) and the error-free DNA damage bypass pathway. On the other hand, deficiencies in BER did not have a profound effect on A3B-induced mutations in CAN1, the reporter for protein coding genes. We hypothesize that differences in the mechanisms underlying ssDNA formation at tRNA genes and other genomic loci are the key determinants of the choice of the repair pathways and consequently the efficiency of DNA damage repair in these regions. Overall, our results indicate that tRNA genes are highly susceptible to ssDNA-specific DNA damaging agents. However, increased DNA repair efficacy in tRNA genes can prevent their hypermutation and maintain both genome and proteome homeostasis. Published by Elsevier B.V.
Extensive structural variations between mitochondrial genomes of CMS and normal peppers (Capsicum annuum L.) revealed by complete nucleotide sequencing.

PubMed

Jo, Yeong Deuk; Choi, Yoomi; Kim, Dong-Hwan; Kim, Byung-Dong; Kang, Byoung-Cheorl

2014-07-04

Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp. We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis between mitochondrial genomes of peppers and tobacco that are included in Solanaceae revealed extensive DNA rearrangements and poor conservation in non-coding DNA. In comparison between pepper lines, FS4401 and Jeju mitochondrial DNAs contained the same complement of protein coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously-reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, large number of small sequence segments which were absent or found on different locations in Jeju mitochondrial genome were combined together. The incorporation of repeats and overlapping of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes. 
Although a large portion of the sequence context was shared by the mitochondrial genomes of CMS and male-fertile pepper lines, extensive genome rearrangements were detected. CMS candidate genes were located on the edges of highly rearranged CMS-specific DNA regions and near repeat sequences. These characteristics were also detected among CMS-associated genes in other species, implying that a common mechanism might be involved in the evolution of CMS-associated genes.
Complete sequence and analysis of the mitochondrial genome of Hemiselmis andersenii CCMP644 (Cryptophyceae).

PubMed

Kim, Eunsoo; Lane, Christopher E; Curtis, Bruce A; Kozera, Catherine; Bowman, Sharen; Archibald, John M

2008-05-12

Cryptophytes are an enigmatic group of unicellular eukaryotes with plastids derived by secondary (i.e., eukaryote-eukaryote) endosymbiosis. Cryptophytes are unusual in that they possess four genomes: a host cell-derived nuclear and mitochondrial genome and an endosymbiont-derived plastid and 'nucleomorph' genome. The evolutionary origins of the host and endosymbiont components of cryptophyte algae are at present poorly understood. Thus far, a single complete mitochondrial genome sequence has been determined for the cryptophyte Rhodomonas salina. Here, the second complete mitochondrial genome of the cryptophyte alga Hemiselmis andersenii CCMP644 is presented. The H. andersenii mtDNA is 60,553 bp in size and encodes 30 structural RNAs and 36 protein-coding genes, all located on the same strand. A prominent feature of the genome is the presence of an approximately 20-kbp intergenic region composed of numerous tandem and dispersed repeat units of 22-336 bp. Adjacent to these repeats are 27 copies of palindromic sequences predicted to form stable DNA stem-loop structures. One such stem-loop is located near a GC-rich and GC-poor region and may have a regulatory function in replication or transcription. The H. andersenii mtDNA shares a number of features in common with the genome of the cryptophyte Rhodomonas salina, including general architecture, gene content, and the presence of a large repeat region. However, the H. andersenii mtDNA is devoid of inverted repeats and introns, which are present in R. salina. Comparative analyses of the suite of tRNAs encoded in the two genomes reveal that the H. andersenii mtDNA has lost or converted its original trnK(uuu) gene and possesses a trnS-derived 'trnK(uuu)', which appears unable to produce a functional tRNA. Mitochondrial protein coding gene phylogenies strongly support a variety of previously established eukaryotic groups, but fail to resolve the relationships among higher-order eukaryotic lineages. 
Comparison of the H. andersenii and R. salina mitochondrial genomes reveals a number of cryptophyte-specific genomic features, most notably the presence of a large repeat-rich intergenic region. However, unlike R. salina, the H. andersenii mtDNA does not possess introns and lacks a Lys-tRNA, which is presumably imported from the cytosol.
Complete Sequence and Analysis of the Mitochondrial Genome of Hemiselmis andersenii CCMP644 (Cryptophyceae)

PubMed Central

Kim, Eunsoo; Lane, Christopher E; Curtis, Bruce A; Kozera, Catherine; Bowman, Sharen; Archibald, John M

2008-01-01

Background: Cryptophytes are an enigmatic group of unicellular eukaryotes with plastids derived by secondary (i.e., eukaryote-eukaryote) endosymbiosis. Cryptophytes are unusual in that they possess four genomes: a host cell-derived nuclear and mitochondrial genome and an endosymbiont-derived plastid and 'nucleomorph' genome. The evolutionary origins of the host and endosymbiont components of cryptophyte algae are at present poorly understood. Thus far, a single complete mitochondrial genome sequence has been determined for the cryptophyte Rhodomonas salina. Here, the second complete mitochondrial genome of the cryptophyte alga Hemiselmis andersenii CCMP644 is presented. Results: The H. andersenii mtDNA is 60,553 bp in size and encodes 30 structural RNAs and 36 protein-coding genes, all located on the same strand. A prominent feature of the genome is the presence of a ~20 Kbp long intergenic region comprised of numerous tandem and dispersed repeat units of between 22–336 bp. Adjacent to these repeats are 27 copies of palindromic sequences predicted to form stable DNA stem-loop structures. One such stem-loop is located near a GC-rich and GC-poor region and may have a regulatory function in replication or transcription. The H. andersenii mtDNA shares a number of features in common with the genome of the cryptophyte Rhodomonas salina, including general architecture, gene content, and the presence of a large repeat region. However, the H. andersenii mtDNA is devoid of inverted repeats and introns, which are present in R. salina. Comparative analyses of the suite of tRNAs encoded in the two genomes reveal that the H. andersenii mtDNA has lost or converted its original trnK(uuu) gene and possesses a trnS-derived 'trnK(uuu)', which appears unable to produce a functional tRNA. Mitochondrial protein coding gene phylogenies strongly support a variety of previously established eukaryotic groups, but fail to resolve the relationships among higher-order eukaryotic lineages. 
Conclusion: Comparison of the H. andersenii and R. salina mitochondrial genomes reveals a number of cryptophyte-specific genomic features, most notably the presence of a large repeat-rich intergenic region. However, unlike R. salina, the H. andersenii mtDNA does not possess introns and lacks a Lys-tRNA, which is presumably imported from the cytosol. PMID:18474103
An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: the Adh region.

PubMed Central

Ashburner, M; Misra, S; Roote, J; Lewis, S E; Blazej, R; Davis, T; Doyle, C; Galle, R; George, R; Harris, N; Hartzell, G; Harvey, D; Hong, L; Houston, K; Hoskins, R; Johnson, G; Martin, C; Moshrefi, A; Palazzolo, M; Reese, M G; Spradling, A; Tsang, G; Wan, K; Whitelaw, K; Celniker, S

1999-01-01

A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 chromosome polytene bands on chromosome arm 2L, including the genetically well-characterized "Adh region." A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of from 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species. "Before beginning a Hunt, it is wise to ask someone what you are looking for before you begin looking for it." (Milne 1926) PMID:10471707
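The densities and percentages quoted in this abstract follow from the reported counts. The short check below assumes a region length of 2.9 Mb, the figure given in the record title; the variable names are illustrative.

```python
# Illustrative arithmetic only. The 2.9-Mb length is taken from the record
# title; variable names are hypothetical.
region_kb = 2900            # 2.9 Mb expressed in kb
protein_coding_genes = 218
transposable_elements = 17

kb_per_gene = region_kb / protein_coding_genes
kb_per_element = region_kb / transposable_elements

est_match_pct = 100 * 95 / protein_coding_genes       # genes matching an EST
homolog_pct = 100 * 144 / protein_coding_genes        # genes with similar proteins

print(round(kb_per_gene), round(kb_per_element),
      round(est_match_pct), round(homolog_pct))
```

The percentages only come out as reported when 218 is used as the denominator, which suggests that "known and predicted genes" refers to the 218 protein-coding genes.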
Complete Mitochondrial DNA Analysis of Eastern Eurasian Haplogroups Rarely Found in Populations of Northern Asia and Eastern Europe

PubMed Central

Derenko, Miroslava; Malyarchuk, Boris; Denisova, Galina; Perkova, Maria; Rogalla, Urszula; Grzybowski, Tomasz; Khusnutdinova, Elza; Dambueva, Irina; Zakharov, Ilia

2012-01-01

With the aim of uncovering all of the most basal variation in the northern Asian mitochondrial DNA (mtDNA) haplogroups, we have analyzed mtDNA control region and coding region sequence variation in 98 Altaian Kazakhs from southern Siberia and 149 Barghuts from Inner Mongolia, China. Both populations exhibit the prevalence of eastern Eurasian lineages accounting for 91.9% in Barghuts and 60.2% in Altaian Kazakhs. The strong affinity of Altaian Kazakhs and populations of northern and central Asia has been revealed, reflecting both influences of central Asian inhabitants and essential genetic interaction with the Altai region indigenous populations. Statistical analyses data demonstrate a close positioning of all Mongolic-speaking populations (Mongolians, Buryats, Khamnigans, Kalmyks as well as Barghuts studied here) and Turkic-speaking Sojots, thus suggesting their origin from a common maternal ancestral gene pool. In order to achieve a thorough coverage of DNA lineages revealed in the northern Asian matrilineal gene pool, we have completely sequenced the mtDNA of 55 samples representing haplogroups R11b, B4, B5, F2, M9, M10, M11, M13, N9a and R9c1, which were pinpointed from a massive collection (over 5000 individuals) of northern and eastern Asian, as well as European control region mtDNA sequences. Applying the newly updated mtDNA tree to the previously reported northern Asian and eastern Asian mtDNA data sets has resolved the status of the poorly classified mtDNA types and allowed us to obtain the coalescence age estimates of the nodes of interest using different calibrated rates. Our findings confirm our previous conclusion that northern Asian maternal gene pool consists of predominantly post-LGM components of eastern Asian ancestry, though some genetic lineages may have a pre-LGM/LGM origin. PMID:22363811
DNA octaplex formation with an I-motif of water-mediated A-quartets: reinterpretation of the crystal structure of d(GCGAAAGC).

PubMed

Sato, Yoshiteru; Mitomi, Kenta; Sunami, Tomoko; Kondo, Jiro; Takénaka, Akio

2006-12-01

The crystal structure of the tetragonal form of d(gcGAAAgc) has been revised and reasonably refined including the disordered residues. The two DNA strands form a base-intercalated duplex, and the four duplexes are assembled according to the crystallographic 222 symmetry to form an octaplex. In the central region, the eight strands are associated by I-motif of double A-quartets. Furthermore, eight hydrated-magnesium cations link the four duplexes to support the octaplex formation. Based on these structural features, a proposal that folding of d(GAAA)n, found in the non-coding region of genomes, into an octaplex can induce slippage during replication to facilitate length polymorphism is presented.
The 5S RNA gene minichromosome of Euplotes.

PubMed Central

Roberson, A E; Wolffe, A P; Hauser, L J; Olins, D E

1989-01-01

The macronucleus of the ciliated protozoan Euplotes eurystomus contains about 10^6 copies of a single type of 5S ribosomal RNA gene. This 5S gene DNA is only 930 bp long, is flanked by telomeres, and contains a single coding region of 120 bp which serves as a template for transcription in vivo and in vitro. The 5S gene minichromatin possesses four positioned nucleosomes and hypersensitive cleavage sites in the telomeric regions. PMID:2501759
mtDNA variation predicts population size in humans and reveals a major Southern Asian chapter in human prehistory.

PubMed

Atkinson, Quentin D; Gray, Russell D; Drummond, Alexei J

2008-02-01

The relative timing and size of regional human population growth following our expansion from Africa remain unknown. Human mitochondrial DNA (mtDNA) diversity carries a legacy of our population history. Given a set of sequences, we can use coalescent theory to estimate past population size through time and draw inferences about human population history. However, recent work has challenged the validity of using mtDNA diversity to infer species population sizes. Here we use Bayesian coalescent inference methods, together with a global data set of 357 human mtDNA coding-region sequences, to infer human population sizes through time across 8 major geographic regions. Our estimates of relative population sizes show remarkable concordance with the contemporary regional distribution of humans across Africa, Eurasia, and the Americas, indicating that mtDNA diversity is a good predictor of population size in humans. Plots of population size through time show slow growth in sub-Saharan Africa beginning 143-193 kya, followed by a rapid expansion into Eurasia after the emergence of the first non-African mtDNA lineages 50-70 kya. Outside Africa, the earliest and fastest growth is inferred in Southern Asia approximately 52 kya, followed by a succession of growth phases in Northern and Central Asia (approximately 49 kya), Australia (approximately 48 kya), Europe (approximately 42 kya), the Middle East and North Africa (approximately 40 kya), New Guinea (approximately 39 kya), the Americas (approximately 18 kya), and a second expansion in Europe (approximately 10-15 kya). Comparisons of relative regional population sizes through time suggest that between approximately 45 and 20 kya most of humanity lived in Southern Asia. These findings not only support the use of mtDNA data for estimating human population size but also provide a unique picture of human prehistory and demonstrate the importance of Southern Asia to our recent evolutionary past.
CelF of Orpinomyces PC-2 has an intron and encodes a cellulase (CelF) containing a carbohydrate-binding module.

PubMed

Chen, Huizhong; Li, Xin-Liang; Blum, David L; Ximenes, Eduardo A; Ljungdahl, Lars G

2003-01-01

A cDNA, designated celF, encoding a cellulase (CelF) was isolated from the anaerobic fungus Orpinomyces PC-2. The open reading frame contains regions coding for a signal peptide, a carbohydrate-binding module (CBM), a linker, and a catalytic domain. The catalytic domain was homologous to those of CelA and CelC of the same fungus and to that of the Neocallimastix patriciarum CELA, but CelF lacks a docking domain, characteristic for enzymes of cellulosomes. It was also homologous to the cellobiohydrolase IIs and endoglucanases of aerobic organisms. The gene has a 111-bp intron, located within the CBM-coding region. Some biochemical properties of the purified recombinant enzyme are described.
Epigenetics of Peripheral B-Cell Differentiation and the Antibody Response

PubMed Central

Zan, Hong; Casali, Paolo

2015-01-01

Epigenetic modifications, such as histone post-translational modifications, DNA methylation, and alteration of gene expression by non-coding RNAs, including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), are heritable changes that are independent from the genomic DNA sequence. These regulate gene activities and, therefore, cellular functions. Epigenetic modifications act in concert with transcription factors and play critical roles in B cell development and differentiation, thereby modulating antibody responses to foreign- and self-antigens. Upon antigen encounter by mature B cells in the periphery, alterations of these lymphocytes epigenetic landscape are induced by the same stimuli that drive the antibody response. Such alterations instruct B cells to undergo immunoglobulin (Ig) class switch DNA recombination (CSR) and somatic hypermutation (SHM), as well as differentiation to memory B cells or long-lived plasma cells for the immune memory. Inducible histone modifications, together with DNA methylation and miRNAs modulate the transcriptome, particularly the expression of activation-induced cytidine deaminase, which is essential for CSR and SHM, and factors central to plasma cell differentiation, such as B lymphocyte-induced maturation protein-1. These inducible B cell-intrinsic epigenetic marks guide the maturation of antibody responses. Combinatorial histone modifications also function as histone codes to target CSR and, possibly, SHM machinery to the Ig loci by recruiting specific adaptors that can stabilize CSR/SHM factors. In addition, lncRNAs, such as recently reported lncRNA-CSR and an lncRNA generated through transcription of the S region that form G-quadruplex structures, are also important for CSR targeting. 
Epigenetic dysregulation in B cells, including the aberrant expression of non-coding RNAs and alterations of histone modifications and DNA methylation, can result in aberrant antibody responses to foreign antigens, such as those on microbial pathogens, and generation of pathogenic autoantibodies, IgE in allergic reactions, as well as B cell neoplasia. Epigenetic marks would be attractive targets for new therapeutics for autoimmune and allergic diseases, and B cell malignancies. PMID:26697022
Differential DNA methylation profiles of coding and non-coding genes define hippocampal sclerosis in human temporal lobe epilepsy

PubMed Central

Miller-Delaney, Suzanne F.C.; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C.; Bray, Isabella M.; Reynolds, James P.; Gwinn, Ryder; Stallings, Raymond L.

2015-01-01

Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated C-phosphate-G islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation-sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) when compared to control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among the differentially methylated. Finally, a panel of 13 methylation-sensitive microRNAs was identified in temporal lobe epilepsy, including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and the differential methylation of long non-coding RNA was documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain. PMID:25552301
HIV1 V3 loop hypermutability is enhanced by the guanine usage bias in the part of env gene coding for it.

PubMed

Khrustalev, Vladislav Victorovich

2009-01-01

Guanine is the most mutable nucleotide in HIV genes because of frequently occurring G to A transitions, which are caused by cytosine deamination in viral DNA minus strands catalyzed by APOBEC enzymes. The distribution of guanine among the three codon positions should influence the probability that a G to A mutation is nonsynonymous (that is, occurs in the first or second codon position). We discovered that nucleotide sequences of env genes coding for third variable regions (V3 loops) of gp120 from HIV1 and HIV2 have different kinds of guanine usage biases. In the HIV1 reference strain and 100 additionally analyzed HIV1 strains, the guanine usage bias in V3 loop coding regions (2G>1G>3G) should lead to elevated occurrence rates of nonsynonymous G to A transitions. In the HIV2 reference strain and 100 other HIV2 strains, the guanine usage bias in V3 loop coding regions (3G>2G>1G) should protect V3 loops from hypermutability. According to the HIV1 and HIV2 V3 alignment, insertion of a sequence enriched with 2G (21 codons in length) occurred during the evolution of the HIV1 predecessor, while insertion of a different sequence enriched with 3G (19 codons in length) occurred during the evolution of the HIV2 predecessor. The higher the level of 3G in the V3 coding region, the lower the occurrence rate of immune-escape mutations should be. This hypothesis was tested in this study by comparing the guanine usage in V3 loop coding regions from HIV1 fast and slow progressors. All calculations have been performed by our algorithms "VVK In length", "VVK Dinucleotides" and "VVK Consensus" (www.barkovsky.hotmail.ru).
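
The notation 2G>1G>3G in the abstract above ranks how often guanine occupies the second, first and third codon positions. The following is a minimal Python sketch of that count, assuming an in-frame coding sequence; the function names and the toy sequence are hypothetical illustrations and are not taken from the "VVK" algorithms named in the abstract.

```python
from collections import Counter


def guanine_by_codon_position(cds):
    """Fraction of codons carrying G at codon positions 1, 2 and 3.

    `cds` is assumed to be in frame, starting at the first base of a codon.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    n_codons = usable // 3
    counts = Counter(i % 3 + 1 for i in range(usable) if seq[i] == "G")
    if n_codons == 0:
        return {1: 0.0, 2: 0.0, 3: 0.0}
    return {pos: counts[pos] / n_codons for pos in (1, 2, 3)}


def bias_ranking(fractions):
    """Format a ranking in the abstract's notation, e.g. '2G>1G>3G'."""
    order = sorted(fractions, key=fractions.get, reverse=True)
    return ">".join(f"{pos}G" for pos in order)


# Hypothetical toy sequence, NOT a real V3 loop coding region.
toy_cds = "AGAGGACGAAGTGGCTGA"
fractions = guanine_by_codon_position(toy_cds)
print(fractions, bias_ranking(fractions))
```

A sequence ranked with 3G first would place most of its guanines where a G to A transition is more likely to be synonymous, which is the protective pattern the abstract attributes to HIV2.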

Next generation sequencing analysis reveals a relationship between rDNA unit diversity and locus number in Nicotiana diploids

PubMed Central

2012-01-01

Background: Tandemly arranged nuclear ribosomal DNA (rDNA), encoding 18S, 5.8S and 26S ribosomal RNA (rRNA), exhibit concerted evolution, a pattern thought to result from the homogenisation of rDNA arrays. However, rDNA homogeneity at the single nucleotide polymorphism (SNP) level has not been detailed in organisms with more than a few hundred copies of the rDNA unit. Here we study rDNA complexity in species with arrays consisting of thousands of units. Methods: We examined homogeneity of genic (18S) and non-coding internally transcribed spacer (ITS1) regions of rDNA using Roche 454 and/or Illumina platforms in four angiosperm species, Nicotiana sylvestris, N. tomentosiformis, N. otophora and N. kawakamii. We compared the data with Southern blot hybridisation revealing the structure of intergenic spacer (IGS) sequences and with the number and distribution of rDNA loci. Results and Conclusions: In all four species the intragenomic homogeneity of the 18S gene was high; a single ribotype makes up over 90% of the genes. However, greater variation was observed in the ITS1 region, particularly in species with two or more rDNA loci, where >55% of rDNA units were a single ribotype, with the second most abundant variant accounting for >18% of units. IGS heterogeneity was high in all species. The increased number of ribotypes in ITS1 compared with 18S sequences may reflect rounds of incomplete homogenisation with strong selection for functional genic regions and relaxed selection on ITS1 variants. The relationship between the number of ITS1 ribotypes and the number of rDNA loci leads us to propose that rDNA evolution and complexity is influenced by locus number and/or amplification of orphaned rDNA units at new chromosomal locations. PMID:23259460
Complete genome sequence of a new bipartite begomovirus infecting fluted pumpkin (Telfairia occidentalis) plants in Cameroon.

PubMed

Leke, Walter N; Khatabi, Behnam; Fondong, Vincent N; Brown, Judith K

2016-08-01

The complete genome sequence was determined and characterized for a previously unreported bipartite begomovirus from fluted pumpkin (Telfairia occidentalis, family Cucurbitaceae) plants displaying mosaic symptoms in Cameroon. The DNA-A and DNA-B components were ~2.7 kb and ~2.6 kb in size, and the arrangement of viral coding regions on the genomic components was like those characteristic of other known bipartite begomoviruses originating in the Old World. While the DNA-A component was more closely related to that of chayote yellow mosaic virus (ChaYMV), at 78 %, the DNA-B component was more closely related to that of soybean chlorotic blotch virus (SbCBV), at 64 %. This newly discovered bipartite Old World virus is herein named telfairia mosaic virus (TelMV).
What Information is Stored in DNA: Does it Contain Digital Error Correcting Codes?

NASA Astrophysics Data System (ADS)

Liebovitch, Larry

1998-03-01

The longest term correlations in living systems are the information stored in DNA which reflects the evolutionary history of an organism. The 4 bases (A,T,G,C) encode sequences of amino acids as well as locations of binding sites for proteins that regulate DNA. The fidelity of this important information is maintained by ANALOG error check mechanisms. When a single strand of DNA is replicated the complementary base is inserted in the new strand. Sometimes the wrong base is inserted that sticks out disrupting the phosphate backbone. The new base is not yet methylated, so repair enzymes, that slide along the DNA, can tear out the wrong base and replace it with the right one. The bases in DNA form a sequence of 4 different symbols and so the information is encoded in a DIGITAL form. All the digital codes in our society (ISBN book numbers, UPC product codes, bank account numbers, airline ticket numbers) use error checking codes, where some digits are functions of other digits to maintain the fidelity of transmitted information. Does DNA also utilize a DIGITAL error checking code to maintain the fidelity of its information and increase the accuracy of replication? That is, are some bases in DNA functions of other bases upstream or downstream? This raises the interesting mathematical problem: How does one determine whether some symbols in a sequence of symbols are a function of other symbols? It also bears on the issue of determining algorithmic complexity: What is the function that generates the shortest algorithm for reproducing the symbol sequence? The error checking codes most used in our technology are linear block codes. We developed an efficient method to test for the presence of such codes in DNA. We coded the 4 bases as (0,1,2,3) and used Gaussian elimination, modified for modulus 4, to test if some bases are linear combinations of other bases. We used this method to analyze the base sequence in the genes from the lac operon and cytochrome C. 
We did not find evidence for such error correcting codes in these genes. However, we analyzed only a small amount of DNA and if digital error correcting schemes are present in DNA, they may be more subtle than such simple linear block codes. The basic issue we raise here is how information is stored in DNA and an appreciation that digital symbol sequences, such as DNA, admit of interesting schemes to store and protect the fidelity of their information content. Liebovitch, Tao, Todorov, Levine. 1996. Biophys. J. 71:1539-1544. Supported by NIH grant EY6234.
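
The question of whether some bases are linear combinations of other bases modulo 4 can be illustrated with a small Python sketch. It is not the authors' modified Gaussian elimination: it swaps in a brute-force search over all coefficient vectors, which is only practical for short blocks. The base-to-integer mapping, the block length, the reading frame and the toy sequence are all assumptions made for illustration; the abstract states only that the bases were coded as (0,1,2,3).

```python
from itertools import product

# Assumed mapping; the abstract does not say which base gets which integer.
BASE_TO_INT = {"A": 0, "T": 1, "G": 2, "C": 3}


def blocks(seq, n):
    """Cut a sequence into consecutive, non-overlapping integer blocks of length n."""
    ints = [BASE_TO_INT[b] for b in seq.upper()]
    return [ints[i:i + n] for i in range(0, len(ints) - n + 1, n)]


def find_check_rule(words, k):
    """Brute-force search for coefficients c (mod 4) such that, in EVERY block,
    symbol k equals sum(c[j] * word[j] for j < k) mod 4.

    Returns the first coefficient tuple found, or None if no linear rule fits.
    """
    for coeffs in product(range(4), repeat=k):
        if all(
            sum(c * w for c, w in zip(coeffs, word[:k])) % 4 == word[k]
            for word in words
        ):
            return coeffs
    return None


# Hypothetical toy sequence: blocks of 3 "data" bases followed by 1 "check"
# base built with check = (1*d0 + 2*d1 + 3*d2) mod 4. Not real DNA.
toy = "ATGACCATGTCTTAATATAGAATC"
print(find_check_rule(blocks(toy, 4), 3))

# Changing one check base breaks the rule, so no coefficients fit.
broken = "ATGT" + toy[4:]
print(find_check_rule(blocks(broken, 4), 3))
```

The search cost grows as 4^k, which is one reason an elimination-based method, as described in the abstract, is the more efficient choice for longer blocks.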
Structural Relationships Between Minor and Major Proteins of Hepatitis B Surface Antigen

PubMed Central

Stibbe, Werner; Gerlich, Wolfram H.

1983-01-01

The minor glycoproteins from hepatitis B surface antigen, GP33 and GP36, contain at their carboxy-terminal part the sequence of the major protein P24. They have 55 additional amino acids at the amino-terminal part which are coded by the pre-S region of the viral DNA. PMID:6842680
Normal D-Region Models for Weapon Effects Code

DTIC Science & Technology

1985-09-18

Chromatin-Specific Regulation of Mammalian rDNA Transcription by Clustered TTF-I Binding Sites

PubMed Central

Diermeier, Sarah D.; Németh, Attila; Rehli, Michael; Grummt, Ingrid; Längst, Gernot

2013-01-01

Enhancers and promoters often contain multiple binding sites for the same transcription factor, suggesting that homotypic clustering of binding sites may serve a role in transcription regulation. Here we show that clustering of binding sites for the transcription termination factor TTF-I downstream of the pre-rRNA coding region specifies transcription termination, increases the efficiency of transcription initiation and affects the three-dimensional structure of rRNA genes. On chromatin templates, but not on free rDNA, clustered binding sites promote cooperative binding of TTF-I, loading TTF-I to the downstream terminators before it binds to the rDNA promoter. Interaction of TTF-I with target sites upstream and downstream of the rDNA transcription unit connects these distal DNA elements by forming a chromatin loop between the rDNA promoter and the terminators. The results imply that clustered binding sites increase the binding affinity of transcription factors in chromatin, thus influencing the timing and strength of DNA-dependent processes. PMID:24068958
Identification of the structural mutation responsible for the dibucaine-resistant (atypical) variant form of human serum cholinesterase.

PubMed Central

McGuire, M C; Nogueira, C P; Bartels, C F; Lightstone, H; Hajra, A; Van der Spek, A F; Lockridge, O; La Du, B N

1989-01-01

A point mutation in the gene for human serum cholinesterase was identified that changes Asp-70 to Gly in the atypical form of serum cholinesterase. The mutation in nucleotide 209, which changes codon 70 from GAT to GGT, was found by sequencing a genomic clone and sequencing selected regions of DNA amplified by the polymerase chain reaction. The entire coding sequences for usual and atypical cholinesterases were compared, and no other consistent base differences were found. A polymorphic site near the C terminus of the coded region was detected, but neither allele at this locus segregated consistently with the atypical trait. The nucleotide-209 mutation was detected in all five atypical cholinesterase families examined. There was complete concordance between this mutation and serum cholinesterase phenotypes for all 14 heterozygous and 6 homozygous atypical subjects tested. The mutation causes the loss of a Sau3AI restriction site; the resulting DNA fragment length polymorphism was verified by electrophoresis of 32P-labeled DNA restriction fragments from usual and atypical subjects. Dot-blot hybridization analysis with a 19-mer allele-specific probe to the DNA amplified by the polymerase chain reaction distinguished between the usual and atypical genotypes. We conclude that the Asp-70→Gly mutation (acidic to neutral amino acid substitution) accounts for reduced affinity of atypical cholinesterase for choline esters and that Asp-70 must be an important component of the anionic site. Heterogeneity in atypical alleles may exist, but the Asp-70 point mutation may represent an appreciable portion of the atypical gene pool. PMID:2915989
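
The numbers in the abstract above fit together arithmetically, and a short Python sketch makes the check explicit. It assumes that nucleotide 1 is the first base of codon 1, which is consistent with nucleotide 209 falling in codon 70. The helper names are hypothetical, and the base following codon 70 is assumed to be C only because the Sau3AI recognition sequence (GATC) requires it; the real flanking sequence is not given in the abstract.

```python
SAU3AI_SITE = "GATC"  # recognition sequence of Sau3AI

# Only the two codons named in the abstract.
CODON_TO_AA = {"GAT": "Asp", "GGT": "Gly"}


def codon_coordinates(nt_pos):
    """Map a 1-based coding nucleotide to (codon number, position within codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def mutate_codon(codon, pos_in_codon, new_base):
    """Return the codon with the base at 1-based pos_in_codon replaced."""
    i = pos_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


def has_sau3ai_site(seq):
    """True if the sequence contains a GATC site."""
    return SAU3AI_SITE in seq


codon_no, pos = codon_coordinates(209)      # codon 70, middle base
usual = "GAT"
atypical = mutate_codon(usual, pos, "G")    # A to G at the middle base
next_base = "C"                             # ASSUMED flanking base (see lead-in)

print(codon_no, pos, CODON_TO_AA[usual], "->", CODON_TO_AA[atypical])
print(has_sau3ai_site(usual + next_base), has_sau3ai_site(atypical + next_base))
```

Under these assumptions the A to G change at the middle base of codon 70 both replaces Asp with Gly and destroys the GATC site, matching the restriction fragment length polymorphism described in the abstract.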
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae).

PubMed

Pan, Hong-Chun; Fang, Hong-Yan; Li, Shi-Wei; Liu, Jun-Hong; Wang, Ying; Wang, An-Tai

2014-12-01

The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae) is composed of two linear DNA molecules. The mitochondrial DNA (mtDNA) molecule 1 is 8010 bp long and contains six protein-coding genes, large subunit rRNA, methionine and tryptophan tRNAs, two pseudogenes consisting respectively of a partial copy of COI, and terminal sequences at two ends of the linear mtDNA, while the mtDNA molecule 2 is 7576 bp long and contains seven protein-coding genes, small subunit rRNA, methionine tRNA, a pseudogene consisting of a partial copy of COI and terminal sequences at two ends of the linear mtDNA. COI gene begins with GTG as start codon, whereas other 12 protein-coding genes start with a typical ATG initiation codon. In addition, all protein-coding genes are terminated with TAA as stop codon.
The construction of recombinant industrial yeasts free of bacterial sequences by directed gene replacement into a nonessential region of the genome.

PubMed

Xiao, W; Rank, G H

1989-03-15

The yeast SMR1 gene was used as a dominant resistance-selectable marker for industrial yeast transformation and for targeting integration of an economically important gene at the homologous ILV2 locus. A MEL1 gene, which codes for alpha-galactosidase, was inserted into a dispensable upstream region of SMR1 in vitro; different treatments of the plasmid (pWX813) prior to transformation resulted in 3' end, 5' end and replacement integrations that exhibited distinct integrant structures. One-step replacement within a nonessential region of the host genome generated a stable integration of MEL1 devoid of bacterial plasmid DNA. Using this method, we have constructed several alpha-galactosidase positive industrial Saccharomyces strains. Our study provides a general method for stable gene transfer in most industrial Saccharomyces yeasts, including those used in the baking, brewing (ale and lager), distilling, wine and sake industries, with solely nucleotide sequences of interest. The absence of bacterial DNA in the integrant structure facilitates the commercial application of recombinant DNA technology in the food and beverage industry.
Genome-scale deletion screening of human long non-coding RNAs using a paired-guide RNA CRISPR library

PubMed Central

Zhu, Shiyou; Li, Wei; Liu, Jingze; Chen, Chen-Hao; Liao, Qi; Xu, Ping; Xu, Han; Xiao, Tengfei; Cao, Zhongzheng; Peng, Jingyu; Yuan, Pengfei; Brown, Myles; Liu, Xiaole Shirley; Wei, Wensheng

2017-01-01

CRISPR/Cas9 screens have been widely adopted to analyse coding gene functions, but high throughput screening of non-coding elements using this method is more challenging, because indels caused by a single cut in non-coding regions are unlikely to produce a functional knockout. A high-throughput method to produce deletions of non-coding DNA is needed. Herein, we report a high throughput genomic deletion strategy to screen for functional long non-coding RNAs (lncRNAs) that is based on a lentiviral paired-guide RNA (pgRNA) library. Applying our screening method, we identified 51 lncRNAs that can positively or negatively regulate human cancer cell growth. We individually validated 9 lncRNAs using CRISPR/Cas9-mediated genomic deletion and functional rescue, CRISPR activation or inhibition, and gene expression profiling. Our high-throughput pgRNA genome deletion method should enable rapid identification of functional mammalian non-coding elements. PMID:27798563
Informational structure of genetic sequences and nature of gene splicing

NASA Astrophysics Data System (ADS)

Trifonov, E. N.

1991-10-01

Only about 1/20 of the DNA of higher organisms codes for proteins, by means of the classical triplet code. The rest of the DNA sequence is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence "talks" to various biomolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence-specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus being a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of the Gnomic, the language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and the necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their "ontogenetic" way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications, is to resolve such conflicts. New data are presented strongly indicating that gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
An algebraic hypothesis about the primeval genetic code architecture.

PubMed

Sánchez, Robersy; Grau, Ricardo

2009-09-01

A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairing G≡C and A=U and the non-specific base pairing of the hypothetical ancestral base D used to define the sum and product operations are enough features to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Besides, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of the present coding DNA sequences could also exist in the ancient coding DNA sequences. The phylogenetic analyses achieved with metrics defined in the N-dimensional vector space (B(3))(N) of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.
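
The period-3 property mentioned in the abstract above refers to a peak at frequency 1/3 in the Fourier spectrum of coding sequences. A minimal Python sketch of one common way to measure it follows: each base is turned into a 0/1 indicator sequence and the squared magnitudes of the four Fourier components at frequency 1/3 are summed. This is a generic illustration over the four modern bases, not the extended five-base analysis of the paper; the function name, the normalisation by sequence length and the toy sequences are assumptions.

```python
import cmath


def period3_power(seq):
    """Summed power at frequency 1/3 of the four base-indicator sequences,
    divided by the sequence length (an assumed normalisation)."""
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # Fourier component at frequency 1/3 of the indicator sequence of `base`.
        component = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, b in enumerate(seq)
            if b == base
        )
        total += abs(component) ** 2
    return total / n


# Hypothetical toy sequences, not real genes.
codon_like = "ATG" * 30   # perfectly periodic with period 3
period_two = "AT" * 45    # same length, no period-3 structure

print(period3_power(codon_like), period3_power(period_two))
```

Real coding sequences show a much weaker peak than the perfectly periodic toy input, so in practice the value at 1/3 is compared against the surrounding spectrum rather than against zero.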
Informatic and genomic analysis of melanocyte cDNA libraries as a resource for the study of melanocyte development and function.

PubMed

Baxter, Laura L; Hsu, Benjamin J; Umayam, Lowell; Wolfsberg, Tyra G; Larson, Denise M; Frith, Martin C; Kawai, Jun; Hayashizaki, Yoshihide; Carninci, Piero; Pavan, William J

2007-06-01

As part of the RIKEN mouse encyclopedia project, two cDNA libraries were prepared from melanocyte-derived cell lines, using techniques of full-length clone selection and subtraction/normalization to enrich for rare transcripts. End sequencing showed that these libraries display over 83% complete coding sequence at the 5' end and 96-97% complete coding sequence at the 3' end. Evaluation of the libraries, derived from B16F10Y tumor cells and melan-c cells, revealed that they contain clones for a majority of the genes previously demonstrated to function in melanocyte biology. Analysis of genomic locations for transcripts revealed that the distribution of melanocyte genes is non-random throughout the genome. Three genomic regions identified that showed significant clustering of melanocyte-expressed genes contain one or more genes previously shown to regulate melanocyte development or function. A catalog of genes expressed in these libraries is presented, providing a valuable resource of cDNA clones and sequence information that can be used for identification of new genes important for melanocyte development, function, and disease.
Long interspersed repeated DNA (LINE) causes polymorphism at the rat insulin 1 locus.

PubMed Central

Lakshmikumaran, M S; D'Ambrosio, E; Laimins, L A; Lin, D T; Furano, A V

1985-01-01

The insulin 1, but not the insulin 2, locus is polymorphic (i.e., exhibits allelic variation) in rats. Restriction enzyme analysis and hybridization studies showed that the polymorphic region is 2.2 kilobases upstream of the insulin 1 coding region and is due to the presence or absence of an approximately 2.7-kilobase repeated DNA element. DNA sequence determination showed that this DNA element is a member of a long interspersed repeated DNA family (LINE) that is highly repeated (greater than 50,000 copies) and highly transcribed in the rat. Although the presence or absence of LINE sequences at the insulin 1 locus occurs in both the homozygous and heterozygous states, LINE-containing insulin 1 alleles are more prevalent in the rat population than are alleles without LINEs. Restriction enzyme analysis of the LINE-containing alleles indicated that at least two versions of the LINE sequence may be present at the insulin 1 locus in different rats. Either repeated transposition of LINE sequences or gene conversion between the resident insulin 1 LINE and other sequences in the genome are possible explanations for this. PMID:3016521
Haplogroup relationships between domestic and wild sheep resolved using a mitogenome panel.

PubMed

Meadows, J R S; Hiendleder, S; Kijas, J W

2011-04-01

Five haplogroups have been identified in domestic sheep through global surveys of mitochondrial (mt) sequence variation; however, these group classifications are often based on small fragments of the complete mtDNA sequence: the partial control region or the cytochrome B gene. This study presents the complete mitogenome from representatives of each haplogroup identified in domestic sheep, plus a sample of their wild relatives. Comparison of the sequence successfully resolved the relationships between each haplogroup and provided insight into the relationship with wild sheep. The five haplogroups were characterised as branching independently, a radiation that shared a common ancestor 920,000 ± 190,000 years ago based on protein coding sequence. The utility of various mtDNA components to inform the true relationship between sheep was also examined with Bayesian, maximum likelihood and partitioned Bremer support analyses. The control region was found to be the mtDNA component that contributed the highest amount of support to the tree generated using the complete data set. This study provides the nucleus of a mtDNA mitogenome panel, which can be used to assess additional mitogenomes and serve as a reference set to evaluate small fragments of the mtDNA.
Haplogroup relationships between domestic and wild sheep resolved using a mitogenome panel

PubMed Central

Meadows, J R S; Hiendleder, S; Kijas, J W

2011-01-01

Five haplogroups have been identified in domestic sheep through global surveys of mitochondrial (mt) sequence variation; however, these group classifications are often based on small fragments of the complete mtDNA sequence: the partial control region or the cytochrome B gene. This study presents the complete mitogenome from representatives of each haplogroup identified in domestic sheep, plus a sample of their wild relatives. Comparison of the sequence successfully resolved the relationships between each haplogroup and provided insight into the relationship with wild sheep. The five haplogroups were characterised as branching independently, a radiation that shared a common ancestor 920 000±190 000 years ago based on protein coding sequence. The utility of various mtDNA components to inform the true relationship between sheep was also examined with Bayesian, maximum likelihood and partitioned Bremer support analyses. The control region was found to be the mtDNA component that contributed the highest amount of support to the tree generated using the complete data set. This study provides the nucleus of a mtDNA mitogenome panel, which can be used to assess additional mitogenomes and serve as a reference set to evaluate small fragments of the mtDNA. PMID:20940734
Regulated Formation of lncRNA-DNA Hybrids Enables Faster Transcriptional Induction and Environmental Adaptation.

PubMed

Cloutier, Sara C; Wang, Siwen; Ma, Wai Kit; Al Husini, Nadra; Dhoondia, Zuzer; Ansari, Athar; Pascuzzi, Pete E; Tran, Elizabeth J

2016-02-04

Long non-coding (lnc)RNAs, once thought to merely represent noise from imprecise transcription initiation, have now emerged as major regulatory entities in all eukaryotes. In contrast to the rapidly expanding identification of individual lncRNAs, mechanistic characterization has lagged behind. Here we provide evidence that the GAL lncRNAs in the budding yeast S. cerevisiae promote transcriptional induction in trans by formation of lncRNA-DNA hybrids or R-loops. The evolutionarily conserved RNA helicase Dbp2 regulates formation of these R-loops as genomic deletion or nuclear depletion results in accumulation of these structures across the GAL cluster gene promoters and coding regions. Enhanced transcriptional induction is manifested by lncRNA-dependent displacement of the Cyc8 co-repressor and subsequent gene looping, suggesting that these lncRNAs promote induction by altering chromatin architecture. Moreover, the GAL lncRNAs confer a competitive fitness advantage to yeast cells because expression of these non-coding molecules correlates with faster adaptation in response to an environmental switch.
Physical map location of the multicopy genes coding for ammonia monooxygenase and hydroxylamine oxidoreductase in the ammonia-oxidizing bacterium Nitrosomonas sp. strain ENI-11.

PubMed

Hirota, R; Yamagata, A; Kato, J; Kuroda, A; Ikeda, T; Takiguchi, N; Ohtake, H

2000-02-01

Pulsed-field gel electrophoresis of PmeI digests of the Nitrosomonas sp. strain ENI-11 chromosome produced four bands ranging from 1,200 to 480 kb in size. Southern hybridizations suggested that a 487-kb PmeI fragment contained two copies of the amoCAB genes, coding for ammonia monooxygenase (designated amoCAB(1) and amoCAB(2)), and three copies of the hao gene, coding for hydroxylamine oxidoreductase (hao(1), hao(2), and hao(3)). In this DNA fragment, amoCAB(1) and amoCAB(2) were about 390 kb apart, while hao(1), hao(2), and hao(3) were separated by at least about 100 kb from each other. Interestingly, hao(1) and hao(2) were located relatively close to amoCAB(1) and amoCAB(2), respectively. DNA sequence analysis revealed that hao(1) and hao(2) shared 160 identical nucleotides immediately upstream of each translation initiation codon. However, hao(3) showed only 30% nucleotide identity in the 160-bp corresponding region.
Physical Map Location of the Multicopy Genes Coding for Ammonia Monooxygenase and Hydroxylamine Oxidoreductase in the Ammonia-Oxidizing Bacterium Nitrosomonas sp. Strain ENI-11

PubMed Central

Hirota, Ryuichi; Yamagata, Akira; Kato, Junichi; Kuroda, Akio; Ikeda, Tsukasa; Takiguchi, Noboru; Ohtake, Hisao

2000-01-01

Pulsed-field gel electrophoresis of PmeI digests of the Nitrosomonas sp. strain ENI-11 chromosome produced four bands ranging from 1,200 to 480 kb in size. Southern hybridizations suggested that a 487-kb PmeI fragment contained two copies of the amoCAB genes, coding for ammonia monooxygenase (designated amoCAB1 and amoCAB2), and three copies of the hao gene, coding for hydroxylamine oxidoreductase (hao1, hao2, and hao3). In this DNA fragment, amoCAB1 and amoCAB2 were about 390 kb apart, while hao1, hao2, and hao3 were separated by at least about 100 kb from each other. Interestingly, hao1 and hao2 were located relatively close to amoCAB1 and amoCAB2, respectively. DNA sequence analysis revealed that hao1 and hao2 shared 160 identical nucleotides immediately upstream of each translation initiation codon. However, hao3 showed only 30% nucleotide identity in the 160-bp corresponding region. PMID:10633121
Mitochondrial DNA repairs double-strand breaks in yeast chromosomes.

PubMed

Ricchetti, M; Fairhead, C; Dujon, B

1999-11-04

The endosymbiotic theory for the origin of eukaryotic cells proposes that genetic information can be transferred from mitochondria to the nucleus of a cell, and genes that are probably of mitochondrial origin have been found in nuclear chromosomes. Occasionally, short or rearranged sequences homologous to mitochondrial DNA are seen in the chromosomes of different organisms including yeast, plants and humans. Here we report a mechanism by which fragments of mitochondrial DNA, in single or tandem array, are transferred to yeast chromosomes under natural conditions during the repair of double-strand breaks in haploid mitotic cells. These repair insertions originate from noncontiguous regions of the mitochondrial genome. Our analysis of the Saccharomyces cerevisiae mitochondrial genome indicates that the yeast nuclear genome does indeed contain several short sequences of mitochondrial origin which are similar in size and composition to those that repair double-strand breaks. These sequences are located predominantly in non-coding regions of the chromosomes, frequently in the vicinity of retrotransposon long terminal repeats, and appear as recent integration events. Thus, colonization of the yeast genome by mitochondrial DNA is an ongoing process.

Human brain factor 1, a new member of the fork head gene family

DOE Office of Scientific and Technical Information (OSTI.GOV)

Murphy, D.B.; Wiese, S.; Burfeind, P.

1994-06-01

Analysis of cDNA clones that cross-hybridized with the fork head domain of the rat HNF-3 gene family revealed 10 cDNAs from human fetal brain and human testis cDNA libraries containing this highly conserved DNA-binding domain. Three of these cDNAs (HFK1, HFK2, and HFK3) were further analyzed. The cDNA HFK1 has a length of 2557 nucleotides and shows strong homology at the nucleotide level (91.2%) to brain factor 1 (BF-1) from rat. The HFK1 cDNA codes for a putative 476 amino acid protein. The homology to BF-1 from rat in the coding region at the amino acid level is 87.5%. The fork head homologous region includes 111 amino acids starting at amino acid 160 and has a 97.5% homology to BF-1. Southern hybridization revealed that HFK1 is highly conserved among mammalian species and possibly birds. Northern analysis with total RNA from human tissues and poly(A)-rich RNA from mouse revealed a 3.2-kb transcript that is present in human and mouse fetal brain and in adult mouse brain. In situ hybridization with sections of mouse embryo and human fetal brain reveals that HFK1 expression is restricted to the neuronal cells in the telencephalon, with strong expression being observed in the developing dentate gyrus and hippocampus. HFK1 was chromosomally localized by in situ hybridization to 14q12. The cDNA clones HFK2 and HFK3 were analyzed by restriction analysis and sequencing. HFK2 and HFK3 were found to be closely related but different from HFK1. Therefore, it would appear that HFK1, HFK2, HFK3, and BF-1 form a new fork head related subfamily. 33 refs., 6 figs.
Evidence that the Ceratobasidium-like white-thread blight and black rot fungal pathogens from persimmon and tea crops in the Brazilian Atlantic Forest agroecosystem are two distinct phylospecies.

PubMed

Ceresini, Paulo C; Costa-Souza, Elaine; Zala, Marcello; Furtado, Edson L; Souza, Nilton L

2012-04-01

The white-thread blight and black rot (WTBR) caused by basidiomycetous fungi of the genus Ceratobasidium is emerging as an important plant disease in Brazil, particularly for crop species in the Ericales such as persimmon (Diospyros kaki) and tea (Camellia sinensis). However, the species identity of the fungal pathogen associated with either of these hosts is still unclear. In this work, we used sequence variation in the internal transcribed spacer regions, including the 5.8S coding region of rDNA (ITS-5.8S rDNA), to determine the phylogenetic placement of the local white-thread-blight-associated populations of Ceratobasidium sp. from persimmon and tea, in relation to Ceratobasidium species already described world-wide. The two sister populations of Ceratobasidium sp. from persimmon and tea in the Brazilian Atlantic Forest agroecosystem most likely represent distinct species within Ceratobasidium and are also distinct from C. noxium, the etiological agent of the first description of white-thread blight disease that was reported on coffee in India. The intraspecific variation for the two Ceratobasidium sp. populations was also analyzed using three mitochondrial genes (ATP6, nad1 and nad2). As reported for other fungi, variation in nuclear and mitochondrial DNA was incongruent. Despite distinct variability in the ITS-rDNA region these two populations shared similar mitochondrial DNA haplotypes.
Molecular analysis of two genes between let-653 and let-56 in the unc-22(IV) region of Caenorhabditis elegans.

PubMed

Marra, M A; Prasad, S S; Baillie, D L

1993-01-01

A previous study of genomic organization described the identification of nine potential coding regions in 150 kb of genomic DNA from the unc-22(IV) region of Caenorhabditis elegans. In this study, we focus on the genomic organization of a small interval of 0.1 map unit bordered on the right by unc-22 and on the left by the left-hand breakpoints of the deficiencies sDf9, sDf19 and sDf65. This small interval at present contains a single mutagenically defined locus, the essential gene let-56. The cosmid C11F2 has previously been used to rescue let-56. Therefore, at least some of C11F2 must reside in the interval. In this paper, we report the characterization of two coding elements that reside on C11F2. Analysis of nucleotide sequence data obtained from cDNAs and cosmid subclones revealed that one of the coding elements closely resembles aromatic amino acid decarboxylases from several species. The other of these coding elements was found to closely resemble a human growth factor activatable Na+/H+ antiporter. Pairs of oligonucleotide primers, predicted from both coding elements, have been used in PCR experiments to position these coding elements between the left breakpoint of sDf19 and the left breakpoint of sDf65, between the essential genes let-653 and let-56.
Evidence of translation efficiency adaptation of the coding regions of the bacteriophage lambda.

PubMed

Goz, Eli; Mioduser, Oriah; Diament, Alon; Tuller, Tamir

2017-08-01

Deciphering the way gene expression regulatory aspects are encoded in viral genomes is a challenging mission with ramifications related to all biomedical disciplines. Here, we aimed to understand how evolution shapes the bacteriophage lambda genes by performing a high resolution analysis of ribosomal profiling data and gene expression related synonymous/silent information encoded in bacteriophage coding regions. We demonstrated evidence of selection for distinct compositions of synonymous codons in early and late viral genes related to the adaptation of translation efficiency to different bacteriophage developmental stages. Specifically, we showed that evolution of viral coding regions is driven, among others, by selection for codons with higher decoding rates; during the initial/progressive stages of infection the decoding rates in early/late genes were found to be superior to those in late/early genes, respectively. Moreover, we argued that selection for translation efficiency could be partially explained by adaptation to Escherichia coli tRNA pool and the fact that it can change during the bacteriophage life cycle. An analysis of additional aspects related to the expression of viral genes, such as mRNA folding and more complex/longer regulatory signals in the coding regions, is also reported. The reported conclusions are likely to be relevant also to additional viruses. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Expression of the Caulobacter heat shock gene dnaK is developmentally controlled during growth at normal temperatures.

PubMed Central

Gomes, S L; Gober, J W; Shapiro, L

1990-01-01

Caulobacter crescentus has a single dnaK gene that is highly homologous to the hsp70 family of heat shock genes. Analysis of the cloned and sequenced dnaK gene has shown that the deduced amino acid sequence could encode a protein of 67.6 kilodaltons that is 68% identical to the DnaK protein of Escherichia coli and 49% identical to the Drosophila and human hsp70 protein family. A partial open reading frame 165 base pairs 3' to the end of dnaK encodes a peptide of 190 amino acids that is 59% identical to DnaJ of E. coli. Northern blot analysis revealed a single 4.0-kilobase mRNA homologous to the cloned fragment. Since the dnaK coding region is 1.89 kilobases, dnaK and dnaJ may be transcribed as a polycistronic message. S1 mapping and primer extension experiments showed that transcription initiated at two sites 5' to the dnaK coding sequence. A single start site of transcription was identified during heat shock at 42 degrees C, and the predicted promoter sequence conformed to the consensus heat shock promoters of E. coli. At normal growth temperature (30 degrees C), a different start site was identified 3' to the heat shock start site that conformed to the E. coli sigma 70 promoter consensus sequence. S1 protection assays and analysis of expression of the dnaK gene fused to the lux transcription reporter gene showed that expression of dnaK is temporally controlled under normal physiological conditions and that transcription occurs just before the initiation of DNA replication. Thus, in both human cells (I. K. L. Milarski and R. I. Morimoto, Proc. Natl. Acad. Sci. USA 83:9517-9521, 1986) and in a simple bacterium, the transcription of a hsp70 gene is temporally controlled as a function of the cell cycle under normal growth conditions. PMID:2345134
Analysis of sequence variability in the macronuclear DNA of Paramecium tetraurelia: A somatic view of the germline

PubMed Central

Duret, Laurent; Cohen, Jean; Jubin, Claire; Dessen, Philippe; Goût, Jean-François; Mousset, Sylvain; Aury, Jean-Marc; Jaillon, Olivier; Noël, Benjamin; Arnaiz, Olivier; Bétermier, Mireille; Wincker, Patrick; Meyer, Eric; Sperling, Linda

2008-01-01

Ciliates are the only unicellular eukaryotes known to separate germinal and somatic functions. Diploid but silent micronuclei transmit the genetic information to the next sexual generation. Polyploid macronuclei express the genetic information from a streamlined version of the genome but are replaced at each sexual generation. The macronuclear genome of Paramecium tetraurelia was recently sequenced by a shotgun approach, providing access to the gene repertoire. The 72-Mb assembly represents a consensus sequence for the somatic DNA, which is produced after sexual events by reproducible rearrangements of the zygotic genome involving elimination of repeated sequences, precise excision of unique-copy internal eliminated sequences (IES), and amplification of the cellular genes to high copy number. We report use of the shotgun sequencing data (>10^6 reads representing 13× coverage of a completely homozygous clone) to evaluate variability in the somatic DNA produced by these developmental genome rearrangements. Although DNA amplification appears uniform, both of the DNA elimination processes produce sequence heterogeneity. The variability that arises from IES excision allowed identification of hundreds of putative new IESs, compared to 42 that were previously known, and revealed cases of erroneous excision of segments of coding sequences. We demonstrate that IESs in coding regions are under selective pressure to introduce premature termination of translation in case of excision failure. PMID:18256234
Variation of DNA Methylome of Zebrafish Cells under Cold Pressure

PubMed Central

Xu, Qiongqiong; Luo, Juntao; Shi, Yingdi; Li, Xiaoxia; Yan, Xiaonan; Zhang, Junfang

2016-01-01

DNA methylation is an essential epigenetic mechanism involved in multiple biological processes. However, the relationship between DNA methylation and cold acclimation remains poorly understood. In this study, Methylated DNA Immunoprecipitation Sequencing (MeDIP-seq) was performed to reveal a genome-wide methylation profile of zebrafish (Danio rerio) embryonic fibroblast cells (ZF4) and its variation under cold pressure. The MeDIP-seq assay was conducted with ZF4 cells cultured at an appropriate temperature of 28°C and at a low temperature of 18°C for 5 (short-term) and 30 (long-term) days, respectively. Our data showed that the whole-genome DNA methylation level increased after a short-term cold exposure and decreased after a long-term cold exposure. Interestingly, the folate metabolism pathway was significantly hypomethylated after short-term cold exposure, which is consistent with the increased DNA methylation level. 21% of methylation peaks were significantly altered after cold treatment. About 8% of altered DNA methylation peaks are located in promoter regions, while the majority of them are located in non-coding regions. Methylation of genes involved in multiple cold-responsive biological processes, such as the anti-oxidant system, apoptosis, development, chromatin modification and the immune system, was significantly affected, suggesting that those processes are responsive to cold stress through regulation of DNA methylation. Our data indicate the involvement of DNA methylation in the cellular response to cold pressure, and provide new insight into genome-wide epigenetic regulation under cold pressure. PMID:27494266
Representation of DNA sequences with virtual potentials and their processing by (SEQREP) Kohonen self-organizing maps.

PubMed

Aires-de-Sousa, João; Aires-de-Sousa, Luisa

2003-01-01

We propose representing individual positions in DNA sequences by virtual potentials generated by other bases of the same sequence. This is a compact representation of the neighbourhood of a base. The distribution of the virtual potentials over the whole sequence can be used as a representation of the entire sequence (SEQREP code). It is a flexible code, with a length independent of the sequence size, does not require previous alignment, and is convenient for processing by neural networks or statistical techniques. To evaluate its biological significance, the SEQREP code was used for training Kohonen self-organizing maps (SOMs) in two applications: (a) detection of Alu sequences, and (b) classification of sequences encoding for HIV-1 envelope glycoprotein (env) into subtypes A-G. It was demonstrated that SOMs clustered sequences belonging to different classes into distinct regions. For independent test sets, very high rates of correct predictions were obtained (97% in the first application, 91% in the second). Possible areas of application of SEQREP codes include functional genomics, phylogenetic analysis, detection of repetitions, database retrieval, and automatic alignment. Software for representing sequences by SEQREP code, and for training Kohonen SOMs is made freely available from http://www.dq.fct.unl.pt/qoa/jas/seqrep. Supplementary material is available at http://www.dq.fct.unl.pt/qoa/jas/seqrep/bioinf2002
A linear mitochondrial genome of Cyclospora cayetanensis (Eimeriidae, Eucoccidiorida, Coccidiasina, Apicomplexa) suggests the ancestral start position within mitochondrial genomes of eimeriid coccidia.

PubMed

Ogedengbe, Mosun E; Qvarnstrom, Yvonne; da Silva, Alexandre J; Arrowood, Michael J; Barta, John R

2015-05-01

The near complete mitochondrial genome for Cyclospora cayetanensis is 6184 bp in length with three protein-coding genes (Cox1, Cox3, CytB) and numerous lsrDNA and ssrDNA fragments. Gene arrangements were conserved with other coccidia in the Eimeriidae, but the C. cayetanensis mitochondrial genome is not circular-mapping. Terminal transferase tailing and nested PCR completed the 5'-terminus of the genome starting with a 21 bp A/T-only region that forms a potential stem-loop. Regions homologous to the C. cayetanensis mitochondrial genome 5'-terminus are found in all eimeriid mitochondrial genomes available and suggest this may be the ancestral start of eimeriid mitochondrial genomes. Copyright © 2015 Australian Society for Parasitology Inc. All rights reserved.
Sequence differences in the diagnostic region of the cysteine protease 8 gene of Tritrichomonas foetus parasites of cats and cattle.

PubMed

Sun, Zichen; Stack, Colin; Šlapeta, Jan

2012-05-25

In order to investigate the genetic variation between Tritrichomonas foetus from bovine and feline origins, cysteine protease 8 (CP8) coding sequence was selected as the polymorphic DNA marker. Direct sequencing of CP8 coding sequence of T. foetus from four feline isolates and two bovine isolates with polymerase chain reaction successfully revealed conserved nucleotide polymorphisms between feline and bovine isolates. These results provide useful information for CP8-based molecular differentiation of T. foetus genotypes. Copyright © 2011 Elsevier B.V. All rights reserved.
New t-gap insertion-deletion-like metrics for DNA hybridization thermodynamic modeling.

PubMed

D'yachkov, Arkadii G; Macula, Anthony J; Pogozelski, Wendy K; Renz, Thomas E; Rykov, Vyacheslav V; Torney, David C

2006-05-01

We discuss the concept of t-gap block isomorphic subsequences and use it to describe new abstract string metrics that are similar to the Levenshtein insertion-deletion metric. Some of the metrics that we define can be used to model a thermodynamic distance function on single-stranded DNA sequences. Our model captures a key aspect of the nearest neighbor thermodynamic model for hybridized DNA duplexes. One version of our metric gives the maximum number of stacked pairs of hydrogen bonded nucleotide base pairs that can be present in any secondary structure in a hybridized DNA duplex without pseudoknots. Thermodynamic distance functions are important components in the construction of DNA codes, and DNA codes are important components in biomolecular computing, nanotechnology, and other biotechnical applications that employ DNA hybridization assays. We show how our new distances can be calculated by using a dynamic programming method, and we derive a Varshamov-Gilbert-like lower bound on the size of some of codes using these distance functions as constraints. We also discuss software implementation of our DNA code design methods.
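As a point of reference for the abstract above, the classical insertion-deletion distance that the t-gap metrics are said to resemble can be computed by dynamic programming over the longest common subsequence (LCS). The sketch below is only that classical baseline, not the authors' t-gap block metric or their thermodynamic model; the function name `indel_distance` is a hypothetical choice for illustration.

```python
def indel_distance(a: str, b: str) -> int:
    """Classical insertion-deletion distance between two strings.

    Only insertions and deletions are allowed (no substitutions), so the
    distance equals len(a) + len(b) - 2 * LCS(a, b).
    """
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return len(a) + len(b) - 2 * prev[-1]


if __name__ == "__main__":
    # Deleting the single C turns ACGT into AGT.
    print(indel_distance("ACGT", "AGT"))
```

The table is filled row by row, so memory use is linear in the length of the second string while time is proportional to the product of the two lengths.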
Mitochondrial DNA heteroplasmy in the emerging field of massively parallel sequencing

PubMed Central

Just, Rebecca S.; Irwin, Jodi A.; Parson, Walther

2015-01-01

Long an important and useful tool in forensic genetic investigations, mitochondrial DNA (mtDNA) typing continues to mature. Research in the last few years has demonstrated both that data from the entire molecule will have practical benefits in forensic DNA casework, and that massively parallel sequencing (MPS) methods will make full mitochondrial genome (mtGenome) sequencing of forensic specimens feasible and cost-effective. A spate of recent studies has employed these new technologies to assess intraindividual mtDNA variation. However, in several instances, contamination and other sources of mixed mtDNA data have been erroneously identified as heteroplasmy. Well vetted mtGenome datasets based on both Sanger and MPS sequences have found authentic point heteroplasmy in approximately 25% of individuals when minor component detection thresholds are in the range of 10–20%, along with positional distribution patterns in the coding region that differ from patterns of point heteroplasmy in the well-studied control region. A few recent studies that examined very low-level heteroplasmy are concordant with these observations when the data are examined at a common level of resolution. In this review we provide an overview of considerations related to the use of MPS technologies to detect mtDNA heteroplasmy. In addition, we examine published reports on point heteroplasmy to characterize features of the data that will assist in the evaluation of future mtGenome data developed by any typing method. PMID:26009256
CAPRRESI: Chimera Assembly by Plasmid Recovery and Restriction Enzyme Site Insertion.

PubMed

Santillán, Orlando; Ramírez-Romero, Miguel A; Dávila, Guillermo

2017-06-25

Here, we present chimera assembly by plasmid recovery and restriction enzyme site insertion (CAPRRESI). CAPRRESI benefits from many strengths of the original plasmid recovery method and introduces restriction enzyme digestion to ease DNA ligation reactions (required for chimera assembly). For this protocol, users clone wildtype genes into the same plasmid (pUC18 or pUC19). After the in silico selection of amino acid sequence regions where chimeras should be assembled, users obtain all the synonym DNA sequences that encode them. Ad hoc Perl scripts enable users to determine all synonym DNA sequences. After this step, another Perl script searches for restriction enzyme sites on all synonym DNA sequences. This in silico analysis is also performed using the ampicillin resistance gene (ampR) found on pUC18/19 plasmids. Users design oligonucleotides inside synonym regions to disrupt wildtype and ampR genes by PCR. After obtaining and purifying complementary DNA fragments, restriction enzyme digestion is accomplished. Chimera assembly is achieved by ligating appropriate complementary DNA fragments. pUC18/19 vectors are selected for CAPRRESI because they offer technical advantages, such as small size (2,686 base pairs), high copy number, advantageous sequencing reaction features, and commercial availability. The usage of restriction enzymes for chimera assembly eliminates the need for DNA polymerases yielding blunt-ended products. CAPRRESI is a fast and low-cost method for fusing protein-coding genes.
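The in silico step described above (listing every synonymous DNA sequence for a chosen amino acid stretch, then scanning those sequences for restriction enzyme sites) can be sketched as follows. The authors report using ad hoc Perl scripts; this Python version is an independent illustration, not their code. It assumes the standard genetic code, and the function names and the three-enzyme site list (EcoRI, BamHI, HindIII) are choices made here for the example only.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Reverse map: amino acid -> list of codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

# Illustrative recognition sites only; a real run would use a fuller list.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def synonymous_sequences(peptide):
    """Yield every DNA sequence that encodes the given peptide."""
    for codons in product(*(SYNONYMS[aa] for aa in peptide)):
        yield "".join(codons)


def sequences_with_sites(peptide, sites=SITES):
    """Return (sequence, enzyme, first position) for each site found."""
    hits = []
    for seq in synonymous_sequences(peptide):
        for name, site in sites.items():
            pos = seq.find(site)
            if pos != -1:
                hits.append((seq, name, pos))
    return hits


if __name__ == "__main__":
    # Glu-Phe can be encoded as GAA TTC, which is an EcoRI site.
    print(sequences_with_sites("EF"))
```

The number of synonymous sequences grows multiplicatively with peptide length, so exhaustive enumeration of this kind is only practical for short regions.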
Structure and characterization of a cDNA clone for phenylalanine ammonia-lyase from cut-injured roots of sweet potato

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tanaka, Yoshiyuki; Matsuoka, Makoto; Yamanoto, Naoki

A cDNA clone for phenylalanine ammonia-lyase (PAL) induced in wounded sweet potato (Ipomoea batatas Lam.) root was obtained by immunoscreening a cDNA library. The protein produced in Escherichia coli cells containing the plasmid pPAL02 was indistinguishable from sweet potato PAL as judged by Ouchterlony double diffusion assays. The M_r of its subunit was 77,000. The cells converted (14C)-L-phenylalanine into (14C)-t-cinnamic acid and PAL activity was detected in the homogenate of the cells. The activity was dependent on the presence of the pPAL02 plasmid DNA. The nucleotide sequence of the cDNA contained a 2,121-base pair (bp) open-reading frame capable of coding for a polypeptide with 707 amino acids (M_r 77,137), a 22-bp 5'-noncoding region and a 207-bp 3'-noncoding region. The results suggest that the insert DNA fully encoded the amino acid sequence for sweet potato PAL that is induced by wounding. Comparison of the deduced amino acid sequence with that of a PAL cDNA fragment from Phaseolus vulgaris revealed 78.9% homology. The sequence from amino acid residues 258 to 494 was highly conserved, showing 90.7% homology.
Complete mitochondrial genome of Bactrocera arecae (Insecta: Tephritidae) by next-generation sequencing and molecular phylogeny of Dacini tribe

PubMed Central

Yong, Hoi-Sen; Song, Sze-Looi; Lim, Phaik-Eem; Chan, Kok-Gan; Chow, Wan-Loo; Eamsobhana, Praphathip

2015-01-01

The whole mitochondrial genome of the pest fruit fly Bactrocera arecae was obtained from next-generation sequencing of genomic DNA. It had a total length of 15,900 bp, consisting of 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a non-coding region (A + T-rich control region). The control region (952 bp) was flanked by rrnS and trnI genes. The start codons included 6 ATG, 3 ATT and 1 each of ATA, ATC, GTG and TCG. Eight TAA, two TAG, one incomplete TA and two incomplete T stop codons were represented in the protein-coding genes. The cloverleaf structure for trnS1 lacked the D-loop, and that of trnN and trnF lacked the TΨC-loop. Molecular phylogeny based on 13 protein-coding genes was concordant with 37 mitochondrial genes, with B. arecae having closest genetic affinity to B. tryoni. The subgenus Bactrocera of Dacini tribe and the Dacinae subfamily (Dacini and Ceratitidini tribes) were monophyletic. The whole mitogenome of B. arecae will serve as a useful dataset for studying the genetics, systematics and phylogenetic relationships of the many species of Bactrocera genus in particular, and tephritid fruit flies in general. PMID:26472633
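The codon and gene counts quoted in the abstract above can be cross-checked with a few lines of arithmetic. This is only a consistency check on the numbers as reported; the dictionary names are illustrative.

```python
# Counts as reported in the abstract.
start_codons = {"ATG": 6, "ATT": 3, "ATA": 1, "ATC": 1, "GTG": 1, "TCG": 1}
stop_codons = {"TAA": 8, "TAG": 2, "TA (incomplete)": 1, "T (incomplete)": 2}
gene_counts = {"protein-coding": 13, "rRNA": 2, "tRNA": 22}

n_starts = sum(start_codons.values())
n_stops = sum(stop_codons.values())
n_genes = sum(gene_counts.values())

# Each of the 13 protein-coding genes should have one start and one stop,
# and the gene classes should add up to the 37 mitochondrial genes cited.
print(n_starts, n_stops, n_genes)
```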
Identification and in vitro characterization of a Marek’s disease virus encoded ribonucleotide reductase

USDA-ARS's Scientific Manuscript database

Marek’s disease virus (MDV) encodes a ribonucleotide reductase (RR), a key regulatory enzyme in the DNA synthesis pathway. The gene coding for the RR of MDV is located in the unique long (UL) region of the genome. The large subunit is encoded by UL39 (RR1) and is predicted to comprise 860 amino acid...
Identification and expression analysis of duck interleukin-17D in Riemerella anatipestifer infection

USDA-ARS's Scientific Manuscript database

Interleukin (IL)-17D is a proinflammatory cytokine with limited information on its biological functions. Here we provide the description of the sequence, bioactivity, and mRNA expression profile of duck IL-17D homologue. A full-length duck IL-17D (duIL-17D) cDNA with a 624-bp coding region was ident...
Haplotype combination of the bovine INSIG1 gene sequence variants and association with growth traits in Nanyang cattle.

PubMed

Sun, Jiajie; Gao, Yuan; Liu, Dong; Ma, Wei; Xue, Jing; Zhang, Chunlei; Lan, Xianyong; Lei, Chuzhao; Chen, Hong

2012-06-01

The insulin-induced gene 1 (INSIG1) gene encodes a protein that blocks proteolytic activation of sterol regulatory element binding proteins, which are transcription factors that activate genes that regulate cholesterol, fatty acid, and glucose metabolism. However, similar research for the bovine INSIG1 gene is lacking. Therefore, in this study, polymorphisms of the bovine INSIG1 gene were detected in 643 individuals from four cattle breeds by DNA pooling, forced PCR-RFLP, PCR-SSCP, and DNA sequencing methods. Only 10 novel SNPs were identified, which included four mutations in the coding region and the others in the introns. In Nanyang individuals, seven common haplotypes were identified based on four coding region SNPs. The haplotype GACT, with a frequency of 75.4%, was the most prevalent haplotype, and the SNPs formed two linkage disequilibrium blocks with strong multi-allelic D' (D' = 1). Additionally, association analysis between mutations of the bovine INSIG1 gene and growth traits in Nanyang cattle at 6, 12, 18, and 24 months old was performed, and the results indicated that the polymorphisms were not significantly associated with body mass.
The site-specific ribosomal insertion element type II of Bombyx mori (R2Bm) contains the coding sequence for a reverse transcriptase-like enzyme.

PubMed Central

Burke, W D; Calalang, C C; Eickbush, T H

1987-01-01

Two classes of DNA elements interrupt a fraction of the rRNA repeats of Bombyx mori. We have analyzed by genomic blotting and sequence analysis one class of these elements which we have named R2. These elements occupy approximately 9% of the rDNA units of B. mori and appear to be homologous to the type II rDNA insertions detected in Drosophila melanogaster. Approximately 25 copies of R2 exist within the B. mori genome, of which at least 20 are located at a precise location within otherwise typical rDNA units. Nucleotide sequence analysis has revealed that the 4.2-kilobase-pair R2 element has a single large open reading frame, occupying over 82% of the total length of the element. The central region of this 1,151-amino-acid open reading frame shows homology to the reverse transcriptase enzymes found in retroviruses and certain transposable elements. Amino acid homology of this region is highest to the mobile line 1 elements of mammals, followed by the mitochondrial type II introns of fungi, and the pol gene of retroviruses. Less homology exists with transposable elements of D. melanogaster and Saccharomyces cerevisiae. Two additional regions of sequence homology between L1 and R2 elements were also found outside the reverse transcriptase region. We suggest that the R2 elements are retrotransposons that are site specific in their insertion into the genome. Such mobility would enable these elements to occupy a small fraction of the rDNA units of B. mori despite their continual elimination from the rDNA locus by sequence turnover. PMID:2439905
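The statement that the open reading frame occupies over 82% of the element follows from the two figures given in the abstract. The short check below assumes the rounded 4.2-kilobase-pair length means exactly 4,200 bp, which is an approximation; the function name is illustrative.

```python
def orf_fraction(n_amino_acids: int, element_length_bp: int,
                 include_stop: bool = False) -> float:
    """Fraction of an element covered by an ORF of the given length."""
    # Three nucleotides per codon, plus an optional stop codon.
    orf_nt = 3 * n_amino_acids + (3 if include_stop else 0)
    return orf_nt / element_length_bp


# 1,151 codons = 3,453 nt out of an assumed 4,200 bp.
fraction = orf_fraction(1151, 4200)
print(f"{fraction:.1%}")
```

Including the stop codon changes the result only slightly, so the rounding of the element length matters more than that choice.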
Complete mitochondrial genome of the giant African snail, Achatina fulica (Mollusca: Achatinidae): a novel location of putative control regions (CR) in the mitogenome within Pulmonate species.

PubMed

He, Zhang-Ping; Dai, Xia-Bin; Zhang, Shuai; Zhi, Ting-Ting; Lun, Zhao-Rong; Wu, Zhong-Dao; Yang, Ting-Bao

2016-01-01

The whole sequence (15,057 bp) of the mitochondrial DNA (mtDNA) of the terrestrial snail Achatina fulica (order Stylommatophora) was determined. The mitogenome, like typical metazoan mtDNA, contains 13 protein-coding genes (PCG), 2 ribosomal RNA genes (rRNA) and 22 transfer RNA genes (tRNA). The tRNA genes include two trnS without standard secondary structure. Interestingly, for the first time among the known mitogenomes of Pulmonata species, we characterized an unassigned lengthy sequence (551 bp) between cox1 and trnV, which may be the CR given its AT usage bias (65.70%) and potential hairpin structure.

Interplay between DNA methylation, histone modification and chromatin remodeling in stem cells and during development.

PubMed

Ikegami, Kohta; Ohgane, Jun; Tanaka, Satoshi; Yagi, Shintaro; Shiota, Kunio

2009-01-01

Genes constitute only a small proportion of the mammalian genome, the majority of which is composed of non-genic repetitive elements including interspersed repeats and satellites. A unique feature of the mammalian genome is that there are numerous tissue-dependent, differentially methylated regions (T-DMRs) in the non-repetitive sequences, which include genes and their regulatory elements. The epigenetic status of T-DMRs varies from that of repetitive elements and constitutes the DNA methylation profile genome-wide. Since the DNA methylation profile is specific to each cell and tissue type, much like a fingerprint, it can be used as a means of identification. The formation of DNA methylation profiles is the basis for cell differentiation and development in mammals. The epigenetic status of each T-DMR is regulated by the interplay between DNA methyltransferases, histone modification enzymes, histone subtypes, non-histone nuclear proteins and non-coding RNAs. In this review, we will discuss how these epigenetic factors cooperate to establish cell- and tissue-specific DNA methylation profiles.
Core histone genes of Giardia intestinalis: genomic organization, promoter structure, and expression

PubMed Central

Yee, Janet; Tang, Anita; Lau, Wei-Ling; Ritter, Heather; Delport, Dewald; Page, Melissa; Adam, Rodney D; Müller, Miklós; Wu, Gang

2007-01-01

Background Giardia intestinalis is a protist found in freshwaters worldwide, and is the most common cause of parasitic diarrhea in humans. The phylogenetic position of this parasite is still much debated. Histones are small, highly conserved proteins that associate tightly with DNA to form chromatin within the nucleus. There are two classes of core histone genes in higher eukaryotes: DNA replication-independent histones and DNA replication-dependent ones. Results We identified two copies each of the core histone H2a, H2b and H3 genes, and three copies of the H4 gene, at separate locations on chromosomes 3, 4 and 5 within the genome of Giardia intestinalis, but no gene encoding a H1 linker histone could be recognized. The copies of each gene share extensive DNA sequence identities throughout their coding and 5' noncoding regions, which suggests these copies have arisen from relatively recent gene duplications or gene conversions. The transcription start sites are at triplet A sequences 1–27 nucleotides upstream of the translation start codon for each gene. We determined that a 50 bp region upstream from the start of the histone H4 coding region is the minimal promoter, and a highly conserved 15 bp sequence called the histone motif (him) is essential for its activity. The Giardia core histone genes are constitutively expressed at approximately equivalent levels and their mRNAs are polyadenylated. Competition gel-shift experiments suggest that a factor within the protein complex that binds him may also be a part of the protein complexes that bind other promoter elements described previously in Giardia. Conclusion In contrast to other eukaryotes, the Giardia genome has only a single class of core histone genes that encode replication-independent histones. Our inability to locate a gene encoding the linker histone H1 leads us to speculate that the H1 protein may not be required for the compaction of Giardia's small and gene-rich genome. PMID:17425802
Characterization of mutations in the FOXE1 gene in a cohort of unrelated Malaysian patients with congenital hypothyroidism and thyroid dysgenesis.

PubMed

Kang, In-Nee; Musa, Maslinda; Harun, Fatimah; Junit, Sarni Mat

2010-02-01

The FOXE1 gene was screened for mutations in a cohort of 34 unrelated patients with congenital hypothyroidism, 14 of whom had thyroid dysgenesis and 18 were normal (the thyroid status for 2 patients was unknown). The entire coding region of the FOXE1 gene was PCR-amplified, then analyzed using single-stranded conformational polymorphism, followed by confirmation by direct DNA sequencing. DNA sequencing analysis revealed a heterozygous A>G transition at nucleotide position 394 in one of the patients. The nucleotide transition changed asparagine to aspartate at codon 132 in the highly conserved region of the forkhead DNA binding domain of the FOXE1 gene. This mutation was not detected in a total of 104 normal healthy individuals screened. The binding ability of the mutant FOXE1 protein to the human thyroperoxidase (TPO) promoter was slightly reduced compared with the wild-type FOXE1. The mutation also caused a 5% loss of TPO transcriptional activity.
Multiplexed direct genomic selection (MDiGS): a pooled BAC capture approach for highly accurate CNV and SNP/INDEL detection.

PubMed

Alvarado, David M; Yang, Ping; Druley, Todd E; Lovett, Michael; Gurnett, Christina A

2014-06-01

Despite declining sequencing costs, few methods are available for cost-effective single-nucleotide polymorphism (SNP), insertion/deletion (INDEL) and copy number variation (CNV) discovery in a single assay. Commercially available methods require a high investment to a specific region and are only cost-effective for large samples. Here, we introduce a novel, flexible approach for multiplexed targeted sequencing and CNV analysis of large genomic regions called multiplexed direct genomic selection (MDiGS). MDiGS combines biotinylated bacterial artificial chromosome (BAC) capture and multiplexed pooled capture for SNP/INDEL and CNV detection of 96 multiplexed samples on a single MiSeq run. MDiGS is advantageous over other methods for CNV detection because pooled sample capture and hybridization to large contiguous BAC baits reduces sample and probe hybridization variability inherent in other methods. We performed MDiGS capture for three chromosomal regions consisting of ∼ 550 kb of coding and non-coding sequence with DNA from 253 patients with congenital lower limb disorders. PITX1 nonsense and HOXC11 S191F missense mutations were identified that segregate in clubfoot families. Using a novel pooled-capture reference strategy, we identified recurrent chromosome chr17q23.1q23.2 duplications and small HOXC 5' cluster deletions (51 kb and 12 kb). Given the current interest in coding and non-coding variants in human disease, MDiGS fulfills a niche for comprehensive and low-cost evaluation of CNVs, coding, and non-coding variants across candidate regions of interest. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Complex interactions of the Eastern and Western Slavic populations with other European groups as revealed by mitochondrial DNA analysis.

PubMed

Grzybowski, Tomasz; Malyarchuk, Boris A; Derenko, Miroslava V; Perkova, Maria A; Bednarek, Jarosław; Woźniak, Marcin

2007-06-01

Mitochondrial DNA sequence variation was examined by the control region sequencing (HVS I and HVS II) and RFLP analysis of haplogroup-diagnostic coding region sites in 570 individuals from four regional populations of Poles and two Russian groups from northwestern part of the country. Additionally, sequences of complete mitochondrial genomes representing K1a1b1a subclade in Polish and Polish Roma populations have been determined. Haplogroup frequency patterns revealed in Poles and Russians are similar to those characteristic of other Europeans. However, there are several features of Slavic mtDNA pools seen on the level of regional populations which are helpful in the understanding of complex interactions of the Eastern and Western Slavic populations with other European groups. One of the most important is the presence of subhaplogroups U5b1b1, D5, Z1 and U8a with simultaneous scarcity of haplogroup K in populations of northwestern Russia suggesting the participation of Finno-Ugrian tribes in the formation of mtDNA pools of Russians from this region. The results of genetic structure analyses suggest that Russians from Velikii Novgorod area (northwestern Russia) and Poles from Suwalszczyzna (northeastern Poland) differ from all remaining Polish and Russian samples. Simultaneously, northwestern Russians and northeastern Poles bear some similarities to Baltic (Latvians) and Finno-Ugrian groups (Estonians) of northeastern Europe, especially on the level of U5 haplogroup frequencies. The occurrence of K1a1b1a subcluster in Poles and Polish Roma is one of the first direct proofs of the presence of Ashkenazi-specific mtDNA lineages in non-Jewish European populations.
Human ESP1/CRP2, a member of the LIM domain protein family: Characterization of the cDNA and assignment of the gene locus to chromosome 14q32.3

DOE Office of Scientific and Technical Information (OSTI.GOV)

Karim, Mohammad Azharul; Ohta, Kohji; Matsuda, Ichiro

1996-01-15

The LIM domain is present in a wide variety of proteins with diverse functions and exhibits characteristic arrangements of Cys and His residues with a novel zinc-binding motif. LIM domain proteins have been implicated in development, cell regulation, and cell structure. A LIM domain protein was identified by screening a human cDNA library with rat cysteine-rich intestinal protein (CRIP) as a probe, under conditions of low stringency. Comparison of the predicted amino acid sequence with several LIM domain proteins revealed 93% of the residues to be identical to rat LIM domain protein, termed ESP1 or CRP2. Thus, the protein is hereafter referred to as human ESP1/CRP2. The cDNA encompasses a 1171-base region, including 26, 624, and 521 bases in the 5'-noncoding region, coding region, and 3'-noncoding region, respectively, and encodes the entire ESP1/CRP2 protein. The protein has two LIM domains, and each shares 35.1% and 77 or 79% identical residues with human cysteine-rich protein (CRP) and rat CRIP, respectively. Northern blot analysis of ESP1/CRP2 in various human tissues showed distinct tissue distributions compared with CRP and CRIP, suggesting that each might serve related but specific roles in tissue organization or function. Using a panel of human-rodent somatic cell hybrids, the ESP1/CRP2 locus was assigned to chromosome 14. Fluorescence in situ hybridization, using cDNA and a genome DNA fragment of the ESP1/CRP2 as probes, confirms this assignment and relegates regional localization to band 14q32.3. 47 refs., 7 figs.
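The segment lengths quoted in this abstract (26, 624, and 521 bases) can be checked against the stated 1171-base total with a few lines of Python. This is only an illustrative sketch: the function name `cdna_layout` and its field names are hypothetical, and the 1-based coordinates assume the three segments are contiguous in the order 5'-noncoding, coding, 3'-noncoding.

```python
def cdna_layout(utr5_len, cds_len, utr3_len):
    """Summarize a cDNA made of a 5' noncoding region, a coding region,
    and a 3' noncoding region, assumed contiguous and in that order."""
    codons, remainder = divmod(cds_len, 3)
    return {
        "total_length": utr5_len + cds_len + utr3_len,
        "cds_start": utr5_len + 1,       # 1-based, inclusive
        "cds_end": utr5_len + cds_len,   # 1-based, inclusive
        "codons": codons,                # complete triplets in the coding region
        "in_frame": remainder == 0,      # True if the length is a multiple of 3
    }


# Lengths taken from the abstract above.
layout = cdna_layout(26, 624, 521)
```

The abstract does not say whether the stop codon is counted within the 624 coding bases, so the sketch reports codons rather than a protein length.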
Nucleotide sequence and structural organization of the human vasopressin pituitary receptor (V3) gene.

PubMed

René, P; Lenne, F; Ventura, M A; Bertagna, X; de Keyzer, Y

2000-01-04

In the pituitary, vasopressin triggers ACTH release through a specific receptor subtype, termed V3 or V1b. We cloned the V3 cDNA and showed that its expression was almost exclusive to pituitary corticotrophs and some corticotroph tumors. To study the determinants of this tissue specificity, we have now cloned the gene for the human (h) V3 receptor and characterized its structure. It is composed of two exons, spanning 10kb, with the coding region interrupted between transmembrane domains 6 and 7. We established that the transcription initiation site is located 498 nucleotides upstream of the initiator codon and showed that two polyadenylation sites may be used, while the most frequent is the most downstream. Sequence analysis of the promoter region showed no TATA box but identified consensus binding motifs for Sp1, CREB, and half sites of the estrogen receptor binding site. However comparison with another corticotroph-specific gene, proopiomelanocortin, did not identify common regulatory elements in the two promoters except for a short GC-rich region. Unexpectedly, hV3 gene analysis revealed that a formerly cloned 'artifactual' hV3 cDNA indeed corresponded to a spliced antisense transcript, overlapping the 5' part of the coding sequence in exon 1 and the promoter region. This transcript, hV3rev, was detected in normal pituitary and in many corticotroph tumors expressing hV3 sense mRNA and may therefore play a role in hV3 gene expression.
Transformable Rhodobacter strains, method for producing transformable Rhodobacter strains

DOEpatents

Laible, Philip D.; Hanson, Deborah K.

2018-05-08

The invention provides an organism for expressing foreign DNA, the organism engineered to accept standard DNA carriers. The genome of the organism codes for intracytoplasmic membranes and features an interruption in at least one of the genes coding for restriction enzymes. Further provided is a system for producing biological materials comprising: selecting a vehicle to carry DNA which codes for the biological materials; determining sites on the vehicle's DNA sequence susceptible to restriction enzyme cleavage; choosing an organism to accept the vehicle based on that organism not acting upon at least one of said vehicle's sites; engineering said vehicle to contain said DNA; thereby creating a synthetic vector; and causing the synthetic vector to enter the organism so as to cause expression of said DNA.
Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

NASA Astrophysics Data System (ADS)

Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

2017-07-01

A DNA sequence can be defined as a succession of letters representing the order of nucleotides within DNA, using the four DNA base codes adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of a sequence is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have now become advanced, high-throughput technologies. DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing produces ambiguous results, making it difficult to determine whether a given code is A, T, G, or C. To solve this problem, in this study we introduce another representation of DNA codes, namely the Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probabilities that the bases A, T, G, C appear in Q, and PA + PT + PG + PC = 1. Using Quaternion representations, we are able to construct an improved scoring matrix for global sequence alignment by applying a dot product method. This scoring matrix produces better-quality match and mismatch scores between two DNA base codes. In implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are the DNA sequences of Streptococcus pneumoniae families obtained from GenBank, while the target DNA sequence was received from our collaborator's database. As a result, we found that the Quaternion representations improve the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae.
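The probability-vector representation and dot-product scoring described in this abstract can be sketched in Python (the authors report using Octave; Python is used here only for illustration). Several details are assumptions and not taken from the paper: the way the dot product is turned into a score (a linear blend of hypothetical match and mismatch values), the uniform vector for `N`, the gap penalty, and all function names.

```python
# Each base is a vector (PA, PT, PG, PC) whose entries sum to 1.
# The 'N' entry (fully ambiguous, uniform probabilities) is an assumption.
BASE_VECTORS = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "T": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "C": (0.0, 0.0, 0.0, 1.0),
    "N": (0.25, 0.25, 0.25, 0.25),
}


def encode(sequence):
    """Convert a string of base codes into a list of probability vectors."""
    return [BASE_VECTORS[symbol] for symbol in sequence.upper()]


def pair_score(p, q, match=1.0, mismatch=-1.0):
    """Score two probability vectors from their dot product.

    The dot product is the probability that both positions hold the same
    base; blending match and mismatch by it is an assumed scoring rule.
    """
    same = sum(x * y for x, y in zip(p, q))
    return match * same + mismatch * (1.0 - same)


def needleman_wunsch_score(seq_a, seq_b, gap=-2.0):
    """Return the optimal global alignment score (linear gap penalty)."""
    a, b = encode(seq_a), encode(seq_b)
    n, m = len(a), len(b)
    table = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = max(
                table[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # align
                table[i - 1][j] + gap,                                 # gap in b
                table[i][j - 1] + gap,                                 # gap in a
            )
    return table[n][m]
```

Under these assumptions an unambiguous base scores as an ordinary match or mismatch, while an ambiguous position contributes an intermediate score instead of being forced to one or the other.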
Ultra-low background DNA cloning system.

PubMed

Goto, Kenta; Nagano, Yukio

2013-01-01

Yeast-based in vivo cloning is useful for cloning DNA fragments into plasmid vectors and is based on the ability of yeast to recombine the DNA fragments by homologous recombination. Although this method is efficient, it produces some by-products. We have developed an "ultra-low background DNA cloning system" on the basis of yeast-based in vivo cloning, by almost completely eliminating the generation of by-products and applying the method to commonly used Escherichia coli vectors, particularly those lacking yeast replication origins and carrying an ampicillin resistance gene (Amp(r)). First, we constructed a conversion cassette containing the DNA sequences in the following order: an Amp(r) 5' UTR (untranslated region) and coding region, an autonomous replication sequence and a centromere sequence from yeast, a TRP1 yeast selectable marker, and an Amp(r) 3' UTR. This cassette allowed conversion of the Amp(r)-containing vector into the yeast/E. coli shuttle vector through use of the Amp(r) sequence by homologous recombination. Furthermore, simultaneous transformation of the desired DNA fragment into yeast allowed cloning of this DNA fragment into the same vector. We rescued the plasmid vectors from all yeast transformants, and by-products containing the E. coli replication origin disappeared. Next, the rescued vectors were transformed into E. coli and the by-products containing the yeast replication origin disappeared. Thus, our method used yeast- and E. coli-specific "origins of replication" to eliminate the generation of by-products. Finally, we successfully cloned the DNA fragment into the vector with almost 100% efficiency.
Quartz crystal microbalance detection of DNA single-base mutation based on monobase-coded cadmium tellurium nanoprobe.

PubMed

Zhang, Yuqin; Lin, Fanbo; Zhang, Youyu; Li, Haitao; Zeng, Yue; Tang, Hao; Yao, Shouzhuo

2011-01-01

A new method for the detection of point mutations in DNA, based on monobase-coded cadmium tellurium nanoprobes and the quartz crystal microbalance (QCM) technique, is reported. A point mutation (single-base mutation of adenine, thymine, cytosine, or guanine, namely A, T, C, or G, in a DNA strand) DNA QCM sensor was fabricated by immobilizing single-base mutation DNA-modified magnetic beads onto the electrode surface with an external magnetic field near the electrode. The DNA-modified magnetic beads were obtained from the biotin-avidin affinity reaction of biotinylated DNA and streptavidin-functionalized core/shell Fe3O4/Au magnetic nanoparticles, followed by a DNA hybridization reaction. Single-base coded CdTe nanoprobes (A-CdTe, T-CdTe, C-CdTe and G-CdTe, respectively) were used as the detection probes. The mutation site in DNA was distinguished by detecting the decreases of the resonance frequency of the piezoelectric quartz crystal when the coded nanoprobe was added to the test system. The proposed detection strategy for point mutation in DNA proved to be sensitive, simple, repeatable and low-cost; consequently, it has great potential for single nucleotide polymorphism (SNP) detection. 2011 © The Japan Society for Analytical Chemistry
A novel deletion/insertion mutation in the mRNA transcribed from one α1(I) collagen allele in a family with dominant type III OI and germline mosaicism

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, O.; Masters, C.; Lewis, M.B.

1994-09-01

In an 8-year-old girl and her father, both of whom have severe type III OI, we have previously used RNA/RNA hybrid analysis to demonstrate a mismatch in the region of α1(I) mRNA coding for aa 558-861. We used SSCP to further localize the abnormality to a subregion coding for aa 579-679. This region was subcloned and sequenced. Each patient's cDNA has a deletion of the sequences coding for the last residue of exon 34, and all of exons 35 and 36 (aa 604-639), followed by an insertion of 156 nt from the 3'-end of intron 36. PCR amplification of leukocyte DNA from the patients and the clinically normal paternal grandmother yielded two fragments: a 1007 bp fragment predicted from normal genomic sequences and a 445 bp fragment. Subcloning and sequencing of the shorter genomic PCR product confirmed the presence of a 565 bp genomic deletion from the end of exon 34 to the middle of intron 36. The abnormal protein is apparently synthesized and incorporated into helix. The inserted nucleotides are in frame with the collagenous sequence and contain no stop codons. They encode a 52 aa non-collagenous region. The fibroblast procollagen of the patients has both normal and electrophoretically delayed proα(I) bands. The electrophoretically delayed procollagen is very sensitive to pepsin or trypsin digestion, as predicted by its non-collagenous sequence, and cannot be visualized as collagen. This unique OI collagen mutation is an excellent candidate for molecular targeting to "turn off" a dominant mutant allele.
The Near Naked Hairless (HrN) Mutation Disrupts Hair Formation but is not Due to a Mutation in the Hairless Coding Region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Yutao; Das, Suchita; Olszewski, Robert Edward

Near naked hairless (HrN) is a semi-dominant mutation that arose spontaneously and was suggested by allelism testing to be an allele of mouse Hairless (Hr). HrN mice differ from other Hr mutants in that hair loss appears as the postnatal coat begins to emerge, as opposed to failure to initiate the first postnatal hair cycle, and that the mutation displays semi-dominant inheritance. We sequenced the Hr cDNA in HrN/HrN mice and characterized the pathological and molecular phenotypes to identify the basis for hair loss in this model. HrN/HrN mice exhibit dystrophic hairs that are unable to consistently emerge from the hair follicle, while HrN/+ mice display a sparse coat of hair and a milder degree of follicular dystrophy than their homozygous littermates. DNA microarray analysis of cutaneous gene expression demonstrates that numerous genes are downregulated in HrN/HrN mice, primarily genes important for hair structure. By contrast, Hr expression is significantly increased. Sequencing the Hr coding region, intron-exon boundaries, 5'- and 3'- UTR and immediate upstream region did not reveal the underlying mutation. Therefore HrN does not appear to be an allele of Hr but may result from a mutation in a closely linked gene or from a regulatory mutation in Hr.
Comparative chloroplast genomes of eleven Schima (Theaceae) species: Insights into DNA barcoding and phylogeny.

PubMed

Yu, Xiang-Qin; Drew, Bryan T; Yang, Jun-Bo; Gao, Lian-Ming; Li, De-Zhu

2017-01-01

Schima is an ecologically and economically important woody genus in tea family (Theaceae). Unresolved species delimitations and phylogenetic relationships within Schima limit our understanding of the genus and hinder utilization of the genus for economic purposes. In the present study, we conducted comparative analysis among the complete chloroplast (cp) genomes of 11 Schima species. Our results indicate that Schima cp genomes possess a typical quadripartite structure, with conserved genomic structure and gene order. The size of the Schima cp genome is about 157 kilo base pairs (kb). They consistently encode 114 unique genes, including 80 protein-coding genes, 30 tRNAs, and 4 rRNAs, with 17 duplicated in the inverted repeat (IR). These cp genomes are highly conserved and do not show obvious expansion or contraction of the IR region. The percent variability of the 68 coding and 93 noncoding (>150 bp) fragments is consistently less than 3%. The seven most widely touted DNA barcode regions as well as one promising barcode candidate showed low sequence divergence. Eight mutational hotspots were identified from the 11 cp genomes. These hotspots may potentially be useful as specific DNA barcodes for species identification of Schima. The 58 cpSSR loci reported here are complementary to the microsatellite markers identified from the nuclear genome, and will be leveraged for further population-level studies. Phylogenetic relationships among the 11 Schima species were resolved with strong support based on the cp genome data set, which corresponds well with the species distribution pattern. The data presented here will serve as a foundation to facilitate species identification, DNA barcoding and phylogenetic reconstructions for future exploration of Schima.
Regulation of Immunoglobulin Class-Switch Recombination: Choreography of Noncoding Transcription, Targeted DNA Deamination, and Long-Range DNA Repair

PubMed Central

Matthews, Allysia J.; Zheng, Simin; DiMenna, Lauren J.; Chaudhuri, Jayanta

2014-01-01

Upon encountering antigens, mature IgM-positive B lymphocytes undergo class-switch recombination (CSR) wherein exons encoding the default Cμ constant coding gene segment of the immunoglobulin (Ig) heavy-chain (Igh) locus are excised and replaced with a new constant gene segment (referred to as “Ch genes”, e.g., Cγ, Cε, or Cα). The B cell thereby changes from expressing IgM to one producing IgG, IgE, or IgA, with each antibody isotype having a different effector function during an immune reaction. CSR is a DNA deletional-recombination reaction that proceeds through the generation of DNA double-strand breaks (DSBs) in repetitive switch (S) sequences preceding each Ch gene and is completed by end-joining between donor Sμ and acceptor S regions. CSR is a multistep reaction requiring transcription through S regions, the DNA cytidine deaminase AID, and the participation of several general DNA repair pathways including base excision repair, mismatch repair, and classical nonhomologous end-joining. In this review, we discuss our current understanding of how transcription through S regions generates substrates for AID-mediated deamination and how AID participates not only in the initiation of CSR but also in the conversion of deaminated residues into DSBs. Additionally, we review the multiple processes that regulate AID expression and facilitate its recruitment specifically to the Ig loci, and how deregulation of AID specificity leads to oncogenic translocations. Finally, we summarize recent data on the potential role of AID in the maintenance of the pluripotent stem cell state during epigenetic reprogramming. PMID:24507154
Comparison of four polymerase chain reaction assays for the detection of Brucella spp. in clinical samples from dogs

PubMed Central

Boeri, Eduardo J.; Wanke, María M.; Madariaga, María J.; Teijeiro, María L.; Elena, Sebastian A.; Trangoni, Marcos D.

2018-01-01

Aim: This study aimed to compare the sensitivity (S), specificity (Sp), and positive likelihood ratios (LR+) of four polymerase chain reaction (PCR) assays for the detection of Brucella spp. in clinical samples from dogs. Materials and Methods: A total of 595 samples of whole blood, urine, and genital fluids were evaluated between October 2014 and November 2016. To compare PCR assays, the gold standard was defined using a combination of different serological and microbiological tests. Bacterial isolation from urine and blood cultures was carried out. Serological methods such as rapid slide agglutination test, indirect enzyme-linked immunosorbent assay, agar gel immunodiffusion test, and buffered plate antigen test were performed. Four genes were evaluated: (i) the gene coding for the BCSP31 protein, (ii) the ribosomal gene coding for the 16S-23S intergenic spacer region, (iii) the gene coding for porins omp2a/omp2b, and (iv) the gene coding for the insertion sequence IS711. Results: The results obtained were as follows: (1) for the primers that amplify the gene coding for the BCSP31 protein: S: 45.64% (confidence interval [CI] 39.81-51.46), Sp: 95.62% (CI 93.13-98.12), and LR+: 10.43 (CI 6.04-18); (2) for the primers that amplify the ribosomal gene of the 16S-23S rDNA intergenic spacer region: S: 69.80% (CI 64.42-75.18), Sp: 95.62% (CI 93.13-98.12), and LR+: 11.52 (CI 7.31-18.13); (3) for the primers that amplify the omp2a and omp2b genes: S: 39.26% (CI 33.55-44.97), Sp: 97.31% (CI 95.30-99.32), and LR+: 14.58 (CI 7.25-29.29); and (4) for the primers that amplify the insertion sequence IS711: S: 22.82% (CI 17.89-27.75), Sp: 99.66% (CI 98.84-100), and LR+: 67.77 (CI 9.47-484.89). Conclusion: We concluded that the gene coding for the 16S-23S rDNA intergenic spacer region was the one that best detected Brucella spp. in canine clinical samples. PMID:29657404
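The three quantities compared in this abstract are related by the standard definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and LR+ = sensitivity / (1 - specificity). The Python sketch below applies those definitions. The function names are illustrative, and the abstract does not report the underlying true/false positive counts, so the worked call uses only the rounded percentages given for the BCSP31 primers.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of gold-standard positives that the assay detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of gold-standard negatives that the assay reports as negative."""
    return true_neg / (true_neg + false_pos)


def positive_likelihood_ratio(sens, spec):
    """LR+ = sensitivity / (1 - specificity), with both given as fractions."""
    if not 0.0 <= sens <= 1.0 or not 0.0 <= spec < 1.0:
        raise ValueError("sens must be in [0, 1] and spec in [0, 1)")
    return sens / (1.0 - spec)


# Rounded values reported above for the BCSP31 primers (S 45.64%, Sp 95.62%).
lr_bcsp31 = positive_likelihood_ratio(0.4564, 0.9562)
```

Because the inputs here are rounded percentages and not raw counts, a ratio recomputed this way can differ slightly from the value printed in the abstract.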
Evaluation of vector-primed cDNA library production from microgram quantities of total RNA.

PubMed

Kuo, Jonathan; Inman, Jason; Brownstein, Michael; Usdin, Ted B

2004-12-15

cDNA sequences are important for defining the coding region of genes, and full-length cDNA clones have proven to be useful for investigation of the function of gene products. We produced cDNA libraries containing 3.5-5 × 10^5 primary transformants, starting with 5 μg of total RNA prepared from mouse pituitary, adrenal, thymus, and pineal tissue, using a vector-primed cDNA synthesis method. Of approximately 1000 clones sequenced, approximately 20% contained the full open reading frames (ORFs) of known transcripts, based on the presence of the initiating methionine residue codon. The libraries were complex, with 94, 91, 83 and 55% of the clones from the thymus, adrenal, pineal and pituitary libraries, respectively, represented only once. Twenty-five full-length clones, not yet represented in the Mammalian Gene Collection, were identified. Thus, we have produced useful cDNA libraries for the isolation of full-length cDNA clones that are not yet available in the public domain, and demonstrated the utility of a simple method for making high-quality libraries from small amounts of starting material.
Genome-wide DNA methylation patterns in LSH mutant reveals de-repression of repeat elements and redundant epigenetic silencing pathways

PubMed Central

Yu, Weishi; McIntosh, Carl; Lister, Ryan; Zhu, Iris; Han, Yixing; Ren, Jianke; Landsman, David; Lee, Eunice; Briones, Victorino; Terashima, Minoru; Leighty, Robert; Ecker, Joseph R.

2014-01-01

Cytosine methylation is critical in mammalian development and plays a role in diverse biologic processes such as genomic imprinting, X chromosome inactivation, and silencing of repeat elements. Several factors regulate DNA methylation in early embryogenesis, but their precise role in the establishment of DNA methylation at a given site remains unclear. We have generated a comprehensive methylation map in fibroblasts derived from the murine DNA methylation mutant Hells−/− (helicase, lymphoid specific, also known as LSH). It has been previously shown that HELLS can influence de novo methylation of retroviral sequences and endogenous genes. Here, we describe that HELLS controls cytosine methylation in a nuclear compartment that is in part defined by lamin B1 attachment regions. Despite widespread loss of cytosine methylation at regulatory sequences, including promoter regions of protein-coding genes and noncoding RNA genes, overall relative transcript abundance levels in the absence of HELLS are similar to those in wild-type cells. A subset of promoter regions shows increases of the histone modification H3K27me3, suggesting redundancy of epigenetic silencing mechanisms. Furthermore, HELLS modulates CG methylation at all classes of repeat elements and is critical for repression of a subset of repeat elements. Overall, we provide a detailed analysis of gene expression changes in relation to DNA methylation alterations, which contributes to our understanding of the biological role of cytosine methylation. PMID:25170028
Evidence that the Ceratobasidium-like white-thread blight and black rot fungal pathogens from persimmon and tea crops in the Brazilian Atlantic Forest agroecosystem are two distinct phylospecies

PubMed Central

Ceresini, Paulo C.; Costa-Souza, Elaine; Zala, Marcello; Furtado, Edson L.; Souza, Nilton L.

2012-01-01

The white-thread blight and black rot (WTBR) caused by basidiomycetous fungi of the genus Ceratobasidium is emerging as an important plant disease in Brazil, particularly for crop species in the Ericales such as persimmon (Diospyros kaki) and tea (Camellia sinensis). However, the species identity of the fungal pathogen associated with either of these hosts is still unclear. In this work, we used sequence variation in the internal transcribed spacer regions, including the 5.8S coding region of rDNA (ITS-5.8S rDNA), to determine the phylogenetic placement of the local white-thread-blight-associated populations of Ceratobasidium sp. from persimmon and tea, in relation to Ceratobasidium species already described world-wide. The two sister populations of Ceratobasidium sp. from persimmon and tea in the Brazilian Atlantic Forest agroecosystem most likely represent distinct species within Ceratobasidium and are also distinct from C. noxium, the etiological agent of the first description of white-thread blight disease that was reported on coffee in India. The intraspecific variation for the two Ceratobasidium sp. populations was also analyzed using three mitochondrial genes (ATP6, nad1 and nad2). As reported for other fungi, variation in nuclear and mitochondrial DNA was incongruent. Despite distinct variability in the ITS-rDNA region these two populations shared similar mitochondrial DNA haplotypes. PMID:22888299
Molecular cloning, characterization and mRNA expression of duck interleukin-17F

USDA-ARS's Scientific Manuscript database

Interleukin-17F (IL-17F) is a proinflammatory cytokine that plays an important role in gut homeostasis. A full-length duck IL-17F (duIL-17F) cDNA with a 501-bp coding region was identified in ConA-activated splenic lymphocytes. duIL-17F is predicted to encode 166 amino acids, including a 26-amino ...

Chicken IL-17F: Identification and comparative expression analysis in Eimeria-Infected chickens

USDA-ARS's Scientific Manuscript database

Interleukin-17F (IL-17F), belonging to the IL-17 family, is a proinflammatory cytokine and plays an important role in gut homeostasis. A full-length chicken IL-17F (chIL-17F) cDNA with a 510-bp coding region was first identified from ConA-activated splenic lymphocytes of chickens. The chIL-17F share...
The complete chloroplast genome sequence of Dendrobium officinale.

PubMed

Yang, Pei; Zhou, Hong; Qian, Jun; Xu, Haibin; Shao, Qingsong; Li, Yonghua; Yao, Hui

2016-01-01

The complete chloroplast sequence of Dendrobium officinale, an endangered and economically important traditional Chinese medicine, was reported and characterized. The genome size is 152,018 bp, with 37.5% GC content. A pair of inverted repeats (IRs) of 26,284 bp are separated by a large single-copy region (LSC, 84,944 bp) and a small single-copy region (SSC, 14,506 bp). The complete cp DNA contains 83 protein-coding genes, 39 tRNA genes and 8 rRNA genes. Fourteen genes contained one or two introns.
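The region sizes in this abstract can be cross-checked against the reported genome size, since a quadripartite chloroplast genome consists of one large single-copy region, one small single-copy region, and two copies of the inverted repeat. A minimal Python sketch, with a hypothetical function name:

```python
def quadripartite_length(lsc_bp, ssc_bp, ir_bp):
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc_bp + ssc_bp + 2 * ir_bp


# Region sizes as reported above for Dendrobium officinale.
total_bp = quadripartite_length(lsc_bp=84944, ssc_bp=14506, ir_bp=26284)
```

With the values given, the sum equals the 152,018 bp genome size stated in the abstract.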
The minisatellite of the GPI/AMF/NLK/MF gene: interspecies conservation and transcriptional activity.

PubMed

Williams, R R; Hassan-Walker, A F; Lavender, F L; Morgan, M; Faik, P; Ragoussis, J

2001-05-16

Minisatellites are tandemly repeated DNA sequences found throughout the genomes of all eukaryotes. They are regions often prone to instability and hence hypervariability; thus repeat unit sequence is generally not conserved beyond closely related species. We have studied the minisatellite located in intron 9 of the human glucose phosphate isomerase (GPI) gene (also known as neuroleukin, autocrine motility factor, maturation and differentiation factor) and have found, by Zoo blotting coupled with PCR amplification and DNA sequencing, that similar repeat units are present in seven other species of mammal. There is also evidence for the presence of the minisatellite in chicken. The repeat unit does not appear to be present at any other locus in these genomes. Minisatellite DNA has been reported to be involved in recombination activity, control of gene expression of nearby gene(s) (both transcriptional and translational), whilst others form protein coding regions. The high level of conservation exhibited by the GPI minisatellite, coupled with the unique location, strongly suggests a functional role. Our results from transient and stable transfections using luciferase reporter constructs have shown that the GPI minisatellite region can act to increase transcription from the SV40 promoter, CMV promoter and the human GPI promoter.
Genomics dataset of unidentified disclosed isolates.

PubMed

Rekadwad, Bhagwan N

2016-09-01

Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to explore the complexity of the unidentified DNA sequences disclosed in patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was analyzed. The QR codes are helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage codes and enzyme codes from a restriction digestion study, which is helpful for performing studies using short DNA sequences, is reported. The dataset disclosed here provides new data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis.
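The AT/GC content mentioned in this abstract is a simple base-composition measure. A minimal sketch follows; the function name and the example sequence are hypothetical and are not taken from the dataset itself.

```python
from collections import Counter


def at_gc_content(sequence: str) -> tuple[float, float]:
    """Return (AT fraction, GC fraction), counting only unambiguous A/C/G/T bases."""
    counts = Counter(sequence.upper())
    at = counts["A"] + counts["T"]
    gc = counts["G"] + counts["C"]
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return at / total, gc / total


# Hypothetical 12-nt sequence with three of each base.
at_fraction, gc_fraction = at_gc_content("ATGCGCGTTAAC")
print(f"AT = {at_fraction:.2f}, GC = {gc_fraction:.2f}")
```

Ambiguity codes such as N are ignored here, so the two fractions always sum to 1.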
Genomic clones for human cholinesterase

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kott, M.; Venta, P.J.; Larsen, J.

1987-05-01

A human genomic library was prepared from peripheral white blood cells from a single donor by inserting an MboI partial digest into BamHI poly-linker sites of EMBL3. This library was screened using an oligolabeled human cholinesterase cDNA probe over 700 bp long. The latter probe was obtained from a human basal ganglia cDNA library. Of approximately 2 million clones screened under high-stringency conditions, several positive clones were identified; two have been plaque purified. One of these clones has been partially mapped using restriction enzymes known to cut within the coded region of the cDNA for human serum cholinesterase. Hybridization of the fragments and their sizes are as expected if the genomic clone is cholinesterase. Sequencing of the DNA fragments in M13 is in progress to verify the identity of the clone and the location of introns.
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

PubMed Central

Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia

2011-01-01

Background: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358
Self-organizing approach for meta-genomes.

PubMed

Zhu, Jianfeng; Zheng, Wei-Mou

2014-12-01

We extend the self-organizing approach for annotation of a bacterial genome to analyze the raw sequencing data of the human gut metagenome without sequence assembling. The original approach divides the genomic sequence of a bacterium into non-overlapping segments of equal length and assigns to each segment one of seven 'phases', among which one is for the noncoding regions, three for the direct coding regions to indicate the three possible codon positions of the segment starting site, and three for the reverse coding regions. The noncoding phase and the six coding phases are described by two frequency tables of the 64 triplet types or 'codon usages'. A set of codon usages can be used to update the phase assignment and vice versa. An iteration after an initialization leads to a convergent phase assignment to give an annotation of the genome. In the extension of the approach to a metagenome, we consider a mixture model of a number of categories described by different codon usages. The Illumina Genome Analyzer sequencing data of the total DNA from faecal samples are then examined to understand the diversity of the human gut microbiome. Copyright © 2014 Elsevier Ltd. All rights reserved.
Sequence characterization of cDNA sequence of encoding of an antimicrobial Peptide with no disulfide bridge from the Iranian mesobuthus eupeus venomous glands.

PubMed

Farajzadeh-Sheikh, Ahmad; Jolodar, Abbas; Ghaemmaghami, Shamsedin

2013-01-01

Scorpion venom glands produce some antimicrobial peptides (AMPs) that can rapidly kill a broad range of microbes and have additional activities that affect the quality and effectiveness of innate responses and inflammation. In this study, we report the identification of a cDNA sequence encoding a cysteine-free antimicrobial peptide isolated from the venom glands of the Iranian scorpion Mesobuthus eupeus. Total RNA was extracted from the venom glands of Iranian M. eupeus, and cDNA was synthesized using modified oligo(dT). The cDNA was used as the template for semi-nested RT-PCR. PCR products were used for direct nucleotide sequencing, and the results were compared with the GenBank database. A 213 bp cDNA fragment encoding the entire coding region of an antimicrobial toxin from the venom glands of the Iranian scorpion M. eupeus was isolated. The full-length coding region was 210 bp and contained an open reading frame of 70 amino acids with a predicted molecular mass of 7970.48 Da and a theoretical pI of 9.10. The open reading frame consists of 210 bp encoding a precursor of 70 amino acid residues, including a signal peptide of 23 residues, a propeptide of 7 residues, and a mature peptide of 34 residues with no disulfide bridge. The peptide has detectable sequence identity to Lesser Asian Mesobuthus eupeus MeVAMP-2 (98%) and MeVAMP-9 (60%), and to several previously described AMPs from other scorpion venoms, including those of Mesobuthus martensii (94%) and Buthus occitanus israelis (82%). The secondary structure of the peptide consisted mainly of α-helical structure, which was generally conserved in previously reported scorpion counterparts. The phylogenetic analysis showed that the Iranian MeAMP-like toxin was similar but not identical to the venom antimicrobial peptides from the Lesser Asian scorpion Mesobuthus eupeus.
WE-H-BRA-08: A Monte Carlo Cell Nucleus Model for Assessing Cell Survival Probability Based On Particle Track Structure Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, B; Georgia Institute of Technology, Atlanta, GA; Wang, C

Purpose: To correlate the damage produced by particles of different types and qualities to cell survival on the basis of nanodosimetric analysis and advanced DNA structures in the cell nucleus. Methods: A Monte Carlo code was developed to simulate subnuclear DNA chromatin fibers (CFs) of 30 nm utilizing a mean-free-path approach common to radiation transport. The cell nucleus was modeled as a spherical region containing 6000 chromatin-dense domains (CDs) of 400 nm diameter, with additional CFs modeled in a sparser interchromatin region. The Geant4-DNA code was utilized to produce a particle track database representing various particles at different energies and dose quantities. These tracks were used to stochastically position the DNA structures based on their mean free path to interaction with CFs. Excitation and ionization events intersecting CFs were analyzed using the DBSCAN clustering algorithm for assessment of the likelihood of producing DSBs. Simulated DSBs were then assessed based on their proximity to one another for a probability of inducing cell death. Results: Variations in energy deposition to chromatin fibers match expectations based on differences in particle track structure. The quality of damage to CFs based on different particle types indicates more severe damage by high-LET radiation than by low-LET radiation of identical particles. In addition, the model indicates more severe damage by protons than by alpha particles of the same LET, which is consistent with differences in their track structure. Cell survival curves have been produced showing the L-Q behavior of sparsely ionizing radiation. Conclusion: Initial results indicate the feasibility of producing cell survival curves based on the Monte Carlo cell nucleus method. Accurate correlation between simulated DNA damage and cell survival on the basis of nanodosimetric analysis can provide insight into the biological responses to various radiation types. Current efforts are directed at producing cell survival curves for high-LET radiation.
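The "L-Q behavior" mentioned in the Results is read here as the standard linear-quadratic survival model, S(D) = exp(-(αD + βD²)). The sketch below only evaluates that textbook formula; the α and β values are hypothetical placeholders and are not results of the Monte Carlo model described above.

```python
import math


def lq_survival(dose_gy: float, alpha: float, beta: float) -> float:
    """Surviving fraction under the linear-quadratic model, exp(-(alpha*D + beta*D^2))."""
    if dose_gy < 0:
        raise ValueError("dose must be non-negative")
    return math.exp(-(alpha * dose_gy + beta * dose_gy ** 2))


# Hypothetical parameters, chosen only to illustrate the curve shape.
ALPHA = 0.3   # per Gy
BETA = 0.03   # per Gy squared

for dose in (0.0, 2.0, 4.0, 8.0):
    print(f"D = {dose:4.1f} Gy  ->  S = {lq_survival(dose, ALPHA, BETA):.4f}")
```

On a logarithmic survival axis the linear term dominates at low dose and the quadratic term bends the curve downward at higher dose.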
cDNA cloning of the human peroxisomal enoyl-CoA hydratase: 3-Hydroxyacyl-CoA dehydrogenase bifunctional enzyme and localization to chromosome 3q26.3-3q28: A free left Alu arm is inserted in the 3' noncoding region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoefler, G.; Forstner, M.; Hulla, W.

1994-01-01

Enoyl-CoA hydratase:3-hydroxyacyl-CoA dehydrogenase bifunctional enzyme is one of the four enzymes of the peroxisomal β-oxidation pathway. Here, the authors report the full-length human cDNA sequence and the localization of the corresponding gene on chromosome 3q26.3-3q28. The cDNA sequence spans 3779 nucleotides with an open reading frame of 2169 nucleotides. The tripeptide SKL at the carboxy terminus, known to serve as a peroxisomal targeting signal, is present. DNA sequence comparison of the coding region showed an 80% homology between human and rat bifunctional enzyme cDNA. The 3' noncoding sequence contains 117 nucleotides homologous to an Alu repeat. Based on sequence comparison, they propose that these nucleotides are a free left Alu arm with 86% homology to the Alu-J family. RNA analysis shows one band with highest intensity in liver and kidney. This cDNA will allow in-depth studies of molecular defects in patients with defective peroxisomal bifunctional enzyme. Moreover, it will also provide a means for studying the regulation of peroxisomal β-oxidation in humans. 33 refs., 5 figs.
DNA sequence requirements for the accurate transcription of a protein-coding plastid gene in a plastid in vitro system from mustard (Sinapis alba L.)

PubMed Central

Link, Gerhard

1984-01-01

A nuclease-treated plastid extract from mustard (Sinapis alba L.) allows efficient transcription of cloned plastid DNA templates. In this in vitro system, the major runoff transcript of the truncated gene for the 32 000 mol. wt. photosystem II protein was accurately initiated from a site close to or identical with the in vivo start site. By using plasmids with deletions in the 5'-flanking region of this gene as templates, a DNA region required for efficient and selective initiation was detected ~28-35 nucleotides upstream of the transcription start site. This region contains the sequence element TTGACA, which matches the consensus sequence for prokaryotic '−35' promoter elements. In the absence of this region, a region ~13-27 nucleotides upstream of the start site still enables a basic level of specific transcription. This second region contains the sequence element TATATAA, which matches the consensus sequence for the 'TATA' box of genes transcribed by RNA polymerase II (or B). The region between the 'TATA'-like element and the transcription start site is not sufficient but may be required for specific transcription of the plastid gene. This latter region contains the sequence element TATACT, which resembles the prokaryotic '−10' (Pribnow) box. Based on the structural and transcriptional features of the 5' upstream region, a 'promoter switch' mechanism is proposed, which may account for the developmentally regulated expression of this plastid gene. PMID:16453540
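The three sequence elements named in this abstract (TTGACA, TATATAA and TATACT) can be located in an upstream region with a plain string search. The sketch below is illustrative only: the upstream sequence is a made-up toy, not the mustard gene, and the motif labels and function name are assumptions.

```python
import re

# Sequence elements quoted in the abstract; the labels are informal.
PROMOTER_MOTIFS = {
    "-35-like": "TTGACA",
    "TATA-like": "TATATAA",
    "-10-like": "TATACT",
}


def find_motifs(upstream: str) -> dict[str, list[int]]:
    """Map each motif label to the 0-based start positions of its exact matches."""
    seq = upstream.upper()
    hits = {}
    for name, motif in PROMOTER_MOTIFS.items():
        # A zero-width lookahead also reports overlapping occurrences.
        hits[name] = [m.start() for m in re.finditer(f"(?={motif})", seq)]
    return hits


# Hypothetical toy upstream region built from the three elements plus filler.
toy_upstream = "GG" + "TTGACA" + "CGA" + "TATATAA" + "GC" + "TATACT" + "AG"
print(find_motifs(toy_upstream))
```

Exact matching is a deliberate simplification; real promoter scans usually tolerate mismatches or use position weight matrices.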
cDNA cloning of Brassica napus malonyl-CoA:ACP transacylase (MCAT) (fab D) and complementation of an E. coli MCAT mutant.

PubMed

Simon, J W; Slabas, A R

1998-09-18

The GenBank database was searched using the E. coli malonyl CoA:ACP transacylase (MCAT) sequence, for plant protein/cDNA sequences corresponding to MCAT, a component of plant fatty acid synthetase (FAS), for which the plant cDNA has not been isolated. A 272-bp Zea mays EST sequence (GenBank accession number: AA030706) was identified which has strong homology to the E. coli MCAT. A PCR derived cDNA probe from Zea mays was used to screen a Brassica napus (rape) cDNA library. This resulted in the isolation of a 1200-bp cDNA clone which encodes an open reading frame corresponding to a protein of 351 amino acids. The protein shows 47% homology to the E. coli MCAT amino acid sequence in the coding region for the mature protein. Expression of a plasmid (pMCATrap2) containing the plant cDNA sequence in Fab D89, an E. coli mutant in MCAT activity, restores growth, demonstrating functional complementation and direct function of the cloned cDNA. This is the first functional evidence supporting the identification of a plant cDNA for MCAT.
Human immunodeficiency virus type 1 LTR TATA and TAR region sequences required for transcriptional regulation.

PubMed Central

Garcia, J A; Harrich, D; Soultanakis, E; Wu, F; Mitsuyasu, R; Gaynor, R B

1989-01-01

The human immunodeficiency virus (HIV) type 1 LTR is regulated at the transcriptional level by both cellular and viral proteins. Using HeLa cell extracts, multiple regions of the HIV LTR were found to serve as binding sites for cellular proteins. An untranslated region binding protein UBP-1 has been purified and fractions containing this protein bind to both the TAR and TATA regions. To investigate the role of cellular proteins binding to both the TATA and TAR regions and their potential interaction with other HIV DNA binding proteins, oligonucleotide-directed mutagenesis of both these regions was performed followed by DNase I footprinting and transient expression assays. In the TATA region, two direct repeats TC/AAGC/AT/AGCTGC surround the TATA sequence. Mutagenesis of both of these direct repeats or of the TATA sequence interrupted binding over the TATA region on the coding strand, but only a mutation of the TATA sequence affected in vivo assays for tat-activation. In addition to TAR serving as the site of binding of cellular proteins, RNA transcribed from TAR is capable of forming a stable stem-loop structure. To determine the relative importance of DNA binding proteins as compared to secondary structure, oligonucleotide-directed mutations in the TAR region were studied. Local mutations that disrupted either the stem or loop structure were defective in gene expression. However, compensatory mutations which restored base pairing in the stem resulted in complete tat-activation. This indicated a significant role for the stem-loop structure in HIV gene expression. To determine the role of TAR binding proteins, mutations were constructed which extensively changed the primary structure of the TAR region, yet left stem base pairing, stem energy and the loop sequence intact. These mutations resulted in decreased protein binding to TAR DNA and defects in tat-activation, and revealed factor binding specifically to the loop DNA sequence. Further mutagenesis which inverted this stem and loop mutation relative to the HIV LTR mRNA start site resulted in even larger decreases in tat-activation. This suggests that multiple determinants, including protein binding, the loop sequence, and RNA or DNA secondary structure, are important in tat-activation and suggests that tat may interact with cellular proteins binding to DNA to increase HIV gene expression. PMID:2721501
Cloning and characterization of the major histone H2A genes completes the cloning and sequencing of known histone genes of Tetrahymena thermophila.

PubMed Central

Liu, X; Gorovsky, M A

1996-01-01

A truncated cDNA clone encoding Tetrahymena thermophila histone H2A2 was isolated using synthetic degenerate oligonucleotide probes derived from H2A protein sequences of Tetrahymena pyriformis. The cDNA clone was used as a homologous probe to isolate a truncated genomic clone encoding H2A1. The remaining regions of the genes for H2A1 (HTA1) and H2A2 (HTA2) were then isolated using inverse PCR on circularized genomic DNA fragments. These partial clones were assembled into intact HTA1 and HTA2 clones. Nucleotide sequences of the two genes were highly homologous within the coding region but not in the noncoding regions. Comparison of the deduced amino acid sequences with protein sequences of T. pyriformis H2As showed only two and three differences respectively, in a total of 137 amino acids for H2A1, and 132 amino acids for H2A2, indicating the two genes arose before the divergence of these two species. The HTA2 gene contains a TAA triplet within the coding region, encoding a glutamine residue. In contrast with the T. thermophila HHO and HTA3 genes, no introns were identified within the two genes. The 5'- and 3'-ends of the histone H2A mRNAs were determined by RNase protection and by PCR mapping using RACE and RLM-RACE methods. Both genes encode polyadenylated mRNAs and are highly expressed in vegetatively growing cells but only weakly expressed in starved cultures. With the inclusion of these two genes, T. thermophila is the first organism whose entire complement of known core and linker histones, including replication-dependent and basal variants, has been cloned and sequenced. PMID:8760889
Electron holes appear to trigger cancer-implicated mutations

NASA Astrophysics Data System (ADS)

Miller, John; Villagran, Martha

Malignant tumors are caused by mutations, which also affect their subsequent growth and evolution. We use a novel approach, computational DNA hole spectroscopy [M.Y. Suarez-Villagran & J.H. Miller, Sci. Rep. 5, 13571 (2015)], to compute spectra of enhanced hole probability based on actual sequence data. A hole is a mobile site of positive charge created when an electron is removed, for example by radiation or contact with a mutagenic agent. Peaks in the hole spectrum depict sites where holes tend to localize and potentially trigger a base pair mismatch during replication. Our studies reveal a correlation between hole spectrum peaks and spikes in human mutation frequencies. Importantly, we also find that hole peak positions that do not coincide with large variant frequencies often coincide with cancer-implicated mutations and/or (for coding DNA) encoded conserved amino acids. This enables combining hole spectra with variant data to identify critical base pairs and potential cancer 'driver' mutations. Such integration of DNA hole and variance spectra could also prove invaluable for pinpointing critical regions, and sites of driver mutations, in the vast non-protein-coding genome. Supported by the State of Texas through the Texas Ctr. for Superconductivity.
Inheritance of the complete mitochondrial genomes Cyprinus capio furong(♀) × Cyprinus carpio var.singguonensis(♂).

PubMed

Peng, Huizhen; Liu, Qiaolin; Xiao, Tiaoyi

2016-09-01

In this study, 15 sets of primers were used to amplify contiguous, overlapping segments of the complete mitochondrial DNA (mtDNA) of C. capio furong(♀) × C. carpio var.singguonensis(♂) in order to characterize and compare their mitochondrial genomes. The total length of the mitochondrial genome was 16,581 bp, and the sequence was deposited in GenBank under the accession number KP210473. The mitochondrial genome contained 37 genes (13 protein-coding genes, 2 ribosomal RNAs and 22 transfer RNAs) and a major non-coding control region, an organization similar to that of previously reported mitochondrial genomes. Most genes were encoded on the H-strand, except for ND6 and 8 tRNA genes, which were encoded on the L-strand. The nucleotide skewness for the coding strands of C. capio furong(♀) × C. carpio var.singguonensis(♂) (AT-skew = 0.12, GC-skew = -0.27) were biased toward T and G. The complete mitogenome may provide important data for the study of the genetic mechanism of C. capio furong(♀) × C. carpio var.singguonensis(♂).
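The AT-skew and GC-skew values quoted in this abstract are assumed here to follow the commonly used definitions AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C); the abstract itself does not state the formula. Under that convention a positive value means an excess of A (or G) on the analysed strand and a negative value an excess of T (or C). The function name and example sequence below are hypothetical.

```python
def nucleotide_skews(sequence: str) -> tuple[float, float]:
    """Return (AT-skew, GC-skew) for one strand, using (A-T)/(A+T) and (G-C)/(G+C)."""
    seq = sequence.upper()
    a, t = seq.count("A"), seq.count("T")
    g, c = seq.count("G"), seq.count("C")
    if a + t == 0 or g + c == 0:
        raise ValueError("need at least one A/T base and one G/C base")
    return (a - t) / (a + t), (g - c) / (g + c)


# Hypothetical strand: 3 A, 1 T, 1 G, 3 C.
at_skew, gc_skew = nucleotide_skews("AAATGCCC")
print(f"AT-skew = {at_skew:+.2f}, GC-skew = {gc_skew:+.2f}")
```

Because skew is strand-specific, the sign flips when the complementary strand is analysed.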
Epigenetic deregulation in chronic lymphocytic leukemia: Clinical and biological impact.

PubMed

Mansouri, Larry; Wierzbinska, Justyna Anna; Plass, Christoph; Rosenquist, Richard

2018-02-07

Deregulated transcriptional control caused by aberrant DNA methylation and/or histone modifications is a hallmark of cancer cells. In chronic lymphocytic leukemia (CLL), the most common adult leukemia, the epigenetic 'landscape' has added a new layer of complexity to our understanding of this clinically and biologically heterogeneous disease. Early studies identified aberrant DNA methylation, often based on single gene promoter analysis with both biological and clinical impact. Subsequent genome-wide profiling studies revealed differential DNA methylation between CLLs and controls and in prognostic subgroups of the disease. From these studies, it became apparent that DNA methylation in regions outside of promoters, such as enhancers, is important for the regulation of coding genes as well as for the regulation of non-coding RNAs. Although DNA methylation profiles are reportedly stable over time and in relation to therapy, a higher epigenetic heterogeneity or 'burden' is seen in more aggressive CLL subgroups, albeit as non-recurrent 'passenger' events. More recently, DNA methylation profiles in CLL analyzed in relation to differentiating normal B-cell populations revealed that the majority of the CLL epigenome reflects the epigenomes present in the cell of origin and that only a small fraction of the epigenetic alterations represents truly CLL-specific changes. Furthermore, CLL patients can be grouped into at least three clinically relevant epigenetic subgroups, potentially originating from different cells at various stages of differentiation and associated with distinct outcomes. In this review, we summarize the current understanding of the DNA methylome in CLL, the role of histone modifying enzymes, highlight insights derived from animal models and attempts made to target epigenetic regulators in CLL along with the future directions of this rapidly advancing field. Copyright © 2018 Elsevier Ltd. All rights reserved.
Is a Genome a Codeword of an Error-Correcting Code?

PubMed Central

Kleinschmidt, João H.; Silva-Filho, Márcio C.; Bim, Edson; Herai, Roberto H.; Yamagishi, Michel E. B.; Palazzo, Reginaldo

2012-01-01

Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction. PMID:22649495
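To make the notion of "being a codeword of a Hamming code" concrete, the sketch below checks membership in the textbook binary Hamming(7,4) code by computing a syndrome. This is an assumption-laden illustration only: the abstract does not say how nucleotides are mapped onto a code alphabet or which code parameters the authors use, so nothing here reproduces their procedure.

```python
def syndrome(word: list[int]) -> int:
    """XOR of the 1-based positions holding a 1: 0 for a codeword,
    otherwise the position of a single flipped bit."""
    if len(word) != 7 or any(bit not in (0, 1) for bit in word):
        raise ValueError("expected a list of seven bits")
    result = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            result ^= position
    return result


def is_codeword(word: list[int]) -> bool:
    """True when the word satisfies all three parity checks."""
    return syndrome(word) == 0


def correct_single_error(word: list[int]) -> list[int]:
    """Return a copy with the bit indicated by the syndrome flipped back."""
    fixed = list(word)
    s = syndrome(word)
    if s:
        fixed[s - 1] ^= 1
    return fixed


codeword = [1, 1, 1, 0, 0, 0, 0]   # positions 1, 2, 3 set: 1 ^ 2 ^ 3 == 0
corrupted = [1, 1, 1, 0, 1, 0, 0]  # same word with position 5 flipped
print(is_codeword(codeword), syndrome(corrupted), correct_single_error(corrupted))
```

The same syndrome idea carries over to non-binary alphabets, but the arithmetic then depends on the algebraic structure chosen for the four letters.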
On the path to genetic novelties: insights from programmed DNA elimination and RNA splicing.

PubMed

Catania, Francesco; Schmitz, Jürgen

2015-01-01

Understanding how genetic novelties arise is a central goal of evolutionary biology. To this end, programmed DNA elimination and RNA splicing deserve special consideration. While programmed DNA elimination reshapes genomes by eliminating chromatin during organismal development, RNA splicing rearranges genetic messages by removing intronic regions during transcription. Small RNAs help to mediate this class of sequence reorganization, which is not error-free. It is this imperfection that makes programmed DNA elimination and RNA splicing excellent candidates for generating evolutionary novelties. Leveraging a number of these two processes' mechanistic and evolutionary properties, which have been uncovered over the past years, we present recently proposed models and empirical evidence for how splicing can shape the structure of protein-coding genes in eukaryotes. We also chronicle a number of intriguing similarities between the processes of programmed DNA elimination and RNA splicing, and highlight the role that the variation in the population-genetic environment may play in shaping their target sequences. © 2015 Wiley Periodicals, Inc.
DNA transposon activity is associated with increased mutation rates in genes of rice and other grasses

PubMed Central

Wicker, Thomas; Yu, Yeisoo; Haberer, Georg; Mayer, Klaus F. X.; Marri, Pradeep Reddy; Rounsley, Steve; Chen, Mingsheng; Zuccolo, Andrea; Panaud, Olivier; Wing, Rod A.; Roffler, Stefan

2016-01-01

DNA (class 2) transposons are mobile genetic elements which move within their 'host' genome through excising and re-inserting elsewhere. Although the rice genome contains tens of thousands of such elements, their actual role in evolution is still unclear. Analysing over 650 transposon polymorphisms in the rice species Oryza sativa and Oryza glaberrima, we find that DNA repair following transposon excisions is associated with an increased number of mutations in the sequences neighbouring the transposon. Indeed, the 3,000 bp flanking the excised transposons can contain over 10 times more mutations than the genome-wide average. Since DNA transposons preferably insert near genes, this is correlated with increases in mutation rates in coding sequences and regulatory regions. Most importantly, we find this phenomenon also in maize, wheat and barley. Thus, these findings suggest that DNA transposon activity is a major evolutionary force in grasses which provide the basis of most food consumed by humankind. PMID:27599761

GeneMachine: gene prediction and sequence annotation.

PubMed

Makalowska, I; Ryan, J F; Baxevanis, A D

2001-09-01

A number of free-standing programs have been developed in order to help researchers find potential coding regions and deduce gene structure for long stretches of what is essentially 'anonymous DNA'. As these programs apply inherently different criteria to the question of what is and is not a coding region, multiple algorithms should be used in the course of positional cloning and positional candidate projects to assure that all potential coding regions within a previously-identified critical region are identified. We have developed a gene identification tool called GeneMachine which allows users to query multiple exon and gene prediction programs in an automated fashion. BLAST searches are also performed in order to see whether a previously-characterized coding region corresponds to a region in the query sequence. A suite of Perl programs and modules are used to run MZEF, GENSCAN, GRAIL 2, FGENES, RepeatMasker, Sputnik, and BLAST. The results of these runs are then parsed and written into ASN.1 format. Output files can be opened using NCBI Sequin, in essence using Sequin as both a workbench and as a graphical viewer. The main feature of GeneMachine is that the process is fully automated; the user is only required to launch GeneMachine and then open the resulting file with Sequin. Annotations can then be made to these results prior to submission to GenBank, thereby increasing the intrinsic value of these data. GeneMachine is freely-available for download at http://genome.nhgri.nih.gov/genemachine. A public Web interface to the GeneMachine server for academic and not-for-profit users is available at http://genemachine.nhgri.nih.gov. The Web supplement to this paper may be found at http://genome.nhgri.nih.gov/genemachine/supplement/.
Data compression and genomes: a two-dimensional life domain map.

PubMed

Menconi, Giulia; Benci, Vieri; Buiatti, Marcello

2008-07-21

We define the complexity of DNA sequences as the information content per nucleotide, calculated by means of some Lempel-Ziv data compression algorithm. It is possible to use the statistics of the complexity values of the functional regions of different complete genomes to distinguish among genomes of different domains of life (Archaea, Bacteria and Eukarya). We shall focus on the distribution function of the complexity of non-coding regions. We show that the three domains may be plotted in separate regions within the two-dimensional space where the axes are the skewness coefficient and the kurtosis coefficient of the aforementioned distribution. Preliminary results on 15 genomes are introduced.
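The complexity measure described in this abstract, information content per nucleotide obtained from a Lempel-Ziv compressor, can be approximated with any LZ-family tool. The sketch below uses Python's zlib (DEFLATE, which combines LZ77 with Huffman coding) purely as a stand-in; the abstract does not name the authors' compressor, and the function name and test sequences are hypothetical.

```python
import random
import zlib


def compression_complexity(sequence: str) -> float:
    """Compressed size in bits per nucleotide, using zlib as an LZ-family proxy."""
    data = sequence.upper().encode("ascii")
    if not data:
        raise ValueError("sequence is empty")
    return 8 * len(zlib.compress(data, 9)) / len(data)


# Two hypothetical 2000-nt sequences: a perfect tandem repeat and a random one.
repetitive = "ACGT" * 500
rng = random.Random(0)  # fixed seed so the example is reproducible
random_like = "".join(rng.choice("ACGT") for _ in range(2000))

print(f"repeat: {compression_complexity(repetitive):.3f} bits/nt")
print(f"random: {compression_complexity(random_like):.3f} bits/nt")
```

The tandem repeat compresses far better than the random sequence, which is the behaviour such a complexity measure relies on; for short sequences the fixed zlib header inflates the estimate.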
Primary structure of stanniocalcin in two basal Actinopterygii.

PubMed

Amemiya, Yutaka; Youson, John H

2004-01-15

The primary structure of stanniocalcin (STC), the principal product of the corpuscles of Stannius (CS) in ray-finned fishes, was deduced from STC cDNA clones for two species of holostean, the gar, Lepisosteus osseus and the bowfin, Amia calva. Overlapping partial cDNA clones were amplified by polymerase chain reaction (PCR) from single-strand cDNA of the CS. Excluding the poly(A) tail, the cDNAs of 1863 base pairs [bp] (gar) and 914 bp (bowfin) contained the 5' untranslated region followed by the coding region and the 3' untranslated region. Both the gar and bowfin STC cDNA encode a prehormone of 252 amino acids (aa) with a signal peptide of 32 aa and a mature protein of 220 aa. The deduced aa sequence of gar STC shows 87% identity with bowfin STC, 60-72% identity with most vertebrate STCs and 26% identity with mouse STC2. Phylogenetic analysis of the sequences supports a view that the gar and bowfin form a monophyletic holostean clade. RT-PCR revealed in the gar and bowfin that, just as in mammals and rainbow trout, the expression of STC mRNA is widely spread in many tissues and organs. Since the gar and bowfin are representatives of the most ancient fishes known to possess CS, the corpuscular-derived STC molecule in fish has had a conserved evolution.
Insertion of a self-splicing intron into the mtDNA of a triploblastic animal

DOE Office of Scientific and Technical Information (OSTI.GOV)

Valles, Y.; Halanych, K.; Boore, J.L.

2006-04-14

Nephtys longosetosa is a carnivorous polychaete worm that lives in the intertidal and subtidal zones with worldwide distribution (pleijel&rouse2001). Its mitochondrial genome has the characteristics typical of most metazoans: 37 genes; circular molecule; almost no intergenic sequence; and no significant gene rearrangements when compared to other annelid mtDNAs (booremoritz19981995). Ubiquitous features such as small intergenic regions and lack of introns suggested that metazoan mtDNAs are under strong selective pressures to reduce their genome size, allowing for faster replication (booremoritz19981995Lynch2005). Yet, in 1996 two type I introns were found in the mtDNA of the basal metazoan Metridium senile (FigureX). Breaking a long-standing rule (absence of introns in metazoan mtDNA), this finding was later supported by the further presence of group I introns in other cnidarians. Interestingly, only the class Anthozoa within cnidarians seems to harbor such introns. Although several hundreds of triploblastic metazoan mtDNAs have been sequenced, this study is the first evidence of mitochondrial introns in triploblastic metazoans. The cox1 gene of N. longosetosa has an intron of almost 2 kb in length. This finding also represents the first instance of a group II intron (anthozoans harbor group I introns) in all metazoan lineages. Opposite trends are observed within plant, fungal and protist mtDNAs, where introns (both group I and II) and other non-coding sequences are widespread. Plant, fungal and protist mtDNA structure and organization differ enormously from that of metazoan mtDNA. Both plant and fungal mtDNAs are dynamic molecules that undergo high rates of recombination, contain long intergenic spacer regions and harbor both group I and group II introns. However, like metazoans, they have a conserved gene content. Protists, on the other hand, have a striking variation of gene content and introns that account for the genome size variation. In contrast to this diversity of mtDNA structure and organization, current genome-level studies point to a monophyletic origin of the mitochondria (REFS), raising questions such as: what are the pressures at work shaping the evolution of the mitochondrial genome at 'higher' levels? What drives the absence of introns and other non-coding spacers in metazoan mtDNA? What characteristics must an intron have to be maintained in an environment where 'extra chromosomes' are usually selected against?
Context influences on TALE–DNA binding revealed by quantitative profiling

PubMed Central

Rogers, Julia M.; Barrera, Luis A.; Reyon, Deepak; Sander, Jeffry D.; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L.

2015-01-01

Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE–DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000–20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE–DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design. PMID:26067805
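The "seemingly simple DNA-binding code" mentioned above can be sketched as a lookup from repeat-variable diresidues (RVDs) to preferred bases. The table below is a simplified version of the commonly cited canonical preferences, not the SIFTED model, and the function name `predicted_target` is hypothetical. As the abstract stresses, protein context shifts real specificities, so this is only a first approximation.

```python
# Simplified canonical RVD-to-base preferences (assumption: only the four
# most commonly cited RVDs are listed; real TALEs show context effects).
RVD_TO_BASES = {
    "NI": "A",
    "HD": "C",
    "NG": "T",
    "NN": "GA",  # NN is reported to accept G and, more weakly, A
}

# IUPAC symbols for the base sets used above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "AG": "R"}


def predicted_target(rvds):
    """Return the target site predicted by the canonical code alone."""
    symbols = []
    for rvd in rvds:
        bases = "".join(sorted(RVD_TO_BASES[rvd]))
        symbols.append(IUPAC[bases])
    return "".join(symbols)


print(predicted_target(["NI", "HD", "NG", "NN"]))  # prints ACTR
```

A lookup like this ignores neighbouring repeats entirely, which is exactly the kind of context the PBM measurements were designed to capture.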
Context influences on TALE-DNA binding revealed by quantitative profiling.

PubMed

Rogers, Julia M; Barrera, Luis A; Reyon, Deepak; Sander, Jeffry D; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L

2015-06-11

Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE-DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000-20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE-DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design.
Large-scale genomic analyses link reproductive ageing to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair

PubMed Central

Lunetta, Kathryn L.; Pervjakova, Natalia; Chasman, Daniel I.; Stolk, Lisette; Finucane, Hilary K.; Sulem, Patrick; Bulik-Sullivan, Brendan; Esko, Tõnu; Johnson, Andrew D.; Elks, Cathy E.; Franceschini, Nora; He, Chunyan; Altmaier, Elisabeth; Brody, Jennifer A.; Franke, Lude L.; Huffman, Jennifer E.; Keller, Margaux F.; McArdle, Patrick F.; Nutile, Teresa; Porcu, Eleonora; Robino, Antonietta; Rose, Lynda M.; Schick, Ursula M.; Smith, Jennifer A.; Teumer, Alexander; Traglia, Michela; Vuckovic, Dragana; Yao, Jie; Zhao, Wei; Albrecht, Eva; Amin, Najaf; Corre, Tanguy; Hottenga, Jouke-Jan; Mangino, Massimo; Smith, Albert V.; Tanaka, Toshiko; Abecasis, Goncalo; Andrulis, Irene L.; Anton-Culver, Hoda; Antoniou, Antonis C.; Arndt, Volker; Arnold, Alice M.; Barbieri, Caterina; Beckmann, Matthias W.; Beeghly-Fadiel, Alicia; Benitez, Javier; Bernstein, Leslie; Bielinski, Suzette J.; Blomqvist, Carl; Boerwinkle, Eric; Bogdanova, Natalia V.; Bojesen, Stig E.; Bolla, Manjeet K.; Borresen-Dale, Anne-Lise; Boutin, Thibaud S; Brauch, Hiltrud; Brenner, Hermann; Brüning, Thomas; Burwinkel, Barbara; Campbell, Archie; Campbell, Harry; Chanock, Stephen J.; Chapman, J. Ross; Chen, Yii-Der Ida; Chenevix-Trench, Georgia; Couch, Fergus J.; Coviello, Andrea D.; Cox, Angela; Czene, Kamila; Darabi, Hatef; De Vivo, Immaculata; Demerath, Ellen W.; Dennis, Joe; Devilee, Peter; Dörk, Thilo; dos-Santos-Silva, Isabel; Dunning, Alison M.; Eicher, John D.; Fasching, Peter A.; Faul, Jessica D.; Figueroa, Jonine; Flesch-Janys, Dieter; Gandin, Ilaria; Garcia, Melissa E.; García-Closas, Montserrat; Giles, Graham G.; Girotto, Giorgia G.; Goldberg, Mark S.; González-Neira, Anna; Goodarzi, Mark O.; Grove, Megan L.; Gudbjartsson, Daniel F.; Guénel, Pascal; Guo, Xiuqing; Haiman, Christopher A.; Hall, Per; Hamann, Ute; Henderson, Brian E.; Hocking, Lynne J.; Hofman, Albert; Homuth, Georg; Hooning, Maartje J.; Hopper, John L.; Hu, Frank B.; Huang, Jinyan; Humphreys, Keith; Hunter, David J.; Jakubowska, Anna; Jones, Samuel E.; Kabisch, Maria; Karasik, David; Knight, Julia A.; Kolcic, Ivana; Kooperberg, Charles; Kosma, Veli-Matti; Kriebel, Jennifer; Kristensen, Vessela; Lambrechts, Diether; Langenberg, Claudia; Li, Jingmei; Li, Xin; Lindström, Sara; Liu, Yongmei; Luan, Jian’an; Lubinski, Jan; Mägi, Reedik; Mannermaa, Arto; Manz, Judith; Margolin, Sara; Marten, Jonathan; Martin, Nicholas G.; Masciullo, Corrado; Meindl, Alfons; Michailidou, Kyriaki; Mihailov, Evelin; Milani, Lili; Milne, Roger L.; Müller-Nurasyid, Martina; Nalls, Michael; Neale, Ben M.; Nevanlinna, Heli; Neven, Patrick; Newman, Anne B.; Nordestgaard, Børge G.; Olson, Janet E.; Padmanabhan, Sandosh; Peterlongo, Paolo; Peters, Ulrike; Petersmann, Astrid; Peto, Julian; Pharoah, Paul D.P.; Pirastu, Nicola N.; Pirie, Ailith; Pistis, Giorgio; Polasek, Ozren; Porteous, David; Psaty, Bruce M.; Pylkäs, Katri; Radice, Paolo; Raffel, Leslie J.; Rivadeneira, Fernando; Rudan, Igor; Rudolph, Anja; Ruggiero, Daniela; Sala, Cinzia F.; Sanna, Serena; Sawyer, Elinor J.; Schlessinger, David; Schmidt, Marjanka K.; Schmidt, Frank; Schmutzler, Rita K.; Schoemaker, Minouk J.; Scott, Robert A.; Seynaeve, Caroline M.; Simard, Jacques; Sorice, Rossella; Southey, Melissa C.; Stöckl, Doris; Strauch, Konstantin; Swerdlow, Anthony; Taylor, Kent D.; Thorsteinsdottir, Unnur; Toland, Amanda E.; Tomlinson, Ian; Truong, Thérèse; Tryggvadottir, Laufey; Turner, Stephen T.; Vozzi, Diego; Wang, Qin; Wellons, Melissa; Willemsen, Gonneke; Wilson, James F.; Winqvist, Robert; Wolffenbuttel, Bruce B.H.R.; Wright, Alan F.; Yannoukakos, Drakoulis; Zemunik, Tatijana; Zheng, Wei; Zygmunt, Marek; Bergmann, Sven; Boomsma, Dorret I.; Buring, Julie E.; Ferrucci, Luigi; Montgomery, Grant W.; Gudnason, Vilmundur; Spector, Tim D.; van Duijn, Cornelia M; Alizadeh, Behrooz Z.; Ciullo, Marina; Crisponi, Laura; Easton, Douglas F.; Gasparini, Paolo P.; Gieger, Christian; Harris, Tamara B.; Hayward, Caroline; Kardia, Sharon L.R.; Kraft, Peter; McKnight, Barbara; Metspalu, Andres; Morrison, Alanna C.; Reiner, Alex P.; Ridker, Paul M.; Rotter, Jerome I.; Toniolo, Daniela; Uitterlinden, André G.; Ulivi, Sheila; Völzke, Henry; Wareham, Nicholas J.; Weir, David R.; Yerges-Armstrong, Laura M.; Price, Alkes L.; Stefansson, Kari; Visser, Jenny A.; Ong, Ken K.; Chang-Claude, Jenny; Murabito, Joanne M.; Perry, John R.B.; Murray, Anna

2015-01-01

Menopause timing has a substantial impact on infertility and risk of disease, including breast cancer, but the underlying mechanisms are poorly understood. We report a dual strategy in ~70,000 women to identify common and low-frequency protein-coding variation associated with age at natural menopause (ANM). We identified 44 regions with common variants, including two harbouring additional rare missense alleles of large effect. We found enrichment of signals in/near genes involved in delayed puberty, highlighting the first molecular links between the onset and end of reproductive lifespan. Pathway analyses revealed a major association with DNA damage-response (DDR) genes, including the first common coding variant in BRCA1 associated with any complex trait. Mendelian randomisation analyses supported a causal effect of later ANM on breast cancer risk (~6% risk increase per year, P=3×10⁻¹⁴), likely mediated by prolonged sex hormone exposure, rather than DDR mechanisms. PMID:26414677
RNA Editing in Plant Mitochondria

NASA Astrophysics Data System (ADS)

Hiesel, Rudolf; Wissinger, Bernd; Schuster, Wolfgang; Brennicke, Axel

1989-12-01

Comparative sequence analysis of genomic and complementary DNA clones from several mitochondrial genes in the higher plant Oenothera revealed nucleotide sequence divergences between the genomic and the messenger RNA-derived sequences. These sequence alterations could be most easily explained by specific post-transcriptional nucleotide modifications. Most of the nucleotide exchanges in coding regions lead to altered codons in the mRNA that specify amino acids better conserved in evolution than those encoded by the genomic DNA. Several instances show that the genomic arginine codon CGG is edited in the mRNA to the tryptophan codon TGG in amino acid positions that are highly conserved as tryptophan in the homologous proteins of other species. This editing suggests that the standard genetic code is used in plant mitochondria and resolves the frequent coincidence of CGG codons and tryptophan in different plant species. The apparently frequent and non-species-specific equivalency of CGG and TGG codons in particular suggests that RNA editing is a common feature of all higher plant mitochondria.
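The CGG-to-TGG example above can be traced in a few lines. This is a minimal sketch that assumes the change is a C-to-U substitution at the first codon position of the transcript (the abstract reports the codon change, written in DNA letters, but not the mechanism). The two-entry codon table and the function name `edit_c_to_u` are illustrative only.

```python
# Two entries of the standard genetic code, written as RNA codons.
STANDARD_CODE_SUBSET = {"CGG": "Arg", "UGG": "Trp"}


def edit_c_to_u(rna_codon, position):
    """Replace the C at `position` (0-based) with U, as in C-to-U editing."""
    if rna_codon[position] != "C":
        raise ValueError("only a C residue can be edited to U")
    return rna_codon[:position] + "U" + rna_codon[position + 1:]


genomic_codon = "CGG"                          # codon as read from genomic DNA
transcript = genomic_codon.replace("T", "U")   # unedited mRNA codon
edited = edit_c_to_u(transcript, 0)            # edited mRNA codon

print(STANDARD_CODE_SUBSET[transcript], "->", STANDARD_CODE_SUBSET[edited])
# prints Arg -> Trp
```

Read with the standard code, the unedited transcript would specify arginine at a position where homologous proteins carry tryptophan; after the edit the codon specifies tryptophan, so no deviant genetic code is needed to explain the conservation.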
Promoter variants of Xa23 alleles affect bacterial blight resistance and evolutionary pattern

PubMed Central

Xu, Feifei; Tang, Yongchao; Gao, Ying

2017-01-01

Bacterial blight, caused by Xanthomonas oryzae pv. oryzae (Xoo), is the most important bacterial disease in rice (Oryza sativa L.). Our previous studies have revealed that the bacterial blight resistance gene Xa23 from wild rice O. rufipogon Griff. confers the broadest-spectrum resistance against all the naturally occurring Xoo races. As a novel executor R gene, Xa23 is transcriptionally activated by the bacterial avirulence (Avr) protein AvrXa23 via binding to a 28-bp DNA element (EBEAvrXa23) in the promoter region. So far, the evolutionary mechanism of Xa23 remains to be illustrated. Here, a rice germplasm collection of 97 accessions, including 29 rice cultivars (indica and japonica) and 68 wild relatives, was used to analyze the evolution, phylogeographic relationship and association of Xa23 alleles with bacterial blight resistance. All the ~ 473 bp DNA fragments consisting of promoter and coding regions of Xa23 alleles in the germplasm accessions were PCR-amplified and sequenced, and nine single nucleotide polymorphisms (SNPs) were detected in the promoter regions (~131 bp sequence upstream from the start codon ATG) of Xa23/xa23 alleles while only two SNPs were found in the coding regions. The SNPs in the promoter regions formed 5 haplotypes (Pro-A, B, C, D, E) which showed no significant difference in geographic distribution among these 97 rice accessions. However, haplotype association analysis indicated that Pro-A is the most favored haplotype for bacterial blight resistance. Moreover, SNP changes among the 5 haplotypes mostly located in the EBE/ebe regions (EBEAvrXa23 and corresponding ebes located in promoters of xa23 alleles), confirming that the EBE region is the key factor to confer bacterial blight resistance by altering gene expression. Polymorphism analysis and neutral test implied that Xa23 had undergone a bottleneck effect, and selection process of Xa23 was not detected in cultivated rice. 
In addition, the Xa23 coding region was found highly conserved in the Oryza genus but absent in other plant species by searching the plant database, suggesting that Xa23 originated along with the diversification of the Oryza genus from the grass family during evolution. This research offers a potential for flexible use of novel Xa23 alleles in rice breeding programs and provides a model for evolution analysis of other executor R genes. PMID:28982185
Promoter variants of Xa23 alleles affect bacterial blight resistance and evolutionary pattern.

PubMed

Cui, Hua; Wang, Chunlian; Qin, Tengfei; Xu, Feifei; Tang, Yongchao; Gao, Ying; Zhao, Kaijun

2017-01-01

Bacterial blight, caused by Xanthomonas oryzae pv. oryzae (Xoo), is the most important bacterial disease in rice (Oryza sativa L.). Our previous studies have revealed that the bacterial blight resistance gene Xa23 from wild rice O. rufipogon Griff. confers the broadest-spectrum resistance against all the naturally occurring Xoo races. As a novel executor R gene, Xa23 is transcriptionally activated by the bacterial avirulence (Avr) protein AvrXa23 via binding to a 28-bp DNA element (EBEAvrXa23) in the promoter region. So far, the evolutionary mechanism of Xa23 remains to be illustrated. Here, a rice germplasm collection of 97 accessions, including 29 rice cultivars (indica and japonica) and 68 wild relatives, was used to analyze the evolution, phylogeographic relationship and association of Xa23 alleles with bacterial blight resistance. All the ~ 473 bp DNA fragments consisting of promoter and coding regions of Xa23 alleles in the germplasm accessions were PCR-amplified and sequenced, and nine single nucleotide polymorphisms (SNPs) were detected in the promoter regions (~131 bp sequence upstream from the start codon ATG) of Xa23/xa23 alleles while only two SNPs were found in the coding regions. The SNPs in the promoter regions formed 5 haplotypes (Pro-A, B, C, D, E) which showed no significant difference in geographic distribution among these 97 rice accessions. However, haplotype association analysis indicated that Pro-A is the most favored haplotype for bacterial blight resistance. Moreover, SNP changes among the 5 haplotypes mostly located in the EBE/ebe regions (EBEAvrXa23 and corresponding ebes located in promoters of xa23 alleles), confirming that the EBE region is the key factor to confer bacterial blight resistance by altering gene expression. Polymorphism analysis and neutral test implied that Xa23 had undergone a bottleneck effect, and selection process of Xa23 was not detected in cultivated rice. 
In addition, the Xa23 coding region was found highly conserved in the Oryza genus but absent in other plant species by searching the plant database, suggesting that Xa23 originated along with the diversification of the Oryza genus from the grass family during evolution. This research offers a potential for flexible use of novel Xa23 alleles in rice breeding programs and provides a model for evolution analysis of other executor R genes.
Systematic analysis and evolution of 5S ribosomal DNA in metazoans.

PubMed

Vierna, J; Wehner, S; Höner zu Siederdissen, C; Martínez-Lage, A; Marz, M

2013-11-01

Several studies on 5S ribosomal DNA (5S rDNA) have been focused on a subset of the following features in mostly one organism: number of copies, pseudogenes, secondary structure, promoter and terminator characteristics, genomic arrangements, types of non-transcribed spacers and evolution. In this work, we systematically analyzed 5S rDNA sequence diversity in available metazoan genomes, and showed organism-specific and evolutionary-conserved features. Putatively functional sequences (12,766) from 97 organisms allowed us to identify general features of this multigene family in animals. Interestingly, we show that each mammal species has a highly conserved (housekeeping) 5S rRNA type and many variable ones. The genomic organization of 5S rDNA is still under debate. Here, we report the occurrence of several paralog 5S rRNA sequences in 58 of the examined species, and a flexible genome organization of 5S rDNA in animals. We found heterogeneous 5S rDNA clusters in several species, supporting the hypothesis of an exchange of 5S rDNA from one locus to another. A rather high degree of variation of upstream, internal and downstream putative regulatory regions appears to characterize metazoan 5S rDNA. We systematically studied the internal promoters and described three different types of termination signals, as well as variable distances between the coding region and the typical termination signal. Finally, we present a statistical method for detection of linkage among noncoding RNA (ncRNA) gene families. This method showed no evolutionary-conserved linkage among 5S rDNAs and any other ncRNA genes within Metazoa, even though we found 5S rDNA to be linked to various ncRNAs in several clades.
Systematic analysis and evolution of 5S ribosomal DNA in metazoans

PubMed Central

Vierna, J; Wehner, S; Höner zu Siederdissen, C; Martínez-Lage, A; Marz, M

2013-01-01

Several studies on 5S ribosomal DNA (5S rDNA) have been focused on a subset of the following features in mostly one organism: number of copies, pseudogenes, secondary structure, promoter and terminator characteristics, genomic arrangements, types of non-transcribed spacers and evolution. In this work, we systematically analyzed 5S rDNA sequence diversity in available metazoan genomes, and showed organism-specific and evolutionary-conserved features. Putatively functional sequences (12 766) from 97 organisms allowed us to identify general features of this multigene family in animals. Interestingly, we show that each mammal species has a highly conserved (housekeeping) 5S rRNA type and many variable ones. The genomic organization of 5S rDNA is still under debate. Here, we report the occurrence of several paralog 5S rRNA sequences in 58 of the examined species, and a flexible genome organization of 5S rDNA in animals. We found heterogeneous 5S rDNA clusters in several species, supporting the hypothesis of an exchange of 5S rDNA from one locus to another. A rather high degree of variation of upstream, internal and downstream putative regulatory regions appears to characterize metazoan 5S rDNA. We systematically studied the internal promoters and described three different types of termination signals, as well as variable distances between the coding region and the typical termination signal. Finally, we present a statistical method for detection of linkage among noncoding RNA (ncRNA) gene families. This method showed no evolutionary-conserved linkage among 5S rDNAs and any other ncRNA genes within Metazoa, even though we found 5S rDNA to be linked to various ncRNAs in several clades. PMID:23838690
The agents of natural genome editing.

PubMed

Witzany, Guenther

2011-06-01

The DNA serves as a stable information storage medium and every protein which is needed by the cell is produced from this blueprint via an RNA intermediate code. More recently it was found that an abundance of various RNA elements cooperate in a variety of steps and substeps as regulatory and catalytic units with multiple competencies to act on RNA transcripts. Natural genome editing on one side is the competent agent-driven generation and integration of meaningful DNA nucleotide sequences into pre-existing genomic content arrangements, and the ability to (re-)combine and (re-)regulate them according to context-dependent (i.e. adaptational) purposes of the host organism. Natural genome editing on the other side designates the integration of all RNA activities acting on RNA transcripts without altering DNA-encoded genes. If we take the genetic code seriously as a natural code, there must be agents that are competent to act on this code because no natural code codes itself as no natural language speaks itself. As code editing agents, viral and subviral agents have been suggested because there are several indicators that demonstrate viruses competent in both RNA and DNA natural genome editing.
Systematic screening for mutations in the promoter and the coding region of the 5-HT1A gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Erdmann, J.; Shimron-Abarbanell, D.; Cichon, S.

1995-10-09

In the present study we sought to identify genetic variation in the 5-HT1A receptor gene which through alteration of protein function or level of expression might contribute to the genetic predisposition to neuropsychiatric diseases. Genomic DNA samples from 159 unrelated subjects (including 45 schizophrenic, 46 bipolar affective, and 43 patients with Tourette's syndrome, as well as 25 healthy controls) were investigated by single-strand conformation analysis. Overlapping PCR (polymerase chain reaction) fragments covered the whole coding sequence as well as the 5' untranslated region of the 5-HT1A gene. The region upstream to the coding sequence we investigated contains a functional promoter. We found two rare nucleotide sequence variants. Both mutations are located in the coding region of the gene: a coding mutation (A→G) in nucleotide position 82 which leads to an amino acid exchange (Ile→Val) in position 28 of the receptor protein and a silent mutation (C→T) in nucleotide position 549. The occurrence of the Ile-28-Val substitution was studied in an extended sample of patients (n = 352) and controls (n = 210) but was found in similar frequencies in all groups. Thus, this mutation is unlikely to play a significant role in the genetic predisposition to the diseases investigated. In conclusion, our study does not provide evidence that the 5-HT1A gene plays either a major or a minor role in the genetic predisposition to schizophrenia, bipolar affective disorder, or Tourette's syndrome. 29 refs., 4 figs., 1 tab.
Evolutional dynamics of 45S and 5S ribosomal DNA in ancient allohexaploid Atropa belladonna.

PubMed

Volkov, Roman A; Panchuk, Irina I; Borisjuk, Nikolai V; Hosiawa-Baranska, Marta; Maluszynska, Jolanta; Hemleben, Vera

2017-01-23

Polyploid hybrids represent a rich natural resource to study molecular evolution of plant genes and genomes. Here, we applied a combination of karyological and molecular methods to investigate chromosomal structure, molecular organization and evolution of ribosomal DNA (rDNA) in nightshade, Atropa belladonna (fam. Solanaceae), one of the oldest known allohexaploids among flowering plants. Because of their abundance and specific molecular organization (evolutionarily conserved coding regions linked to variable intergenic spacers, IGS), 45S and 5S rDNA are widely used in plant taxonomic and evolutionary studies. Molecular cloning and nucleotide sequencing of A. belladonna 45S rDNA repeats revealed a general structure characteristic of other Solanaceae species, and a very high sequence similarity of two length variants, with the only difference in number of short IGS subrepeats. These results, combined with the detection of three pairs of 45S rDNA loci on separate chromosomes, presumably inherited from both tetraploid and diploid ancestor species, exemplify intensive sequence homogenization that led to substitution/elimination of rDNA repeats of one parent. Chromosome silver-staining revealed that only four out of six 45S rDNA sites are frequently transcriptionally active, demonstrating nucleolar dominance. For 5S rDNA, three size variants of repeats were detected, with the major class represented by repeats containing all functional IGS elements required for transcription, the intermediate size repeats containing partially deleted IGS sequences, and the short 5S repeats containing severe defects both in the IGS and coding sequences. While shorter variants demonstrate increased rate of base substitution, probably in their transition into pseudogenes, the functional 5S rDNA variants are nearly identical at the sequence level, pointing to their origin from a single parental species. 
Localization of the 5S rDNA genes on two chromosome pairs further supports uniparental inheritance from the tetraploid progenitor. The obtained molecular, cytogenetic and phylogenetic data demonstrate complex evolutionary dynamics of rDNA loci in the allohexaploid species Atropa belladonna. The high level of sequence unification revealed in 45S and 5S rDNA loci of this ancient hybrid species has seemingly been achieved by different molecular mechanisms.
Cloning and sequence analysis of a cDNA clone coding for the mouse GM2 activator protein.

PubMed Central

Bellachioma, G; Stirling, J L; Orlacchio, A; Beccari, T

1993-01-01

A cDNA (1.1 kb) containing the complete coding sequence for the mouse GM2 activator protein was isolated from a mouse macrophage library using a cDNA for the human protein as a probe. There was a single ATG located 12 bp from the 5' end of the cDNA clone followed by an open reading frame of 579 bp. Northern blot analysis of mouse macrophage RNA showed that there was a single band with a mobility corresponding to a size of 2.3 kb. We deduce from this that the mouse mRNA, in common with the mRNA for the human GM2 activator protein, has a long 3' untranslated sequence of approx. 1.7 kb. Alignment of the mouse and human deduced amino acid sequences showed 68% identity overall and 75% identity for the sequence on the C-terminal side of the first 31 residues, which in the human GM2 activator protein contains the signal peptide. Hydropathicity plots showed great similarity between the mouse and human sequences even in regions of low sequence similarity. There is a single N-glycosylation site in the mouse GM2 activator protein sequence (Asn151-Phe-Thr) which differs in its location from the single site reported in the human GM2 activator protein sequence (Asn63-Val-Thr). PMID:7689829
Molecular cloning of two human liver 3 alpha-hydroxysteroid/dihydrodiol dehydrogenase isoenzymes that are identical with chlordecone reductase and bile-acid binder.

PubMed Central

Deyashiki, Y; Ogasawara, A; Nakayama, T; Nakanishi, M; Miyabe, Y; Sato, K; Hara, A

1994-01-01

Human liver contains two dihydrodiol dehydrogenases, DD2 and DD4, associated with 3 alpha-hydroxysteroid dehydrogenase activity. We have raised polyclonal antibodies that cross-reacted with the two enzymes and isolated two 1.2 kb cDNA clones (C9 and C11) for the two enzymes from a human liver cDNA library using the antibodies. The clones of C9 and C11 contained coding sequences corresponding to 306 and 321 amino acid residues respectively, but lacked 5'-coding regions around the initiation codon. Sequence analyses of several peptides obtained by enzymic and chemical cleavages of the two purified enzymes verified that the C9 and C11 clones encoded DD2 and DD4 respectively, and further indicated that the sequence of DD2 had at least 16 additional residues upstream of the N-terminal sequence deduced from the cDNA. There was 82% amino acid sequence identity between the two enzymes, indicating that the enzymes are genetic isoenzymes. A computer-based comparison of the cDNAs of the isoenzymes with the DNA sequence database revealed that the nucleotide and amino acid sequences of DD2 and DD4 are virtually identical with those of human bile-acid binder and human chlordecone reductase cDNAs respectively. PMID:8172617
Spliced DNA Sequences in the Paramecium Germline: Their Properties and Evolutionary Potential

PubMed Central

Catania, Francesco; McGrath, Casey L.; Doak, Thomas G.; Lynch, Michael

2013-01-01

Despite playing a crucial role in germline-soma differentiation, the evolutionary significance of developmentally regulated genome rearrangements (DRGRs) has received scant attention. An example of DRGR is DNA splicing, a process that removes segments of DNA interrupting genic and/or intergenic sequences. Perhaps best known for shaping immune-system genes in vertebrates, DNA splicing plays a central role in the life of ciliated protozoa, where thousands of germline DNA segments are eliminated after sexual reproduction to regenerate a functional somatic genome. Here, we identify and chronicle the properties of 5,286 sequences that putatively undergo DNA splicing (i.e., internal eliminated sequences [IESs]) across the genomes of three closely related species of the ciliate Paramecium (P. tetraurelia, P. biaurelia, and P. sexaurelia). The study reveals that these putative IESs share several physical characteristics. Although our results are consistent with excision events being largely conserved between species, episodes of differential IES retention/excision occur, may have a recent origin, and frequently involve coding regions. Our findings indicate interconversion between somatic (often coding) DNA sequences and noncoding IESs, and provide insights into the role of DNA splicing in creating potentially functional genetic innovation. PMID:23737328
Molecular cloning and expression of collagenase-3, a novel human matrix metalloproteinase produced by breast carcinomas.

PubMed

Freije, J M; Díez-Itza, I; Balbín, M; Sánchez, L M; Blasco, R; Tolivia, J; López-Otín, C

1994-06-17

A cDNA coding for a new human matrix metalloproteinase (MMP) has been cloned from a cDNA library derived from a breast tumor. The isolated cDNA contains an open reading frame coding for a polypeptide of 471 amino acids. The predicted protein sequence displays extensive similarity to the previously known MMPs and presents all the structural features characteristic of the members of this protein family, including the well conserved PRCGXPD motif, involved in the latency of the enzyme and the zinc-binding domain (HEXGHXXXXXHS). In addition, this novel human MMP contains in its amino acid sequence several residues specific to the collagenase subfamily (Tyr-214, Asp-235, and Gly-237) and lacks the 9-residue insertion present in the stromelysins. According to these structural characteristics, the MMP described herein has been tentatively called collagenase-3, since it represents the third member of this subfamily, composed at present of fibroblast and neutrophil collagenases. The collagenase-3 cDNA was expressed in a vaccinia virus system, and the recombinant protein was able to degrade fibrillar collagens, providing support to the hypothesis that the isolated cDNA codes for an authentic collagenase. Northern blot analysis of RNA from normal and pathological tissues demonstrated the existence in breast tumors of three different mRNA species, which seem to be the result of the utilization of different polyadenylation sites present in the 3'-noncoding region of the gene. By contrast, no collagenase-3 mRNA was detected either by Northern blot or RNA polymerase chain reaction analysis with RNA from other human tissues, including normal breast, mammary fibroadenomas, liver, placenta, ovary, uterus, prostate, and parotid gland. On the basis of the increased expression of collagenase-3 in breast carcinomas and the absence of detectable expression in normal tissues, a possible role for this metalloproteinase in the tumoral process is proposed.
DNA-based watermarks using the DNA-Crypt algorithm.

PubMed

Heider, Dominik; Barnekow, Angelika

2007-05-29

The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program, leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations, which occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences, similar to all steganographic algorithms.
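The "least significant base" principle can be illustrated with a toy scheme that writes two bits into the third position of fourfold-degenerate codons, so the encoded protein is unchanged. This is a sketch of the general idea only, not the DNA-Crypt implementation (the abstract does not specify its encoding). The bit-to-base mapping, the function names `embed` and `extract`, and the example sequence are all assumptions, and no encryption or mutation correction is included.

```python
# Codon prefixes whose third base is free under the standard genetic code
# (Ala, Gly, Pro, Thr, Val, Leu, Arg, Ser fourfold-degenerate families).
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CT", "CG", "TC"}

# Hypothetical mapping of bit pairs to third-position bases.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def embed(cds, bits):
    """Write `bits` (a string of 0/1, even length) into synonymous sites."""
    if len(cds) % 3 or len(bits) % 2:
        raise ValueError("need whole codons and an even number of bits")
    out, used = [], 0
    for k in range(0, len(cds), 3):
        codon = cds[k:k + 3]
        if codon[:2] in FOURFOLD_PREFIXES and used < len(bits):
            codon = codon[:2] + BASE_FOR_BITS[bits[used:used + 2]]
            used += 2
        out.append(codon)
    if used < len(bits):
        raise ValueError("sequence has too few carrier codons")
    return "".join(out)


def extract(cds, n_bits):
    """Read back the first `n_bits` bits from the carrier codons."""
    pairs = []
    for k in range(0, len(cds), 3):
        codon = cds[k:k + 3]
        if codon[:2] in FOURFOLD_PREFIXES and 2 * len(pairs) < n_bits:
            pairs.append(BITS_FOR_BASE[codon[2]])
    return "".join(pairs)


original = "ATGGCTGGAAAACCC"          # Met-Ala-Gly-Lys-Pro
marked = embed(original, "1001")
print(marked)                          # prints ATGGCGGGCAAACCC
print(extract(marked, 4))              # prints 1001
```

Only the third base of GCT and GGA changes here, so the marked sequence still encodes Met-Ala-Gly-Lys-Pro, which mirrors the abstract's claim that the watermark does not alter translation.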

DNA-based watermarks using the DNA-Crypt algorithm

PubMed Central

Heider, Dominik; Barnekow, Angelika

2007-01-01

Background The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program, leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. Results The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. DNA-Crypt can be combined with binary encryption algorithms like AES, RSA or Blowfish. It is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming code or the WDH code. Mutations, which occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. Conclusion The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences, similar to all steganographic algorithms. PMID:17535434
Orpinomyces cellulase celf protein and coding sequences

DOEpatents

Li, Xin-Liang; Chen, Huizhong; Ljungdahl, Lars G.

2000-09-05

A cDNA (1,520 bp), designated celF, consisting of an open reading frame (ORF) encoding a polypeptide (CelF) of 432 amino acids was isolated from a cDNA library of the anaerobic rumen fungus Orpinomyces PC-2 constructed in Escherichia coli. Analysis of the deduced amino acid sequence showed that, starting from the N-terminus, CelF consists of a signal peptide and a cellulose binding domain (CBD), followed by an extremely Asn-rich linker region which separates the CBD and the catalytic domain. The latter is located at the C-terminus. The catalytic domain of CelF is highly homologous to CelA and CelC of Orpinomyces PC-2, to CelA of Neocallimastix patriciarum and also to cellobiohydrolase IIs (CBHIIs) from aerobic fungi. However, like CelA of Neocallimastix patriciarum, CelF does not have the noncatalytic repeated peptide domain (NCRPD) found in CelA and CelC from the same organism. The recombinant protein CelF hydrolyzes cellooligosaccharides in the pattern of CBHII, yielding only cellobiose as product with cellotetraose as the substrate. The genomic celF is interrupted by a 111 bp intron, located within the region coding for the CBD. The celF intron has features in common with genes from aerobic filamentous fungi.
Hundreds of conserved non-coding genomic regions are independently lost in mammals

PubMed Central

Hiller, Michael; Schaar, Bruce T.; Bejerano, Gill

2012-01-01

Conserved non-protein-coding DNA elements (CNEs) often encode cis-regulatory elements and are rarely lost during evolution. However, CNE losses that do occur can be associated with phenotypic changes, exemplified by pelvic spine loss in sticklebacks. Using a computational strategy to detect complete loss of CNEs in mammalian genomes while strictly controlling for artifacts, we find >600 CNEs that are independently lost in at least two mammalian lineages, including a spinal cord enhancer near GDF11. We observed several genomic regions where multiple independent CNE loss events happened; the most extreme is the DIAPH2 locus. We show that CNE losses often involve deletions and that CNE loss frequencies are non-uniform. Similar to less pleiotropic enhancers, we find that independently lost CNEs are shorter, slightly less constrained and evolutionarily younger than CNEs without detected losses. This suggests that independently lost CNEs are less pleiotropic and that pleiotropic constraints contribute to non-uniform CNE loss frequencies. We also detected 35 CNEs that are independently lost in the human lineage and in other mammals. Our study uncovers an interesting aspect of the evolution of functional DNA in mammalian genomes. Experiments are necessary to test if these independently lost CNEs are associated with parallel phenotype changes in mammals. PMID:23042682
Detection of mitochondrial DNA mutations in primary breast cancer and fine-needle aspirates.

PubMed

Parrella, P; Xiao, Y; Fliss, M; Sanchez-Cespedes, M; Mazzarelli, P; Rinaldi, M; Nicol, T; Gabrielson, E; Cuomo, C; Cohen, D; Pandit, S; Spencer, M; Rabitti, C; Fazio, V M; Sidransky, D

2001-10-15

To determine the frequency and distribution of mitochondrial DNA mutations in breast cancer, 18 primary breast tumors were analyzed by direct sequencing. Twelve somatic mutations not present in matched lymphocytes and normal breast tissues were detected in 11 of the tumors screened (61%). Of these mutations, five (42%) were deletions or insertions in a homopolymeric C-stretch between nucleotides 303-315 (D310) within the D-loop. The remaining seven mutations (58%) were single-base substitutions in the coding (ND1, ND4, ND5, and cytochrome b genes) or noncoding regions (D-loop) of the mitochondrial genome. In three cases (25%), the mutations detected in coding regions led to amino acid substitutions in the protein sequence. We then screened an additional 46 primary breast tumors with a rapid PCR-based assay to identify poly-C alterations in D310, and we found seven more cancers with alterations. Using D310 mutations as clonal marker, we detected identical changes in five of five matched fine-needle aspirates and in four of four metastases-positive lymph nodes. The high frequency of D310 alterations in primary breast cancer combined with the high sensitivity of the PCR-based assays provides a new molecular tool for cancer detection.
Linkage and homology analysis divides the eight genes for the small subunit of petunia ribulose 1,5-bisphosphate carboxylase into three gene families

PubMed Central

Dean, Caroline; van den Elzen, Peter; Tamaki, Stanley; Dunsmuir, Pamela; Bedbrook, John

1985-01-01

Twenty-six λ phage clones with homology to coding sequences of the small subunit (SSU) of ribulose 1,5-bisphosphate carboxylase have been isolated from an EMBL3 λ phage bank of Petunia (Mitchell) DNA. Restriction mapping of the phage inserts shows that the clones were obtained from five nonoverlapping regions of petunia DNA that carry seven SSU genes. Comparison of the HindIII genomic fragments of petunia DNA with the HindIII restriction fragments of the isolated phage indicates that petunia nuclear DNA encodes eight SSU genes, seven of which are present in the phage clones. Two incomplete genes, which contain only the 3′ end of an SSU gene, were also found in the phage clones. We demonstrate that the eight SSU genes of petunia can be divided into three gene families based on homology to three petunia cDNA clones. Two gene families contain single SSU genes and the third contains six genes, four of which are closely linked within petunia nuclear DNA. PMID:16593584
Mitochondrial genomes of the green macroalga Ulva pertusa (Ulvophyceae, Chlorophyta): novel insights into the evolution of mitogenomes in the Ulvophyceae.

PubMed

Liu, Feng; Melton, James T; Bi, Yuping

2017-10-01

To further understand the trends in the evolution of mitochondrial genomes (mitogenomes or mtDNAs) in the Ulvophyceae, the mitogenomes of two separate thalli of Ulva pertusa were sequenced. Two U. pertusa mitogenomes (Up1 and Up2) were 69,333 bp and 64,602 bp in length. These mitogenomes shared two ribosomal RNAs (rRNAs), 28 transfer RNAs (tRNAs), 29 protein-coding genes, and 12 open reading frames. The 4.7 kb difference in size was attributed to variation in intron content and tandem repeat regions. A total of six introns were present in the smaller U. pertusa mtDNA (Up2), while the larger mtDNA (Up1) had eight. The larger mtDNA had two additional group II introns in two genes (cox1 and cox2) and tandem duplication mutations in noncoding regions. Our results showed the first case of intraspecific variation in chlorophytan mitogenomes and provided further genomic data for the undersampled Ulvophyceae. © 2017 Phycological Society of America.
Characterisation of ATM mutations in Slavic Ataxia telangiectasia patients.

PubMed

Soukupova, Jana; Pohlreich, Petr; Seemanova, Eva

2011-09-01

Ataxia telangiectasia (AT) is a genomic instability syndrome characterised by, among other features, progressive cerebellar degeneration, oculocutaneous telangiectases, immunodeficiency, elevated serum alpha-fetoprotein level, chromosomal breakage, hypersensitivity to ionising radiation and increased cancer risk. This autosomal recessive disorder is caused by mutations in the ataxia telangiectasia mutated (ATM) gene coding for a serine/threonine protein kinase with a crucial role in the response to DNA double-strand breaks. We characterised the genotype and phenotype of 12 Slavic AT patients from 11 families. Mutation analysis included sequencing of the entire coding sequence, adjacent intron regions, 3'UTR and 5'UTR of the ATM gene and multiplex ligation-dependent probe amplification (MLPA) for the detection of large deletions/duplications at the ATM locus. The high incidence of new and individual mutations demonstrates a marked mutational heterogeneity of AT in the Czech Republic. Our data indicate that sequence analysis of the entire coding region of ATM is sufficient for a high detection rate of mutations in ATM and that MLPA analysis for the detection of deletions/duplications seems to be redundant in the Slavic population.
An Integrated Encyclopedia of DNA Elements in the Human Genome

PubMed Central

2012-01-01

Summary The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure, and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall the project provides new insights into the organization and regulation of our genes and genome, and an expansive resource of functional annotations for biomedical research. PMID:22955616
The reversed terminator of octopine synthase gene on the Agrobacterium Ti plasmid has a weak promoter activity in prokaryotes.

PubMed

Shao, Jun-Li; Long, Yue-Sheng; Chen, Gu; Xie, Jun; Xu, Zeng-Fu

2010-06-01

Agrobacterium tumefaciens transfers DNA from its Ti plasmid to plant host cells. The genes located within the transferred DNA of Ti plasmid including the octopine synthase gene (OCS) are expressed in plant host cells. The 3'-flanking region of OCS gene, known as OCS terminator, is widely used as a transcriptional terminator of the transgenes in plant expression vectors. In this study, we found the reversed OCS terminator (3'-OCS-r) could drive expression of hygromycin phosphotransferase II gene (hpt II) and beta-glucuronidase gene in Escherichia coli, and expression of hpt II in A. tumefaciens. Furthermore, reverse transcription-polymerase chain reaction analysis revealed that an open reading frame (ORF12) that is located downstream to the 3'-OCS-r was transcribed in A. tumefaciens, which overlaps in reverse with the coding region of the OCS gene in octopine Ti plasmid.
Chromatin structure and methylation of rat rRNA genes studied by formaldehyde fixation and psoralen cross-linking.

PubMed Central

Stancheva, I; Lucchini, R; Koller, T; Sogo, J M

1997-01-01

By using formaldehyde cross-linking of histones to DNA and gel retardation assays we show that formaldehyde fixation, similar to previously established psoralen photocross-linking, discriminates between nucleosome-packed (inactive) and nucleosome-free (active) fractions of ribosomal RNA genes. By both cross-linking techniques we were able to purify fragments from agarose gels, corresponding to coding, enhancer and promoter sequences of rRNA genes, which were further investigated with respect to DNA methylation. This approach allows us to analyse independently and in detail methylation patterns of active and inactive rRNA gene copies by the combination of Hpa II and Msp I restriction enzymes. We found CpG methylation mainly present in enhancer and promoter regions of inactive rRNA gene copies. The methylation of one single Hpa II site, located in the promoter region, showed particularly strong correlation with the transcriptional activity. PMID:9108154
Ribosomal protein S14 transcripts are edited in Oenothera mitochondria.

PubMed Central

Schuster, W; Unseld, M; Wissinger, B; Brennicke, A

1990-01-01

The gene encoding ribosomal protein S14 (rps14) in Oenothera mitochondria is located upstream of the cytochrome b gene (cob). Sequence analysis of independently derived cDNA clones covering the entire rps14 coding region shows two nucleotides edited from the genomic DNA to the mRNA derived sequences by C to U modifications. A third editing event occurs four nucleotides upstream of the AUG initiation codon and improves a potential ribosome binding site. A CGG codon specifying arginine in a position conserved in evolution between chloroplasts and E. coli as a UGG tryptophan codon is not edited in any of the cDNAs analysed. An inverted repeat 3' of an unidentified open reading frame is located upstream of the rps14 gene. The inverted repeat sequence is highly conserved at analogous regions in other Oenothera mitochondrial loci. PMID:2326162
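
The comparison of genomic and cDNA sequences described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sequences are taken to be already aligned, gap-free and of equal length, the function name is invented for this example, and the toy sequences are hypothetical rather than the Oenothera rps14 sequence.

```python
def find_c_to_u_edits(genomic, cdna):
    """Return 0-based positions where the genomic DNA has C and the cDNA has T.

    Assumes the two sequences are pre-aligned, gap-free and of equal length.
    A T in the cDNA corresponds to a U in the mRNA.
    """
    if len(genomic) != len(cdna):
        raise ValueError("sequences must have equal length")
    g = genomic.upper()
    c = cdna.upper().replace("U", "T")
    return [i for i, (a, b) in enumerate(zip(g, c)) if a == "C" and b == "T"]


# Hypothetical toy sequences: two C-to-U differences, at positions 3 and 7.
genomic = "ATGCGGTCA"
cdna = "ATGTGGTTA"
edits = find_c_to_u_edits(genomic, cdna)
print(edits)  # [3, 7]
```

In the toy data the first difference turns a CGG (arginine) codon into UGG (tryptophan), the kind of change the abstract reports as absent at the conserved position in rps14.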
Assessing information content and interactive relationships of subgenomic DNA sequences of the MHC using complexity theory approaches based on the non-extensive statistical mechanics

NASA Astrophysics Data System (ADS)

Karakatsanis, L. P.; Pavlos, G. P.; Iliopoulos, A. C.; Pavlos, E. G.; Clark, P. M.; Duke, J. L.; Monos, D. S.

2018-09-01

This study combines two independent domains of science, the high throughput DNA sequencing capabilities of Genomics and complexity theory from Physics, to assess the information encoded by the different genomic segments of exonic, intronic and intergenic regions of the Major Histocompatibility Complex (MHC) and identify possible interactive relationships. The dynamic and non-extensive statistical characteristics of two well characterized MHC sequences from the homozygous cell lines, PGF and COX, in addition to two other genomic regions of comparable size, used as controls, have been studied using the reconstructed phase space theorem and the non-extensive statistical theory of Tsallis. The results reveal similar non-linear dynamical behavior as far as complexity and self-organization features. In particular, the low-dimensional deterministic nonlinear chaotic and non-extensive statistical character of the DNA sequences was verified with strong multifractal characteristics and long-range correlations. The nonlinear indices repeatedly verified that MHC sequences, whether exonic, intronic or intergenic include varying levels of information and reveal an interaction of the genes with intergenic regions, whereby the lower the number of genes in a region, the less the complexity and information content of the intergenic region. Finally we showed the significance of the intergenic region in the production of the DNA dynamics. The findings reveal interesting content information in all three genomic elements and interactive relationships of the genes with the intergenic regions. The results most likely are relevant to the whole genome and not only to the MHC. These findings are consistent with the ENCODE project, which has now established that the non-coding regions of the genome remain to be of relevance, as they are functionally important and play a significant role in the regulation of expression of genes and coordination of the many biological processes of the cell.
H3.3 demarcates GC-rich coding and subtelomeric regions and serves as potential memory mark for virulence gene expression in Plasmodium falciparum

PubMed Central

Fraschka, Sabine Anne-Kristin; Henderson, Rob Wilhelmus Maria; Bártfai, Richárd

2016-01-01

Histones, by packaging and organizing the DNA into chromatin, serve as essential building blocks for eukaryotic life. The basic structure of the chromatin is established by four canonical histones (H2A, H2B, H3 and H4), while histone variants are more commonly utilized to alter the properties of specific chromatin domains. H3.3, a variant of histone H3, was found to have diverse localization patterns and functions across species but has been rather poorly studied in protists. Here we present the first genome-wide analysis of H3.3 in the malaria-causing, apicomplexan parasite, P. falciparum, which revealed a complex occupancy profile consisting of conserved and parasite-specific features. In contrast to other histone variants, PfH3.3 primarily demarcates euchromatic coding and subtelomeric repetitive sequences. Stable occupancy of PfH3.3 in these regions is largely uncoupled from the transcriptional activity and appears to be primarily dependent on the GC-content of the underlying DNA. Importantly, PfH3.3 specifically marks the promoter region of an active and poised, but not inactive antigenic variation (var) gene, thereby potentially contributing to immune evasion. Collectively, our data suggest that PfH3.3, together with other histone variants, indexes the P. falciparum genome to functionally distinct domains and contribute to a key survival strategy of this deadly pathogen. PMID:27555062
RNA Helicase Associated with AU-rich Element (RHAU/DHX36) Interacts with the 3′-Tail of the Long Non-coding RNA BC200 (BCYRN1)*

PubMed Central

Booy, Evan P.; McRae, Ewan K. S.; Howard, Ryan; Deo, Soumya R.; Ariyo, Emmanuel O.; Dzananovic, Edis; Meier, Markus; Stetefeld, Jörg; McKenna, Sean A.

2016-01-01

RNA helicase associated with AU-rich element (RHAU) is an ATP-dependent RNA helicase that demonstrates high affinity for quadruplex structures in DNA and RNA. To elucidate the significance of these quadruplex-RHAU interactions, we have performed RNA co-immunoprecipitation screens to identify novel RNAs bound to RHAU and characterize their function. In the course of this study, we have identified the non-coding RNA BC200 (BCYRN1) as specifically enriched upon RHAU immunoprecipitation. Although BC200 does not adopt a quadruplex structure and does not bind the quadruplex-interacting motif of RHAU, it has direct affinity for RHAU in vitro. Specifically designed BC200 truncations and RNase footprinting assays demonstrate that RHAU binds to an adenosine-rich region near the 3′-end of the RNA. RHAU truncations support binding that is dependent upon a region within the C terminus and is specific to RHAU isoform 1. Tests performed to assess whether BC200 interferes with RHAU helicase activity have demonstrated the ability of BC200 to act as an acceptor of unwound quadruplexes via a cytosine-rich region near the 3′-end of the RNA. Furthermore, an interaction between BC200 and the quadruplex-containing telomerase RNA was confirmed by pull-down assays of the endogenous RNAs. This leads to the possibility that RHAU may direct BC200 to bind and exert regulatory functions at quadruplex-containing RNA or DNA sequences. PMID:26740632
Paternal leakage of mitochondrial DNA in experimental crosses of populations of the potato cyst nematode Globodera pallida.

PubMed

Hoolahan, Angelique H; Blok, Vivian C; Gibson, Tracey; Dowton, Mark

2011-12-01

Animal mtDNA is typically assumed to be maternally inherited. Paternal mtDNA has been shown to be excluded from entering the egg or eliminated post-fertilization in several animals. However, in the contact zones of hybridizing species and populations, the reproductive barriers between hybridizing organisms may not be as efficient at preventing paternal mtDNA inheritance, resulting in paternal leakage. We assessed paternal mtDNA leakage in experimental crosses of populations of a cyst-forming nematode, Globodera pallida. A UK population, Lindley, was crossed with two South American populations, P5A and P4A. Hybridization of these populations was supported by evidence of nuclear DNA from both the maternal and paternal populations in the progeny. To assess paternal mtDNA leakage, a ~3.4 kb non-coding mtDNA region was analyzed in the parental populations and in the progeny. Paternal mtDNA was evident in the progeny of both crosses involving populations P5A and P4A. Further, paternal mtDNA replaced the maternal mtDNA in 22 and 40 % of the hybrid cysts from these crosses, respectively. These results indicate that under appropriate conditions, paternal leakage occurs in the mtDNA of parasitic nematodes, and supports the hypothesis that hybrid zones facilitate paternal leakage. Thus, assumptions of strictly maternal mtDNA inheritance may be frequently violated, particularly when divergent populations interbreed.
Reduced TCOF1 mRNA level in a rhesus macaque with Treacher Collins-like syndrome: further evidence for haploinsufficiency of treacle as the cause of disease.

PubMed

Shows, Kathryn H; Ward, Christy; Summers, Laura; Li, Lin; Ziegler, Gregory R; Hendrickx, Andrew G; Shiang, Rita

2006-02-01

Mutations in the human gene TCOF1 cause a mandibulofacial dysostosis known as Treacher Collins syndrome (TCS). An infant rhesus macaque (Macaca mulatta) that displayed the TCS phenotype was identified at the California National Primate Research Center. The TCOF1 coding region was cloned from a normal rhesus macaque and sequenced. The rhesus macaque homolog of TCOF1 is 91.6% identical in cDNA sequence and 93.8% identical in translated protein sequence compared to human TCOF1. Sequencing of TCOF1 in the TCS-affected rhesus macaque showed no mutations within the coding region or splice sites; however, real-time quantitative PCR showed an 87% reduction of spleen TCOF1 mRNA level in the TCS affected macaque when compared with normal macaque spleen.
Gene encoding the human β-hexosaminidase β chain: Extensive homology of intron placement in the α- and β-chain genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Proia, R.L.

1988-03-01

Lysosomal β-hexosaminidase is composed of two structurally similar chains, α and β, that are the products of different genes. Mutations in either gene causing β-hexosaminidase deficiency result in the lysosomal storage disease GM2-gangliosidosis. To enable the investigation of the molecular lesions in this disorder and to study the evolutionary relationship between the α and β chains, the β-chain gene was isolated, and its organization was characterized. The β-chain coding region is divided into 14 exons distributed over ~40 kilobases of DNA. Comparison with the α-chain gene revealed that 12 of the 13 introns interrupt the coding regions at homologous positions. This extensive sharing of intron placement demonstrates that the α and β chains evolved by way of the duplication of a common ancestor.
Complete mitogenome sequencing and phylogenetic analysis of PaLi yak (Bos grunniens).

PubMed

Bao, Pengjia; Guo, Xian; Pei, Jie; Liang, Chunnian; Ding, Xuezhi; Min, Chu; Wang, Hongbo; Wu, Xiaoyun; Yan, Ping

2016-11-01

PaLi yak is an important local breed in China; as a year-round grazing animal, it plays an important role in the local economy and for native herdsmen. The complete mitochondrial DNA of the PaLi yak was sequenced in this study; the total length is 16,324 bp, containing 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and a non-coding control region (D-loop region). The gene order and composition are similar to those of most other vertebrates. The base contents are: 33.72% A, 25.80% C, 13.21% G and 27.27% T; A + T (60.99%) was higher than G + C (39.01%). The phylogenetic relationships were analyzed using the complete mitogenome sequence; the results showed that the genetic relationship between yak and cattle is distinct. This information provides useful data for further study on the protection of genetic resources and the taxonomy of Bovinae.
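
The base-content figures quoted above can be checked with simple arithmetic. The Python sketch below shows how such percentages might be computed from a sequence and confirms that the reported single-base values sum to the stated A + T and G + C totals. The function name is an assumption for this example, and no actual yak sequence is included.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


# Percentages as reported in the abstract.
reported = {"A": 33.72, "C": 25.80, "G": 13.21, "T": 27.27}
at_content = round(reported["A"] + reported["T"], 2)
gc_content = round(reported["G"] + reported["C"], 2)
print(at_content, gc_content)  # 60.99 39.01
```

Applied to the full mitogenome sequence, `base_composition` would be expected to return values close to the reported ones, up to rounding.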
Most of the extant mtDNA boundaries in South and Southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans

PubMed Central

Metspalu, Mait; Kivisild, Toomas; Metspalu, Ene; Parik, Jüri; Hudjashov, Georgi; Kaldma, Katrin; Serk, Piia; Karmin, Monika; Behar, Doron M; Gilbert, M Thomas P; Endicott, Phillip; Mastana, Sarabjit; Papiha, Surinder S; Skorecki, Karl; Torroni, Antonio; Villems, Richard

2004-01-01

Background Recent advances in the understanding of the maternal and paternal heritage of south and southwest Asian populations have highlighted their role in the colonization of Eurasia by anatomically modern humans. Further understanding requires a deeper insight into the topology of the branches of the Indian mtDNA phylogenetic tree, which should be contextualized within the phylogeography of the neighboring regional mtDNA variation. Accordingly, we have analyzed mtDNA control and coding region variation in 796 Indian (including both tribal and caste populations from different parts of India) and 436 Iranian mtDNAs. The results were integrated and analyzed together with published data from South, Southeast Asia and West Eurasia. Results Four new Indian-specific haplogroup M sub-clades were defined. These, in combination with two previously described haplogroups, encompass approximately one third of the haplogroup M mtDNAs in India. Their phylogeography and spread among different linguistic phyla and social strata was investigated in detail. Furthermore, the analysis of the Iranian mtDNA pool revealed patterns of limited reciprocal gene flow between Iran and the Indian sub-continent and allowed the identification of different assemblies of shared mtDNA sub-clades. Conclusions Since the initial peopling of South and West Asia by anatomically modern humans, when this region may well have provided the initial settlers who colonized much of the rest of Eurasia, the gene flow in and out of India of the maternally transmitted mtDNA has been surprisingly limited. Specifically, our analysis of the mtDNA haplogroups, which are shared between Indian and Iranian populations and exhibit coalescence ages corresponding to around the early Upper Paleolithic, indicates that they are present in India largely as Indian-specific sub-lineages. In contrast, other ancient Indian-specific variants of M and R are very rare outside the sub-continent. PMID:15339343
Analysis of Chromatin Regulators Reveals Specific Features of Rice DNA Methylation Pathways.

PubMed

Tan, Feng; Zhou, Chao; Zhou, Qiangwei; Zhou, Shaoli; Yang, Wenjing; Zhao, Yu; Li, Guoliang; Zhou, Dao-Xiu

2016-07-01

Plant DNA methylation that occurs at CG, CHG, and CHH sites (H = A, C, or T) is a hallmark of the repression of repetitive sequences and transposable elements (TEs). The rice (Oryza sativa) genome contains about 40% repetitive sequence and TEs and displays specific patterns of genome-wide DNA methylation. The mechanism responsible for the specific methylation patterns is unclear. Here, we analyzed the function of OsDDM1 (Deficient in DNA Methylation 1) and OsDRM2 (Domains Rearranged Methyltransferase 2) in genome-wide DNA methylation, TE repression, small RNA accumulation, and gene expression. We show that OsDDM1 is essential for high levels of methylation at CHG and, to a lesser extent, CG sites in heterochromatic regions and also is required for CHH methylation, which is mainly located in the genic regions of the genome. In addition to a large number of TEs, loss of OsDDM1 leads to hypomethylation and up-regulation of many protein-coding genes, producing very severe growth phenotypes at the initial generation. Importantly, we show that OsDRM2 mutation results in a nearly complete loss of CHH methylation and derepression of mainly small TE-associated genes and that OsDDM1 is involved in facilitating OsDRM2-mediated CHH methylation. Thus, the function of OsDDM1 and OsDRM2 defines distinct DNA methylation pathways in the bulk of DNA methylation of the genome, which is possibly related to the dispersed heterochromatin across chromosomes in rice and suggests that DNA methylation mechanisms may vary among different plant species. © 2016 American Society of Plant Biologists. All Rights Reserved.
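
The three sequence contexts named in the abstract (CG, CHG and CHH, with H = A, C or T) can be made concrete with a small classifier. The Python sketch below is illustrative only: the function name is an assumption, and it looks at the forward strand alone, whereas a real methylome analysis would also examine cytosines on the reverse strand.

```python
def cytosine_context(seq, i):
    """Classify the cytosine at 0-based index i as 'CG', 'CHG' or 'CHH'.

    Returns None if seq[i] is not C, or if the downstream bases needed for
    the call are missing or ambiguous (anything other than A, C, G, T).
    """
    seq = seq.upper()
    if i < 0 or i >= len(seq) or seq[i] != "C":
        return None
    downstream = seq[i + 1:i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    # From here on the next base must be H (A, C or T) and a second base must exist.
    if len(downstream) < 2 or downstream[0] not in "ACT" or downstream[1] not in "ACGT":
        return None
    return "CHG" if downstream[1] == "G" else "CHH"


# Hypothetical fragment: the C at index 0 is CHG, the C at index 1 is CG.
contexts = [cytosine_context("CCGTCAA", i) for i in range(7)]
print(contexts)  # ['CHG', 'CG', None, None, 'CHH', None, None]
```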

Phylogenetic relationship of the genus Gilbertella and related genera within the order Mucorales based on 5.8 S ribosomal DNA sequences.

PubMed

Papp, T; Acs, Klára; Nyilasi, Ildikó; Nagy, Erzsébet; Vágvölgyi, Cs

2003-01-01

The complete ITS (internal transcribed spacer) region coding the ITS1, the ITS2 and the 5.8S rDNA was amplified by polymerase chain reaction from two strains of Gilbertella persicaria, six strains in the Mucoraceae (Mucor piriformis, M. rouxii, M. circinelloides, Rhizomucor miehei, R. pusillus and R. tauricus) and four strains representing three species of the Choanephoraceae (Blakeslea trispora, Choanephora infundibulifera and Poitrasia circinans). Sequences of the amplified DNA fragments were determined and analysed. G. persicaria belongs to a monogeneric family (Gilbertellaceae); however, it was originally described as Choanephora persicaria. The goal of this study was to reveal the phylogenetic relationship among fungi belonging to Gilbertellaceae, Choanephoraceae and Mucoraceae. Our results support an "intermediate" position of this family between Choanephoraceae and Mucoraceae.
Complete nucleotide sequence of the gene for human heparin cofactor II and mapping to chromosomal band 22q11

DOE Office of Scientific and Technical Information (OSTI.GOV)

Herzog, R.; Lutz, S.; Blin, N.

1991-02-05

Heparin cofactor II (HCII) is a 66-kDa plasma glycoprotein that inhibits thrombin rapidly in the presence of dermatan sulfate or heparin. Clones comprising the entire HCII gene were isolated from a human leukocyte genomic library in EMBL-3 λ phage. The sequence of the gene was determined on both strands of DNA (15,849 bp) and included 1,749 bp of 5′-flanking sequence, five exons, four introns, and 476 bp of DNA 3′ to the polyadenylation site. Ten complete and one partial Alu repeats were identified in the introns and 5′-flanking region. The HCII gene was regionally mapped on chromosome 22 using rodent-human somatic cell hybrids, carrying only parts of human chromosome 22, and the chronic myelogenous leukemia cell line K562. With the cDNA probe HCII7.2, containing the entire coding region of the gene, the HCII gene was shown to be amplified 10-20-fold in K562 cells by Southern analysis and in situ hybridization. From these data, the authors concluded that the HCII gene is localized on the chromosomal band 22q11 proximal to the breakpoint cluster region (BCR). Analysis by pulsed-field gel electrophoresis indicated that the amplified HCII gene in K562 cells maps at least 2 Mbp proximal to BCR-1. Furthermore, the HCII7.2 cDNA probe detected two frequent restriction fragment length polymorphisms with the restriction enzymes BamHI and Hind III.
Surface Diversity in Mycoplasma agalactiae Is Driven by Site-Specific DNA Inversions within the vpma Multigene Locus

PubMed Central

Glew, Michelle D.; Marenda, Marc; Rosengarten, Renate; Citti, Christine

2002-01-01

The ruminant pathogen Mycoplasma agalactiae possesses a family of abundantly expressed variable surface lipoproteins called Vpmas. Phenotypic switches between Vpma members have previously been correlated with DNA rearrangements within a locus of vpma genes and are proposed to play an important role in disease pathogenesis. In this study, six vpma genes were characterized in the M. agalactiae type strain PG2. All vpma genes clustered within an 8-kb region and shared highly conserved 5′ untranslated regions, lipoprotein signal sequences, and short N-terminal sequences. Analyses of the vpma loci from consecutive clonal isolates showed that vpma DNA rearrangements were site specific and that cleavage and strand exchange occurred within a minimal region of 21 bp located within the 5′ untranslated region of all vpma genes. This process controlled expression of vpma genes by effectively linking the open reading frame (ORF) of a silent gene to a unique active promoter sequence within the locus. An ORF (xer1) immediately adjacent to one end of the vpma locus did not undergo rearrangement and had significant homology to a distinct subset of genes belonging to the λ integrase family of site-specific xer recombinases. It is proposed that xer1 codes for a site-specific recombinase that is not involved in chromosome dimer resolution but rather is responsible for the observed vpma-specific recombination in M. agalactiae. PMID:12374833
Genomewide analysis indicates that queen larvae have lower methylation levels in the honey bee ( Apis mellifera)

NASA Astrophysics Data System (ADS)

Shi, Yuan Yuan; Yan, Wei Yu; Huang, Zachary Y.; Wang, Zi Long; Wu, Xiao Bo; Zeng, Zhi Jiang

2013-02-01

The honey bee is a social insect characterized by caste differentiation, by which a young larva can develop into either a queen or a worker. Despite possessing the same genome, queen and workers display marked differences in reproductive capacity, physiology, and behavior. Recent studies have shown that DNA methylation plays important roles in caste differentiation. To further explore the roles of DNA methylation in this process, we analyzed DNA methylome profiles of both queen larvae (QL) and worker larvae (WL) of different ages (2, 4, and 6 day old), by using methylated DNA immunoprecipitation-sequencing (meDIP-seq) technique. The global DNA methylation levels varied between the larvae of two castes. DNA methylation increased from 2-day- to 4-day-old QL and then decreased in 6-day-old larvae. In WL, methylation levels increased with age. The methylcytosines in both larvae were enriched in introns, followed by coding sequence (CDS) regions, CpG islands, 2 kbp downstream and upstream of genes, and 5' and 3' untranslated regions (UTRs). The number of differentially methylated genes (DMGs) in 2-, 4-, and 6-day-old QL and WL was 725, 3,013, and 5,049, respectively. Compared to 4- and 6-day-old WL, a large number of genes in QL were downmethylated, which were involved in many processes including development, reproduction, and metabolic regulation. In addition, some DMGs were concerned with caste differentiation.
Genetic mosaic in a marine species flock.

PubMed

McCartney, Michael A; Acevedo, Jenny; Heredia, Christine; Rico, Ciro; Quenoville, Brice; Bermingham, Eldredge; McMillan, W Owen

2003-11-01

We used molecular approaches to study the status of speciation in coral reef fishes known as hamlets (Serranidae: Hypoplectrus). Several hamlet morphospecies coexist on Caribbean reefs, and mate assortatively with respect to their strikingly distinct colour patterns. We provide evidence that, genetically, the hamlets display characteristics common in species flocks on land and in freshwaters. Substitutions within two mitochondrial DNA (mtDNA) protein-coding genes place hamlets within a monophyletic group relative to members of two related genera (Serranus and Diplectrum), and establish that the hamlet radiation must have been very recent. mtDNA distances separating hamlet morphospecies were slight (0.6 +/- 0.04%), yielding a coalescent estimate for the age of the hamlet flock of approximately 430 000 years. Morphospecies did not sort into distinct mtDNA haplotype phylogroups, and alleles at five hypervariable microsatellite loci were shared broadly across species boundaries. None the less, molecular variation was not distributed at random. Analyses of mtDNA haplotype frequencies and nested clades in haplotype networks revealed significant genetic differences between geographical regions and among colour morphospecies. We also observed significant microsatellite differentiation between geographical regions and in Puerto Rico, among colour morphospecies; the latter providing evidence for reproductive isolation between colour morphospecies at this locale. In our Panama collection, however, colour morphospecies were mostly genetically indistinguishable. This mosaic pattern of DNA differentiation implies a complex interaction between population history, mating behaviour and geography and suggests that porous boundaries separate species in this flock of brilliantly coloured coral reef fishes.
Phylogeographic Analysis of Mitochondrial DNA in Northern Asian Populations

PubMed Central

Derenko, Miroslava ; Malyarchuk, Boris ; Grzybowski, Tomasz ; Denisova, Galina ; Dambueva, Irina ; Perkova, Maria ; Dorzhu, Choduraa ; Luzina, Faina ; Lee, Hong Kyu ; Vanecek, Tomas ; Villems, Richard ; Zakharov, Ilia

2007-01-01

To elucidate the human colonization process of northern Asia and human dispersals to the Americas, a diverse subset of 71 mitochondrial DNA (mtDNA) lineages was chosen for complete genome sequencing from the collection of 1,432 control-region sequences sampled from 18 autochthonous populations of northern, central, eastern, and southwestern Asia. On the basis of complete mtDNA sequencing, we have revised the classification of haplogroups A, D2, G1, M7, and I; identified six new subhaplogroups (I4, N1e, G1c, M7d, M7e, and J1b2a); and fully characterized haplogroups N1a and G1b, which were previously described only by the first hypervariable segment (HVS1) sequencing and coding-region restriction-fragment–length polymorphism analysis. Our findings indicate that the southern Siberian mtDNA pool harbors several lineages associated with the Late Upper Paleolithic and/or early Neolithic dispersals from both eastern Asia and southwestern Asia/southern Caucasus. Moreover, the phylogeography of the D2 lineages suggests that southern Siberia is likely to be a geographical source for the last postglacial maximum spread of this subhaplogroup to northern Siberia and that the expansion of the D2b branch occurred in Beringia ∼7,000 years ago. In general, a detailed analysis of mtDNA gene pools of northern Asians provides the additional evidence to rule out the existence of a northern Asian route for the initial human colonization of Asia. PMID:17924343
Phylogeographic analysis of mitochondrial DNA in northern Asian populations.

PubMed

Derenko, Miroslava; Malyarchuk, Boris; Grzybowski, Tomasz; Denisova, Galina; Dambueva, Irina; Perkova, Maria; Dorzhu, Choduraa; Luzina, Faina; Lee, Hong Kyu; Vanecek, Tomas; Villems, Richard; Zakharov, Ilia

2007-11-01

To elucidate the human colonization process of northern Asia and human dispersals to the Americas, a diverse subset of 71 mitochondrial DNA (mtDNA) lineages was chosen for complete genome sequencing from the collection of 1,432 control-region sequences sampled from 18 autochthonous populations of northern, central, eastern, and southwestern Asia. On the basis of complete mtDNA sequencing, we have revised the classification of haplogroups A, D2, G1, M7, and I; identified six new subhaplogroups (I4, N1e, G1c, M7d, M7e, and J1b2a); and fully characterized haplogroups N1a and G1b, which were previously described only by the first hypervariable segment (HVS1) sequencing and coding-region restriction-fragment-length polymorphism analysis. Our findings indicate that the southern Siberian mtDNA pool harbors several lineages associated with the Late Upper Paleolithic and/or early Neolithic dispersals from both eastern Asia and southwestern Asia/southern Caucasus. Moreover, the phylogeography of the D2 lineages suggests that southern Siberia is likely to be a geographical source for the last postglacial maximum spread of this subhaplogroup to northern Siberia and that the expansion of the D2b branch occurred in Beringia ~7,000 years ago. In general, a detailed analysis of mtDNA gene pools of northern Asians provides the additional evidence to rule out the existence of a northern Asian route for the initial human colonization of Asia.
Genomic DNA sequence and cytosine methylation changes of adult rice leaves after seeds space flight

NASA Astrophysics Data System (ADS)

Shi, Jinming

In this study, changes in cytosine methylation at CCGG sites and in genomic DNA sequence were detected in adult rice leaves after seed space flight, using the methylation-sensitive amplification polymorphism (MSAP) and amplified fragment length polymorphism (AFLP) techniques, respectively. Rice seeds were planted in the trial field after a 4-day space flight on China's Shenzhou-6 spaceship. Adult leaves of space-treated rice, comprising 8 randomly chosen plants and 2 plants with phenotypic mutations, were used for AFLP and MSAP analysis. Polymorphisms in both DNA sequence and cytosine methylation were detected. For MSAP analysis, the average polymorphic frequencies of the on-ground controls, space-treated plants and mutants were 1.3%, 3.1% and 11%, respectively. For AFLP analysis, the average polymorphic frequencies were 1.4%, 2.9% and 8%, respectively. In total, 27 and 22 polymorphic fragments were cloned and sequenced from the MSAP and AFLP analyses, respectively. Nine of the 27 fragments from MSAP analysis show homology to coding sequence. Of the 22 polymorphic fragments from AFLP analysis, none shows homology to mRNA sequence and eight show homology to repeat regions or retrotransposon sequences. These results suggest that although both genomic DNA sequence and cytosine methylation status can be affected by space flight, the genomic regions homologous to the fragments from the genomic DNA and cytosine methylation analyses were different.
Cloning, sequencing, and expression of dnaK-operon proteins from the thermophilic bacterium Thermus thermophilus.

PubMed

Osipiuk, J; Joachimiak, A

1997-09-12

We propose that the dnaK operon of Thermus thermophilus HB8 is composed of three functionally linked genes: dnaK, grpE, and dnaJ. The dnaK and dnaJ gene products are most closely related to their cyanobacterial homologs. The DnaK protein sequence places T. thermophilus in the plastid Hsp70 subfamily. In contrast, the grpE translated sequence is most similar to GrpE from Clostridium acetobutylicum, a Gram-positive anaerobic bacterium. A single promoter region, with homology to the Escherichia coli consensus promoter sequences recognized by the sigma70 and sigma32 transcription factors, precedes the postulated operon. This promoter is heat-shock inducible. The dnaK mRNA level increased more than 30 times upon 10 min of heat shock (from 70 degrees C to 85 degrees C). A strong transcription terminating sequence was found between the dnaK and grpE genes. The individual genes were cloned into pET expression vectors and the thermophilic proteins were overproduced at high levels in E. coli and purified to homogeneity. The recombinant T. thermophilus DnaK protein was shown to have a weak ATP-hydrolytic activity, with an optimum at 90 degrees C. The ATPase was stimulated by the presence of GrpE and DnaJ. Another open reading frame, coding for ClpB heat-shock protein, was found downstream of the dnaK operon.
Cloning and Expression of cDNA for Rat Heme Oxygenase

NASA Astrophysics Data System (ADS)

Shibahara, Shigeki; Muller, Rita; Taguchi, Hayao; Yoshida, Tadashi

1985-12-01

Two cDNA clones for rat heme oxygenase have been isolated from a rat spleen cDNA library in λ gt11 by immunological screening using a specific polyclonal antibody. One of these clones has an insert of 1530 nucleotides that contains the entire protein-coding region. To confirm that the isolated cDNA encodes heme oxygenase, we transfected monkey kidney cells (COS-7) with the cDNA carried in a simian virus 40 vector. The heme oxygenase was highly expressed in endoplasmic reticulum of transfected cells. The nucleotide sequence of the cloned cDNA was determined and the primary structure of heme oxygenase was deduced. Heme oxygenase is composed of 289 amino acids and has one hydrophobic segment at its carboxyl terminus, which is probably important for the insertion of heme oxygenase into endoplasmic reticulum. The cloned cDNA was used to analyze the induction of heme oxygenase in rat liver by treatment with CoCl2 or with hemin. RNA blot analysis showed that both CoCl2 and hemin increased the amount of hybridizable mRNA, suggesting that these substances may act at the transcriptional level to increase the amount of heme oxygenase.
Genomics dataset on unclassified published organism (patent US 7547531).

PubMed

Khan Shawan, Mohammad Mahfuz Ali; Hasan, Md Ashraful; Hossain, Md Mozammel; Hasan, Md Mahmudul; Parvin, Afroza; Akter, Salina; Uddin, Kazi Rasel; Banik, Subrata; Morshed, Mahbubul; Rahman, Md Nazibur; Rahman, S M Badier

2016-12-01

Nucleotide (DNA) sequence analysis provides important clues regarding the characteristics and taxonomic position of an organism. DNA sequence analysis is therefore crucial for learning about the hierarchical classification of that organism. This dataset (patent US 7547531) was chosen to simplify the complex raw data buried in undisclosed DNA sequences, which may help to open doors for new collaborations. A total of 48 unidentified DNA sequences from patent US 7547531 were selected, and their complete sequences were retrieved from the NCBI BioSample database. Quick response (QR) codes of those DNA sequences were constructed with the DNA BarID tool. The QR code is useful for the identification and comparison of isolates with other organisms. The AT/GC content of the DNA sequences, which indicates their stability at different temperatures, was determined using the ENDMEMO GC Content Calculator. The highest GC content was observed in GP445188 (62.5%), followed by GP445198 (61.8%) and GP445189 (59.44%), while the lowest was in GP445178 (24.39%). In addition, the New England BioLabs (NEB) database was used to identify the cleavage code, indicating the 5′, 3′ and blunt ends, and the enzyme code, indicating the methylation sites of the DNA sequences. These data will be helpful for the construction of the organisms' hierarchical classification, determination of their phylogenetic and taxonomic position and revelation of their molecular characteristics.
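The GC-content figures quoted in this record are simple base-composition percentages. The Python sketch below shows one way such a value could be computed. It is an illustration only, not the ENDMEMO calculator's implementation; the function name `gc_content` and the example sequence are hypothetical, and the choice to ignore ambiguous symbols such as N is an assumption.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a DNA sequence.

    Assumption: symbols other than A, C, G and T (for example N) are
    ignored rather than counted in the denominator.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Hypothetical 12-base sequence, not taken from the dataset.
example = "ATGCGCGCTTAA"
print(f"GC content: {gc_content(example):.2f}%")
```

The AT content under the same convention is 100 minus the returned value.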
Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples

PubMed Central

Liu, Zhandong; Venkatesh, Santosh S; Maley, Carlo C

2008-01-01

Background: Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. Results: We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while < 2% of 19 bp oligomers are present. Other species showed different ranges of > 98% to < 2% of possible oligomers in D. melanogaster (12–17 bp), C. elegans (11–17 bp), A. thaliana (11–17 bp), S. cerevisiae (10–16 bp) and E. coli (9–15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However, chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy.
Conclusion: Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect novel microbes in human tissues. PMID:18973670
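The two quantities this abstract relies on, the fraction of all possible k-bp oligomers that occur in a sequence and the entropy of the oligomer distribution, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, only the given strand is scanned (the abstract does not say how reverse complements were handled), and windows containing symbols other than A, C, G and T are skipped.

```python
from collections import Counter
from math import log2


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer made only of A, C, G and T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def coverage(seq: str, k: int) -> float:
    """Fraction of the 4**k possible k-mers observed at least once."""
    return len(kmer_counts(seq, k)) / 4 ** k


def kmer_entropy(seq: str, k: int) -> float:
    """Shannon entropy (bits) of the observed k-mer frequency distribution."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Toy sequence for illustration only; real genomes need streaming input.
toy = "ACGTACGTTTGACCA"
for k in (1, 2, 3):
    print(k, round(coverage(toy, k), 4), round(kmer_entropy(toy, k), 4))
```

For genome-scale input a dictionary of all k-mers becomes memory-hungry as k grows, so a production analysis would more likely use a dedicated k-mer counter; the sketch only fixes the definitions.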
Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples.

PubMed

Liu, Zhandong; Venkatesh, Santosh S; Maley, Carlo C

2008-10-30

Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while < 2% of 19 bp oligomers are present. Other species showed different ranges of > 98% to < 2% of possible oligomers in D. melanogaster (12-17 bp), C. elegans (11-17 bp), A. thaliana (11-17 bp), S. cerevisiae (10-16 bp) and E. coli (9-15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy. 
Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect novel microbes in human tissues.
A new open reading frame in the genome of the cyanobacterium Synechocystis sp. PCC 6803

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lysenko, E.S.; Ogarkova, O.A.; Tarasov, V.A.

1995-02-01

A new open reading frame ORF242, coding for a 26.47-kDa polypeptide, was found in a DNA fragment of the cyanobacterium Synechocystis 6803, transforming a photosynthetic mutant to photoautotrophy and having homology with plant chloroplast DNA. In the 5′ flanking region of ORF242, consensus sequences characteristic of a functioning gene were found. One copy of ORF242 is present in the Synechocystis 6803 genome. Insertion inactivation of ORF242 does not lead to a decrease in photosynthetic activity in cells of cyanobacteria but may influence the ratio between active complexes of photosystems I and II. 22 refs., 6 figs., 2 tabs.
The ura5 gene of the ascomycete Sordaria macrospora: molecular cloning, characterization and expression in Escherichia coli.

PubMed

Le Chevanton, L; Leblon, G

1989-04-15

We cloned the ura5 gene coding for the orotate phosphoribosyl transferase from the ascomycete Sordaria macrospora by heterologous probing of a Sordaria genomic DNA library with the corresponding Podospora anserina sequence. The Sordaria gene was expressed in an Escherichia coli pyrE mutant strain defective for the same enzyme, and expression was shown to be promoted by plasmid sequences. The nucleotide sequence of the 1246-bp DNA fragment encompassing the region of homology with the Podospora gene has been determined. This sequence contains an open reading frame of 699 nucleotides. The deduced amino acid sequence shows 72% similarity with the corresponding Podospora protein.
Cloning and characterization of the gene encoding the endopolygalacturonase-inhibiting protein (PGIP) of Phaseolus vulgaris L.

PubMed

Toubart, P; Desiderio, A; Salvi, G; Cervone, F; Daroda, L; De Lorenzo, G

1992-05-01

Polygalacturonase-inhibiting protein (PGIP) is a cell wall protein purified from hypocotyls of true bean (Phaseolus vulgaris L.). PGIP inhibits fungal endopolygalacturonases and is considered to be an important factor for plant resistance to phytopathogenic fungi (Albersheim and Anderson, 1971; Cervone et al., 1987). The amino acid sequences of the N-terminus and one internal tryptic peptide of the PGIP purified from P. vulgaris cv. Pinto were used to design redundant oligonucleotides that were successfully utilized as primers in a polymerase chain reaction (PCR) with total DNA of P. vulgaris as a template. A DNA band of 758 bp (a specific PCR amplification product of part of the gene coding for PGIP) was isolated and cloned. By using the 758-bp DNA as a hybridization probe, a lambda clone containing the PGIP gene was isolated from a genomic library of P. vulgaris cv. Saxa. The coding and immediate flanking regions of the PGIP gene, contained on a subcloned 3.3 kb SalI-SalI DNA fragment, were sequenced. A single, continuous ORF of 1026 nt (342 amino acids) was present in the genomic clone. The nucleotide and deduced amino acid sequences of the PGIP gene showed no significant similarity with any known databank sequence. Northern blotting analysis of poly(A)+ RNAs, isolated from various tissues of bean seedlings or from suspension-cultured bean cells, were also performed using the cloned PCR-generated DNA as a probe. A 1.2 kb transcript was detected in suspension-cultured cells and, to a lesser extent, in leaves, hypocotyls, and flowers.(ABSTRACT TRUNCATED AT 250 WORDS)
Recombination Can Initiate and Terminate at a Large Number of Sites within the Rosy Locus of Drosophila Melanogaster

PubMed Central

Clark, S. H.; Hilliker, A. J.; Chovnick, A.

1988-01-01

This report presents the results of a recombination experiment designed to question the existence of special sites for the initiation or termination of a recombination heteroduplex within the region of the rosy locus. Intragenic recombination events were monitored between two physically separated rosy mutant alleles ry(301) and ry(2) utilizing DNA restriction site polymorphisms as genetic markers. Both ry(301) and ry(2) are known from previous studies to be associated with gene conversion frequencies an order of magnitude lower than single site mutations. The mutations are associated with large, well defined insertions located as internal sites within the locus in prior intragenic mapping studies. On the molecular map, they represent large insertions approximately 2.7 kb apart in the second and third exons, respectively, of the XDH coding region. The present study monitors intragenic recombination in a mutant heterozygous genotype in which DNA homology is disrupted by these large discontinuities, greater than the region of DNA homology and flanking both sides of the locus. If initiation/or termination requires separate sites at either end of the locus, then intragenic recombination within the rosy locus of the heterozygote should be eliminated. Contrary to expectation, significant recombination between these sites is seen. PMID:2834266
Mitochondrial analysis of a Byzantine population reveals the differential impact of multiple historical events in South Anatolia

PubMed Central

Ottoni, Claudio; Ricaut, François-X; Vanderheyden, Nancy; Brucato, Nicolas; Waelkens, Marc; Decorte, Ronny

2011-01-01

The archaeological site of Sagalassos is located in Southwest Turkey, in the western part of the Taurus mountain range. Human occupation of its territory is attested from the late 12th millennium BP up to the 13th century AD. By analysing the mtDNA variation in 85 skeletons from Sagalassos dated to the 11th–13th century AD, this study attempts to reconstruct the genetic signature potentially left in this region of Anatolia by the many civilizations, which succeeded one another over the centuries until the mid-Byzantine period (13th century AD). Authentic ancient DNA data were determined from the control region and some SNPs in the coding region of the mtDNA in 53 individuals. Comparative analyses with up to 157 modern populations allowed us to reconstruct the origin of the mid-Byzantine people still dwelling in dispersed hamlets in Sagalassos, and to detect the maternal contribution of their potential ancestors. By integrating the genetic data with historical and archaeological information, we were able to attest in Sagalassos a significant maternal genetic signature of Balkan/Greek populations, as well as ancient Persians and populations from the Italian peninsula. Some contribution from the Levant has also been detected, whereas no contribution from Central Asian populations could be ascertained. PMID:21224890
Mitochondrial genome of the endangered marine gastropod Strombus gigas Linnaeus, 1758 (Mollusca: Gastropoda).

PubMed

Márquez, Edna J; Castro, Erick R; Alzate, Juan F

2016-01-01

The queen conch Strombus gigas is an endangered marine gastropod of significant economic importance across the Greater Caribbean region. This work reports for the first time the complete mitochondrial genome of S. gigas, obtained by FLX 454 pyrosequencing. The mtDNA genome encodes 13 proteins, 22 tRNAs and 2 ribosomal RNAs. In addition, the coding sequences and gene synteny were similar to those of previously reported gastropod mitogenomes.
Protein Kinases in Mammary Gland Development and Carcinogenesis

DTIC Science & Technology

1999-09-01

studies identical at the amino acid level to calcium/calmodulin-dependent may provide insight into mechanisms of growth control and DNA protein kinase I...human homologues of these kinases(19, 20 ). Amino acid conservation in the coding region between mouse and human Hunk is greater than 90% identical. While...genes (13, 14). Over the past 4 years , several of the mRNA and protein levels (39-46). These findings clearly dem- these breast cancer susceptibility

Chimeric classical swine fever (CSF)-Japanese encephalitis (JE) viral particles as a non-transmissible bivalent marker vaccine candidate against CSF and JE infections

USDA-ARS's Scientific Manuscript database

A trans-complemented CSF- JE chimeric viral replicon was constructed using an infectious cDNA clone of the CSF virus (CSFV) Alfort/187 strain. The E2 gene of CSFV Alfort/187 strain was deleted and the resultant plasmid pA187delE2 was inserted by a fragment containing the region coding for a truncate...
Multiplexed SNP typing of ancient DNA clarifies the origin of Andaman mtDNA haplogroups amongst South Asian tribal populations.

PubMed

Endicott, Phillip; Metspalu, Mait; Stringer, Chris; Macaulay, Vincent; Cooper, Alan; Sanchez, Juan J

2006-12-20

The issue of errors in genetic data sets is of growing concern, particularly in population genetics where whole genome mtDNA sequence data is coming under increased scrutiny. Multiplexed PCR reactions, combined with SNP typing, are currently under-exploited in this context, but have the potential to genotype whole populations rapidly and accurately, significantly reducing the amount of errors appearing in published data sets. To show the sensitivity of this technique for screening mtDNA genomic sequence data, 20 historic samples of the enigmatic Andaman Islanders and 12 modern samples from three Indian tribal populations (Chenchu, Lambadi and Lodha) were genotyped for 20 coding region sites after provisional haplogroup assignment with control region sequences. The genotype data from the historic samples significantly revise the topologies for the Andaman M31 and M32 mtDNA lineages by rectifying conflicts in published data sets. The new Indian data extend the distribution of the M31a lineage to South Asia, challenging previous interpretations of mtDNA phylogeography. This genetic connection between the ancestors of the Andamanese and South Asian tribal groups approximately 30 kya has important implications for the debate concerning migration routes and settlement patterns of humans leaving Africa during the late Pleistocene, and indicates the need for more detailed genotyping strategies. The methodology serves as a low-cost, high-throughput model for the production and authentication of data from modern or ancient DNA, and demonstrates the value of museum collections as important records of human genetic diversity.
Multiplexed SNP Typing of Ancient DNA Clarifies the Origin of Andaman mtDNA Haplogroups amongst South Asian Tribal Populations

PubMed Central

Endicott, Phillip; Metspalu, Mait; Stringer, Chris; Macaulay, Vincent; Cooper, Alan; Sanchez, Juan J.

2006-01-01

The issue of errors in genetic data sets is of growing concern, particularly in population genetics where whole genome mtDNA sequence data is coming under increased scrutiny. Multiplexed PCR reactions, combined with SNP typing, are currently under-exploited in this context, but have the potential to genotype whole populations rapidly and accurately, significantly reducing the amount of errors appearing in published data sets. To show the sensitivity of this technique for screening mtDNA genomic sequence data, 20 historic samples of the enigmatic Andaman Islanders and 12 modern samples from three Indian tribal populations (Chenchu, Lambadi and Lodha) were genotyped for 20 coding region sites after provisional haplogroup assignment with control region sequences. The genotype data from the historic samples significantly revise the topologies for the Andaman M31 and M32 mtDNA lineages by rectifying conflicts in published data sets. The new Indian data extend the distribution of the M31a lineage to South Asia, challenging previous interpretations of mtDNA phylogeography. This genetic connection between the ancestors of the Andamanese and South Asian tribal groups ∼30 kya has important implications for the debate concerning migration routes and settlement patterns of humans leaving Africa during the late Pleistocene, and indicates the need for more detailed genotyping strategies. The methodology serves as a low-cost, high-throughput model for the production and authentication of data from modern or ancient DNA, and demonstrates the value of museum collections as important records of human genetic diversity. PMID:17218991
Identification of two allelic IgG1 C(H) coding regions (Cgamma1) of cat.

PubMed

Kanai, T H; Ueda, S; Nakamura, T

2000-01-31

Two types of cDNA encoding IgG1 heavy chain (gamma1) were isolated from a single domestic short-hair cat. Sequence analysis indicated a higher level of similarity of these Cgamma1 sequences to the human Cgamma1 sequence (76.9 and 77.0%) than to the mouse sequence (70.0 and 69.7%) at the nucleotide level. Predicted primary structures of both feline Cgamma1 genes, designated Cgamma1a and Cgamma1b, were similar to that of the human Cgamma1 gene with respect to, for instance, the size of the constant domains, the presence of six conserved cysteine residues involved in formation of the domain structure, and the location of a conserved N-linked glycosylation site. Sequence comparison between the two alleles showed that 7 out of 10 nucleotide differences were within the C(H)3 domain coding region, all leading to nonsynonymous changes in amino acid residues. Partial sequence analysis of genomic clones showed three nucleotide substitutions between the two Cgamma1 alleles in the intron between the C(H)2 and C(H)3 domain coding regions. In 12 domestic short-hair cats used in this study, the frequency of the Cgamma1a allele (62.5%) was higher than that of the Cgamma1b allele (37.5%).
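The allele frequencies reported for the 12 cats can be converted back into allele copy numbers with a short calculation. The Python sketch below assumes a diploid autosomal locus with every animal genotyped, so that 12 cats contribute 24 allele copies; that assumption and the function name `allele_copies` are illustrative and not stated in the abstract.

```python
from fractions import Fraction


def allele_copies(n_individuals: int, freq_percent: str, ploidy: int = 2) -> int:
    """Convert a reported allele frequency (percent) into a copy count.

    Assumption: every individual carries `ploidy` copies of the locus.
    The frequency is passed as a string so it is parsed exactly.
    """
    total = n_individuals * ploidy
    copies = Fraction(freq_percent) * total / 100
    if copies.denominator != 1:
        raise ValueError("frequency does not match a whole number of copies")
    return int(copies)


# Frequencies as reported in the abstract: 62.5% and 37.5% among 12 cats.
print(allele_copies(12, "62.5"), allele_copies(12, "37.5"))
```

Under this assumption the two counts sum to 24, which is consistent with the sample of 12 animals. The abstract does not report how the copies were distributed among homozygous and heterozygous animals, so the sketch does not attempt to infer genotypes.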
Comparison of the complete mitochondrial genome of the stonefly Sweltsa longistyla (Plecoptera: Chloroperlidae) with mitogenomes of three other stoneflies.

PubMed

Chen, Zhi-Teng; Du, Yu-Zhou

2015-03-01

The complete mitochondrial genome of the stonefly, Sweltsa longistyla Wu (Plecoptera: Chloroperlidae), was sequenced in this study. The mitogenome of S. longistyla is 16,151bp and contains 37 genes including 13 protein-coding genes (PCGs), 22 tRNA genes, two rRNA genes, and a large non-coding region. S. longistyla, Pteronarcys princeps Banks, Kamimuria wangi Du and Cryptoperla stilifera Sivec belong to the Plecoptera, and the gene order and orientation of their mitogenomes were similar. The overall AT content for the four stoneflies was below 72%, and the AT content of tRNA genes was above 69%. The four genomes were compact and contained only 65-127bp of non-coding intergenic DNAs. Overlapping nucleotides existed in all four genomes and ranged from 24 (P. princeps) to 178bp (K. wangi). There was a 7-bp motif ('ATGATAA') of overlapping DNA and an 8-bp motif (AAGCCTTA) conserved in three stonefly species (P. princeps, K. wangi and C. stilifera). The control regions of four stoneflies contained a stem-loop structure. Four conserved sequence blocks (CSBs) were present in the A+T-rich regions of all four stoneflies. Copyright © 2014 Elsevier B.V. All rights reserved.
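AT content, as quoted above for whole mitogenomes and tRNA genes, is the fraction of A and T bases in a sequence. A minimal sketch follows, applied to the two short motifs given in the abstract because the full genome sequences are not reproduced here; the helper name `at_content` is an assumption for illustration.

```python
def at_content(sequence):
    """Return the fraction of A and T bases in a DNA sequence (case-insensitive)."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    at_bases = sum(1 for base in sequence if base in "AT")
    return at_bases / len(sequence)


# The two conserved motifs quoted in the abstract.
motif_7bp = "ATGATAA"
motif_8bp = "AAGCCTTA"

print(round(at_content(motif_7bp), 3), at_content(motif_8bp))
```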
Facile and High-Throughput Synthesis of Functional Microparticles with Quick Response Codes.

PubMed

Ramirez, Lisa Marie S; He, Muhan; Mailloux, Shay; George, Justin; Wang, Jun

2016-06-01

Encoded microparticles are in high demand in multiplexed assays and labeling. However, the current methods for the synthesis and coding of microparticles either lack robustness and reliability, or possess limited coding capacity. Here, a massive coding of dissociated elements (MiCODE) technology based on innovation of a chemically reactive off-stoichiometry thiol-allyl photocurable polymer and standard lithography to produce a large number of quick response (QR) code microparticles is introduced. The coding process is performed by photobleaching the QR code patterns on microparticles when fluorophores are incorporated into the prepolymer formulation. The fabricated encoded microparticles can be released from a substrate without changing their features. Excess thiol functionality on the microparticle surface allows for grafting of amine groups and further DNA probes. A multiplexed assay is demonstrated using the DNA-grafted QR code microparticles. The MiCODE technology is further characterized by showing the incorporation of BODIPY-maleimide (BDP-M) and Nile Red fluorophores for coding and the use of microcontact printing for immobilizing DNA probes on microparticle surfaces. This versatile technology leverages mature lithography facilities for fabrication and thus is amenable to scale-up in the future, with potential applications in bioassays and in labeling consumer products. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Mutations Affecting Expression of the rosy Locus in Drosophila melanogaster

PubMed Central

Lee, Chong Sung; Curtis, Daniel; McCarron, Margaret; Love, Carol; Gray, Mark; Bender, Welcome; Chovnick, Arthur

1987-01-01

The rosy locus in Drosophila melanogaster codes for the enzyme xanthine dehydrogenase (XDH). Previous studies defined a "control element" near the 5' end of the gene, where variant sites affected the amount of rosy mRNA and protein produced. We have determined the DNA sequence of this region from both genomic and cDNA clones, and from the ry+10 underproducer strain. This variant strain had many sequence differences, so that the site of the regulatory change could not be fixed. A mutagenesis was also undertaken to isolate new regulatory mutations. We induced 376 new mutations with 1-ethyl-1-nitrosourea (ENU) and screened them to isolate those that reduced the amount of XDH protein produced, but did not change the properties of the enzyme. Genetic mapping was used to find mutations located near the 5' end of the gene. DNA from each of seven mutants was cloned and sequenced through the 5' region. Mutant base changes were identified in all seven; they appear to affect splicing and translation of the rosy mRNA. In a related study (T. P. Keith et al. 1987), the genomic and cDNA sequences are extended through the 3' end of the gene; the combined sequences define the processing pattern of the rosy transcript and predict the amino acid sequence of XDH. PMID:3036645
Molecular organization of the 5S rDNA gene type II in elasmobranchs.

PubMed

Castro, Sergio I; Hleap, Jose S; Cárdenas, Heiber; Blouin, Christian

2016-01-01

The 5S rDNA gene is a non-coding RNA that can be found in 2 copies (type I and type II) in bony and cartilaginous fish. Previous studies have pointed out that type II gene is a paralog derived from type I. We analyzed the molecular organization of 5S rDNA type II in elasmobranchs. Although the structure of the 5S rDNA is supposed to be highly conserved, our results show that the secondary structure in this group possesses some variability and is different than the consensus secondary structure. One of these differences in Selachii is an internal loop at nucleotides 7 and 112. These mutations observed in the transcribed region suggest an independent origin of the gene among Batoids and Selachii. All promoters were highly conserved with the exception of BoxA, possibly due to its affinity to polymerase III. This latter enzyme recognizes a dT4 sequence as stop signal; however, in Rajiformes this signal was doubled in length to dT8. This could be an adaptation toward a higher efficiency in the termination process. Our results suggest that there is no TATA box in elasmobranchs in the NTS region. We also provide some evidence suggesting that the complexity of the microsatellites present in the NTS region plays an important role in the 5S rRNA gene since it is significantly correlated with the length of the NTS.
Histone- and protamine-DNA association: conservation of different patterns within the beta-globin domain in human sperm.

PubMed

Gardiner-Garden, M; Ballesteros, M; Gordon, M; Tam, P P

1998-06-01

Most DNA in human sperm is bound to highly basic proteins called protamines, but a small proportion is complexed with histones similar to those found in active chromatin. This raises the intriguing possibility that histones in sperm are marking sets of genes that will be preferentially activated during early development. We have examined the chromatin structure of members of the beta-globin gene family, which are expressed at different times in development, and the protamine 2 gene, which is expressed in spermatids prior to the widespread displacement of histones by transition proteins. The genes coding for epsilon and gamma globin, which are active in the embryonic yolk sac, contain regions which are histone associated in the sperm. No histone-associated regions are present at the sites tested within the beta- and delta-globin genes which are silent in the embryonic yolk sac. The trends of histone or protamine association are consistent for samples from the same person, and no significant between-subject variations in these trends are found for 13 of the 15 fragments analyzed in the two donors. The results suggest that sperm chromatin structures are generally similar in different men but that the length of the histone-associated regions can vary. The association of sperm DNA with histones or protamines sometimes changes within as little as 400 bp of DNA, suggesting that there is fine control over the retention of histones.
Molecular organization of the 5S rDNA gene type II in elasmobranchs

PubMed Central

Castro, Sergio I.; Hleap, Jose S.; Cárdenas, Heiber; Blouin, Christian

2016-01-01

The 5S rDNA gene is a non-coding RNA that can be found in 2 copies (type I and type II) in bony and cartilaginous fish. Previous studies have pointed out that type II gene is a paralog derived from type I. We analyzed the molecular organization of 5S rDNA type II in elasmobranchs. Although the structure of the 5S rDNA is supposed to be highly conserved, our results show that the secondary structure in this group possesses some variability and is different than the consensus secondary structure. One of these differences in Selachii is an internal loop at nucleotides 7 and 112. These mutations observed in the transcribed region suggest an independent origin of the gene among Batoids and Selachii. All promoters were highly conserved with the exception of BoxA, possibly due to its affinity to polymerase III. This latter enzyme recognizes a dT4 sequence as stop signal; however, in Rajiformes this signal was doubled in length to dT8. This could be an adaptation toward a higher efficiency in the termination process. Our results suggest that there is no TATA box in elasmobranchs in the NTS region. We also provide some evidence suggesting that the complexity of the microsatellites present in the NTS region plays an important role in the 5S rRNA gene since it is significantly correlated with the length of the NTS. PMID:26488198
Brain cDNA clone for human cholinesterase

DOE Office of Scientific and Technical Information (OSTI.GOV)

McTiernan, C.; Adkins, S.; Chatonnet, A.

1987-10-01

A cDNA library from human basal ganglia was screened with oligonucleotide probes corresponding to portions of the amino acid sequence of human serum cholinesterase. Five overlapping clones, representing 2.4 kilobases, were isolated. The sequenced cDNA contained 207 base pairs of coding sequence 5' to the amino terminus of the mature protein in which there were four ATG translation start sites in the same reading frame as the protein. Only the ATG coding for Met-(-28) lay within a favorable consensus sequence for functional initiators. There were 1722 base pairs of coding sequence corresponding to the protein found circulating in human serum. The amino acid sequence deduced from the cDNA exactly matched the 574 amino acid sequence of human serum cholinesterase, as previously determined by Edman degradation. Therefore, our clones represented cholinesterase rather than acetylcholinesterase. It was concluded that the amino acid sequences of cholinesterase from two different tissues, human brain and human serum, were identical. Hybridization of genomic DNA blots suggested that a single gene, or very few genes coded for cholinesterase.
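The base-pair and residue counts in this abstract are mutually consistent under the standard three-bases-per-codon rule, which a few lines of arithmetic can confirm. The sketch below is an illustration only; the variable names are assumptions, and the 84 bp figure for the stretch from the Met(-28) codon to the mature amino terminus is derived here rather than stated in the abstract.

```python
CODON_LENGTH = 3  # bases per amino acid codon

mature_residues = 574   # residues in serum cholinesterase (from the abstract)
mature_coding_bp = 1722  # coding sequence for the mature protein (from the abstract)
upstream_coding_bp = 207  # coding sequence 5' to the mature amino terminus

# 574 residues x 3 bases per codon should account for the mature coding length.
expected_mature_bp = mature_residues * CODON_LENGTH

# 207 bp is a whole number of codons, so upstream ATGs can share the reading frame.
upstream_codons, remainder = divmod(upstream_coding_bp, CODON_LENGTH)

# Derived value (not stated in the abstract): Met(-28) lies 28 codons upstream.
met_minus_28_offset_bp = 28 * CODON_LENGTH

print(expected_mature_bp, upstream_codons, remainder, met_minus_28_offset_bp)
```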
GUI to Facilitate Research on Biological Damage from Radiation

NASA Technical Reports Server (NTRS)

Cucinotta, Frances A.; Ponomarev, Artem Lvovich

2010-01-01

A graphical-user-interface (GUI) computer program has been developed to facilitate research on the damage caused by highly energetic particles and photons impinging on living organisms. The program brings together, into one computational workspace, computer codes that have been developed over the years, plus codes that will be developed during the foreseeable future, to address diverse aspects of radiation damage. These include codes that implement radiation-track models, codes for biophysical models of breakage of deoxyribonucleic acid (DNA) by radiation, pattern-recognition programs for extracting quantitative information from biological assays, and image-processing programs that aid visualization of DNA breaks. The radiation-track models are based on transport models of interactions of radiation with matter and solution of the Boltzmann transport equation by use of both theoretical and numerical models. The biophysical models of breakage of DNA by radiation include biopolymer coarse-grained and atomistic models of DNA, stochastic-process models of deposition of energy, and Markov-based probabilistic models of placement of double-strand breaks in DNA. The program is designed for use in the NT, 95, 98, 2000, ME, and XP variants of the Windows operating system.
Juvenile Leigh syndrome, optic atrophy, ataxia, dystonia, and epilepsy due to T14487C mutation in the mtDNA-ND6 gene: a mitochondrial syndrome presenting from birth to adolescence.

PubMed

Leshinsky-Silver, Esther; Shuvalov, Ruslan; Inbar, Shani; Cohen, Sarit; Lev, Dorit; Lerman-Sagie, Tally

2011-04-01

An increasing number of reports describe mutations in mitochondrial DNA coding regions, especially in mitochondrial DNA-encoded nicotinamide adenine dinucleotide dehydrogenase subunit genes of the respiratory chain complex I, as causing early-onset Leigh syndrome. The authors report the molecular findings in a 24-year-old patient with juvenile-onset Leigh syndrome presenting with optic atrophy, ataxia, dystonia, and epilepsy. A brain magnetic resonance imaging revealed bilateral basal ganglia and thalamic hypointensities, and a magnetic resonance spectroscopy revealed an increased lactate peak. The authors identified a T14487C change causing M63V substitution in the mitochondrial ND6 gene. The mutation was heteroplasmic in muscle and blood samples, with different mutation loads, and was absent in the patient's mother's urine and blood samples. They suggest that the T14487C mtDNA mutation should be analyzed in Leigh syndrome, presenting with optic atrophy, ataxia, dystonia, and epilepsy, regardless of age.
DNA: Polymer and molecular code

NASA Astrophysics Data System (ADS)

Shivashankar, G. V.

1999-10-01

The thesis work focuses upon two aspects of DNA, the polymer and the molecular code. Our approach was to bring single molecule micromanipulation methods to the study of DNA. It included a home-built optical microscope combined with an atomic force microscope and an optical tweezer. This combined approach led to a novel method to graft a single DNA molecule onto a force cantilever using the optical tweezer and local heating. With this method, a force versus extension assay of double stranded DNA was realized. The resolution was about 10 pN. To improve on this force measurement resolution, a simple light backscattering technique was developed and used to probe the DNA polymer flexibility and its fluctuations. It combined the optical tweezer to trap a DNA tethered bead and the laser backscattering to detect the bead's Brownian fluctuations. With this technique the resolution was about 0.1 pN with a millisecond access time, and the whole entropic part of the DNA force-extension was measured. With this experimental strategy, we measured the polymerization of the protein RecA on an isolated double stranded DNA. We observed the progressive decoration of RecA on the λ DNA molecule, which results in the extension of λ, due to unwinding of the double helix. The dynamics of polymerization, the resulting change in the DNA entropic elasticity and the role of ATP hydrolysis were the main parts of the study. A simple model for RecA assembly on DNA was proposed. This work presents a first step in the study of genetic recombination. Recently we have started a study of equilibrium binding which utilizes fluorescence polarization methods to probe the polymerization of RecA on single stranded DNA. In addition to the study of material properties of DNA and DNA-RecA, we have developed experiments for which the code of the DNA is central. We studied one aspect of DNA as a molecular code, using different techniques.
In particular the programmatic use of template specificity makes gene expression a prime example of a biological code. We developed a novel method of making DNA microarrays, the so-called DNA chip. Using the optical tweezer concept, we were able to pattern biomolecules on a solid substrate, developing a new type of sub-micron laser lithography. A laser beam is focused onto a thin gold film on a glass substrate. Laser ablation of gold results in local aggregation of nanometer scale beads conjugated with small DNA oligonucleotides, with sub-micron resolution. This leads to specific detection of cDNA and RNA molecules. We built a simple micro-array fabrication and detection in the laboratory, based on this method, to probe addressable pools (genes, proteins or antibodies). We have lately used molecular beacons (single stranded DNA with a stem-loop structure containing a fluorophore and quencher), for the direct detection of unlabelled mRNA. As a first step towards a study of the dynamics of the biological code, we have begun to examine the patterns of gene expression during virus (T7 phage) infection of E. coli bacteria.
Topological events in single molecules of E. coli DNA confined in nanochannels

PubMed Central

Reifenberger, Jeffrey G.; Dorfman, Kevin D.; Cao, Han

2015-01-01

We present experimental data concerning potential topological events such as folds, internal backfolds, and/or knots within long molecules of double-stranded DNA when they are stretched by confinement in a nanochannel. Genomic DNA from E. coli was labeled near the ‘GCTCTTC’ sequence with a fluorescently labeled dUTP analog and stained with the DNA intercalator YOYO. Individual long molecules of DNA were then linearized and imaged using methods based on the NanoChannel Array technology (Irys® System) available from BioNano Genomics. Data were collected on 189,153 molecules of length greater than 50 kilobases. A custom code was developed to search for abnormal intensity spikes in the YOYO backbone profile along the length of individual molecules. By correlating the YOYO intensity spikes with the aligned barcode pattern to the reference, we were able to correlate the bright intensity regions of YOYO with abnormal stretching in the molecule, which suggests these events were either a knot or a region of internal backfolding within the DNA. We interpret the results of our experiments involving molecules exceeding 50 kilobases in the context of existing simulation data for relatively short DNA, typically several kilobases. The frequency of these events is lower than the predictions from simulations, while the size of the events is larger than simulation predictions and often exceeds the molecular weight of the simulated molecules. We also identified DNA molecules that exhibit large, single folds as they enter the nanochannels. Overall, topological events occur at a low frequency (~7% of all molecules) and pose an easily surmountable obstacle for the practice of genome mapping in nanochannels. PMID:25991508
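The abstract mentions custom code that searches for abnormal intensity spikes in the YOYO backbone profile but does not describe it. As a rough illustration of the idea, the sketch below flags runs of positions whose intensity exceeds a robust threshold (median plus a multiple of the median absolute deviation). The thresholding rule, the function name `find_intensity_spikes`, the parameter `n_mads`, and the sample profile are all assumptions, not the authors' method.

```python
from statistics import median


def find_intensity_spikes(profile, n_mads=5.0):
    """Return (start, end) index ranges, end exclusive, where the intensity
    exceeds median + n_mads * MAD of the whole profile."""
    center = median(profile)
    mad = median([abs(value - center) for value in profile])
    threshold = center + n_mads * mad

    spikes = []
    start = None
    for index, value in enumerate(profile):
        if value > threshold:
            if start is None:
                start = index  # a new run of bright positions begins
        elif start is not None:
            spikes.append((start, index))  # the run ended at the previous index
            start = None
    if start is not None:
        spikes.append((start, len(profile)))  # run extends to the end
    return spikes


# Made-up backbone profile with one bright stretch at positions 6 to 8.
profile = [100, 102, 98, 101, 99, 100, 250, 260, 240, 101, 100, 99, 98, 102]
spikes = find_intensity_spikes(profile)
print(spikes)
```

A median-based threshold is used here because a few very bright pixels would inflate a mean and standard deviation; a real analysis would also need to correlate flagged regions with the aligned label pattern, as the abstract describes.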
Strand-specific transcriptome profiling with directly labeled RNA on genomic tiling microarrays

PubMed Central

2011-01-01

Background: With lower manufacturing cost, high spot density, and flexible probe design, genomic tiling microarrays are ideal for comprehensive transcriptome studies. Typically, transcriptome profiling using microarrays involves reverse transcription, which converts RNA to cDNA. The cDNA is then labeled and hybridized to the probes on the arrays, thus the RNA signals are detected indirectly. Reverse transcription is known to generate artifactual cDNA, in particular the synthesis of second-strand cDNA, leading to false discovery of antisense RNA. To address this issue, we have developed an effective method using RNA that is directly labeled, thus by-passing the cDNA generation. This paper describes this method and its application to the mapping of transcriptome profiles. Results: RNA extracted from laboratory cultures of Porphyromonas gingivalis was fluorescently labeled with an alkylation reagent and hybridized directly to probes on genomic tiling microarrays specifically designed for this periodontal pathogen. The generated transcriptome profile was strand-specific and produced signals close to background level in most antisense regions of the genome. In contrast, high levels of signal were detected in the antisense regions when the hybridization was done with cDNA. Five antisense areas were tested with independent strand-specific RT-PCR and none to negligible amplification was detected, indicating that the strong antisense cDNA signals were experimental artifacts. Conclusions: An efficient method was developed for mapping transcriptome profiles specific to both coding strands of a bacterial genome. This method chemically labels and uses extracted RNA directly in microarray hybridization. The generated transcriptome profile was free of cDNA artifactual signals.
In addition, this method requires fewer processing steps and is potentially more sensitive in detecting small amounts of RNA compared to conventional end-labeling methods due to the incorporation of more fluorescent molecules per RNA fragment. PMID:21235785
The DNA Methylome of Human Peripheral Blood Mononuclear Cells

PubMed Central

Ye, Mingzhi; Zheng, Hancheng; Yu, Jian; Wu, Honglong; Sun, Jihua; Zhang, Hongyu; Chen, Quan; Luo, Ruibang; Chen, Minfeng; He, Yinghua; Jin, Xin; Zhang, Qinghui; Yu, Chang; Zhou, Guangyu; Sun, Jinfeng; Huang, Yebo; Zheng, Huisong; Cao, Hongzhi; Zhou, Xiaoyu; Guo, Shicheng; Hu, Xueda; Li, Xin; Kristiansen, Karsten; Bolund, Lars; Xu, Jiujin; Wang, Wen; Yang, Huanming; Wang, Jian; Li, Ruiqiang; Beck, Stephan; Wang, Jun; Zhang, Xiuqing

2010-01-01

DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies. PMID:21085693
A gene delivery system for insect cells mediated by arginine-rich cell-penetrating peptides.

PubMed

Chen, Yung-Jen; Liu, Betty Revon; Dai, Yun-Hao; Lee, Cheng-Yi; Chan, Ming-Huan; Chen, Hwei-Hsien; Chiang, Huey-Jenn; Lee, Han-Jung

2012-02-10

Most bioactive macromolecules, such as protein, DNA and RNA, basically cannot permeate into cells freely from outside the plasma membrane. Cell-penetrating peptides (CPPs) are a group of short peptides that possess the ability to traverse the cell membrane and have been considered as candidates for mediating gene and drug delivery into living cells. In this study, we demonstrate that three arginine-rich CPPs (SR9, HR9 and PR9) are able to form stable complexes with plasmid DNA and deliver DNA into insect Sf9 cells in a noncovalent manner. The transferred plasmid DNA containing enhanced green fluorescent protein (EGFP) and red fluorescent protein (RFP) coding regions could be expressed in cells functionally assayed at both the protein and RNA levels. Furthermore, treatment of cells with CPPs and CPP/DNA complexes resulted in a viability of 84-93% indicating these CPPs are not cytotoxic. These results suggest that arginine-rich CPPs appear to be a promising tool for insect transgenesis. Copyright © 2011 Elsevier B.V. All rights reserved.
The fission yeast CENP-B protein Abp1 prevents pervasive transcription of repetitive DNA elements.

PubMed

Daulny, Anne; Mejía-Ramírez, Eva; Reina, Oscar; Rosado-Lugo, Jesus; Aguilar-Arnal, Lorena; Auer, Herbert; Zaratiegui, Mikel; Azorin, Fernando

2016-10-01

It is well established that eukaryotic genomes are pervasively transcribed producing cryptic unstable transcripts (CUTs). However, the mechanisms regulating pervasive transcription are not well understood. Here, we report that the fission yeast CENP-B homolog Abp1 plays an important role in preventing pervasive transcription. We show that loss of abp1 results in the accumulation of CUTs, which are targeted for degradation by the exosome pathway. These CUTs originate from different types of genomic features, but the highest increase corresponds to Tf2 retrotransposons and rDNA repeats, where they map along the entire elements. In the absence of abp1, increased RNAPII-Ser5P occupancy is observed throughout the Tf2 coding region and, unexpectedly, RNAPII-Ser5P is enriched at rDNA repeats. Loss of abp1 also results in Tf2 derepression and increased nucleolus size. Altogether these results suggest that Abp1 prevents pervasive RNAPII transcription of repetitive DNA elements (i.e., Tf2 and rDNA repeats) from internal cryptic sites. Copyright © 2016 Elsevier B.V. All rights reserved.
Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity.

PubMed

King, Brian R; Aburdene, Maurice; Thompson, Alex; Warres, Zach

2014-01-01

Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. Recently, these methods are being used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence that is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
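The abstract does not define the ICD transformation, so the following is only a loose sketch of the general DSP approach it builds on: map a DNA sequence to four binary indicator signals (one per base), take the discrete Fourier transform, and work with the resulting power spectrum. Taking differences between adjacent power coefficients and comparing them by Euclidean distance is an assumed reading of "inter-coefficient difference", not the authors' definition, and all function names are hypothetical.

```python
import cmath
import math


def power_spectrum(sequence):
    """Summed DFT power of the four base-indicator signals for k = 1..N//2."""
    n = len(sequence)
    spectrum = []
    for k in range(1, n // 2 + 1):
        total = 0.0
        for base in "ACGT":
            # DFT coefficient of the 0/1 indicator signal for this base.
            coeff = sum(cmath.exp(-2j * cmath.pi * k * pos / n)
                        for pos, symbol in enumerate(sequence) if symbol == base)
            total += abs(coeff) ** 2
        spectrum.append(total)
    return spectrum


def adjacent_differences(spectrum):
    """Differences between successive power coefficients (assumed signature)."""
    return [later - earlier for earlier, later in zip(spectrum, spectrum[1:])]


def signature_distance(seq_a, seq_b):
    """Euclidean distance between signatures of two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares equal-length sequences")
    sig_a = adjacent_differences(power_spectrum(seq_a))
    sig_b = adjacent_differences(power_spectrum(seq_b))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


# A perfectly period-3 toy sequence concentrates its power at k = N/3,
# the same kind of peak that DSP methods exploit to detect coding regions.
toy = "ATG" * 10
spectrum = power_spectrum(toy)
peak_k = max(range(len(spectrum)), key=spectrum.__getitem__) + 1
print(peak_k, signature_distance(toy, toy))
```

Restricting the comparison to equal-length sequences keeps the frequency bins aligned; handling sequences of different lengths, as the published method evidently does, would need an additional normalization step that is not attempted here.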

A new polymorphic and multicopy MHC gene family related to nonmammalian class I

DOE Office of Scientific and Technical Information (OSTI.GOV)

Leelayuwat, C.; Degli-Esposti, M.A.; Abraham, L.J.

1994-12-31

The authors have used genomic analysis to characterize a region of the central major histocompatibility complex (MHC) spanning approximately 300 kilobases (kb) between TNF and HLA-B. This region has been suggested to carry genetic factors relevant to the development of autoimmune diseases such as myasthenia gravis (MG) and insulin dependent diabetes mellitus (IDDM). Genomic sequence was analyzed for coding potential, using two neural network programs, GRAIL and GeneParser. A genomic probe, JAB, containing putative coding sequences (PERB11) located 60 kb centromeric of HLA-B, was used for northern analysis of human tissues. Multiple transcripts were detected. Southern analysis of genomic DNA and overlapping YAC clones, covering the region from BAT1 to HLA-F, indicated that there are at least five copies of PERB11, four of which are located within this region of the MHC. The partial cDNA sequence of PERB11 was obtained from poly-A RNA derived from skeletal muscle. The putative amino acid sequence of PERB11 shares approximately 30% identity to MHC class I molecules from various species, including reptiles, chickens, and frogs, as well as to other MHC class I-like molecules, such as the IgG FcR of the mouse and rat and the human Zn-α2-glycoprotein. From direct comparison of amino acid sequences, it is concluded that PERB11 is a distinct molecule more closely related to nonmammalian than known mammalian MHC class I molecules. Genomic sequence analysis of PERB11 from five MHC ancestral haplotypes (AH) indicated that the gene is polymorphic at both the DNA and protein levels. The results suggest that the authors have identified a novel polymorphic gene family with multiple copies within the MHC. 48 refs., 10 figs., 2 tabs.
New Insights into the Lake Chad Basin Population Structure Revealed by High-Throughput Genotyping of Mitochondrial DNA Coding SNPs

PubMed Central

Černý, Viktor; Carracedo, Ángel

2011-01-01

Background: Located in the Sudan belt, the Chad Basin forms a remarkable ecosystem, where several unique agricultural and pastoral techniques have been developed. Both from an archaeological and a genetic point of view, this region has been interpreted to be the center of a bidirectional corridor connecting West and East Africa, as well as a meeting point for populations coming from North Africa through the Saharan desert. Methodology/Principal Findings: Samples from twelve ethnic groups from the Chad Basin (n = 542) have been high-throughput genotyped for 230 coding region mitochondrial DNA (mtDNA) Single Nucleotide Polymorphisms (mtSNPs) using Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight (MALDI-TOF) mass spectrometry. This set of mtSNPs allowed for much better phylogenetic resolution than previous studies of this geographic region, enabling new insights into its population history. Notable haplogroup (hg) heterogeneity has been observed in the Chad Basin mirroring the different demographic histories of these ethnic groups. As estimated using a Bayesian framework, nomadic populations showed negative growth which was not always correlated to their estimated effective population sizes. Nomads also showed lower diversity values than sedentary groups. Conclusions/Significance: Compared to sedentary populations, nomads showed signals of stronger genetic drift occurring in their ancestral populations. These populations, however, retained more haplotype diversity in their hypervariable segments I (HVS-I), but not their mtSNPs, suggesting a more ancestral ethnogenesis. Whereas the nomadic population showed a higher Mediterranean influence signaled mainly by sub-lineages of M1, R0, U6, and U5, the other populations showed a more consistent sub-Saharan pattern. Although lifestyle may have an influence on diversity patterns and hg composition, analysis of molecular variance has not identified these differences.
The present study indicates that analysis of mtSNPs at high resolution could be a fast and extensive approach for screening variation in population studies where labor-intensive techniques such as entire genome sequencing remain unfeasible. PMID:21533064
Mutational analysis of the myelin protein zero (MPZ) gene associated with Charcot-Marie-Tooth neuropathy type 1B

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roa, B.B.; Warner, L.E.; Lupski, J.R.

1994-09-01

The MPZ gene that maps to chromosome 1q22-q23 encodes myelin protein zero, which is the most abundant peripheral nerve myelin protein that functions as a homophilic adhesion molecule in myelin compaction. Association of the MPZ gene with the dysmyelinating peripheral neuropathies Charcot-Marie-Tooth disease type 1B (CMT1B) and the more severe Dejerine-Sottas syndrome (DSS) was previously demonstrated by MPZ mutations identified in CMT1B and in rare DSS patients. In this study, the coding region of the MPZ gene was screened for mutations in a cohort of 74 unrelated patients with either CMT type 1 or DSS who do not carry the most common CMT1-associated molecular lesion of a 1.5 Mb DNA duplication on 17p11.2-p12. Heteroduplex analysis detected base mismatches in ten patients that were distributed over three exons of MPZ. Direct sequencing of PCR-amplified genomic DNA identified a de novo MPZ mutation associated with CMT1B that predicts an Ile(135)Thr substitution. This finding further confirms the role of MPZ in the CMT1B disease process. In addition, two polymorphisms were identified within the Gly(200) and Ser(228) codons that do not alter the respective amino acid residues. A fourth base mismatch in MPZ exon 3 detected by heteroduplex analysis is currently being characterized by direct sequence determination. Previously, four unrelated patients in this same cohort were found to have unique point mutations in the coding region of the PMP22 gene. The collective findings on CMT1 point mutations could suggest that regulatory region mutations, and possibly mutations in CMT gene(s) apart from the MPZ, PMP22 and Cx32 genes identified thus far, may prove to be significant for a number of CMT1 cases that do not involve DNA duplication.
Changes in the Coding and Non-coding Transcriptome and DNA Methylome that Define the Schwann Cell Repair Phenotype after Nerve Injury.

PubMed

Arthur-Farraj, Peter J; Morgan, Claire C; Adamowicz, Martyna; Gomez-Sanchez, Jose A; Fazal, Shaline V; Beucher, Anthony; Razzaghi, Bonnie; Mirsky, Rhona; Jessen, Kristjan R; Aitman, Timothy J

2017-09-12

Repair Schwann cells play a critical role in orchestrating nerve repair after injury, but the cellular and molecular processes that generate them are poorly understood. Here, we perform a combined whole-genome, coding and non-coding RNA and CpG methylation study following nerve injury. We show that genes involved in the epithelial-mesenchymal transition are enriched in repair cells, and we identify several long non-coding RNAs in Schwann cells. We demonstrate that the AP-1 transcription factor C-JUN regulates the expression of certain micro RNAs in repair Schwann cells, in particular miR-21 and miR-34. Surprisingly, unlike during development, changes in CpG methylation are limited in injury, restricted to specific locations, such as enhancer regions of Schwann cell-specific genes (e.g., Nedd4l), and close to local enrichment of AP-1 motifs. These genetic and epigenomic changes broaden our mechanistic understanding of the formation of repair Schwann cell during peripheral nervous system tissue repair. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Purification and functional characterization of p16, the ATPase of the bacteriophage Φ29 packaging machinery

PubMed Central

Ibarra, Borja; Valpuesta, José María; Carrascosa, José L.

2001-01-01

Bacteriophage Φ29 codes for a protein (p16) that is required for viral DNA packaging both in vivo and in vitro. Co-expression of p16 with the chaperonins GroEL and GroES has allowed its purification in a soluble form. Purified p16 shows a weak ATPase activity that is stimulated by either DNA or RNA, irrespective of the presence of any other viral component. The stimulation of ATPase activity of p16, although induced under packaging conditions, is not dependent on the actual DNA packaging and in this respect the Φ29 enzyme is similar to other viral terminases. Protein p16 competes with DNA and RNA in the interaction with the viral prohead, which occurs through the N-terminal region of the connector protein (p10). In fact, p16 interacts in a nucleotide-dependent fashion with the viral Φ29-encoded RNA (pRNA) involved in DNA packaging, and this binding can be competed with DNA. Our results are consistent with a model for DNA translocation in which p16, bound and organized around the connector, acts as a power stroke to pump the DNA into the prohead, using the hydrolysis of ATP as an energy source. PMID:11691914
Purification and functional characterization of p16, the ATPase of the bacteriophage Phi29 packaging machinery.

PubMed

Ibarra, B; Valpuesta, J M; Carrascosa, J L

2001-11-01

Bacteriophage Phi29 codes for a protein (p16) that is required for viral DNA packaging both in vivo and in vitro. Co-expression of p16 with the chaperonins GroEL and GroES has allowed its purification in a soluble form. Purified p16 shows a weak ATPase activity that is stimulated by either DNA or RNA, irrespective of the presence of any other viral component. The stimulation of ATPase activity of p16, although induced under packaging conditions, is not dependent on the actual DNA packaging and in this respect the Phi29 enzyme is similar to other viral terminases. Protein p16 competes with DNA and RNA in the interaction with the viral prohead, which occurs through the N-terminal region of the connector protein (p10). In fact, p16 interacts in a nucleotide-dependent fashion with the viral Phi29-encoded RNA (pRNA) involved in DNA packaging, and this binding can be competed with DNA. Our results are consistent with a model for DNA translocation in which p16, bound and organized around the connector, acts as a power stroke to pump the DNA into the prohead, using the hydrolysis of ATP as an energy source.
Coarse-grained simulation of DNA using LAMMPS : An implementation of the oxDNA model and its applications.

PubMed

Henrich, Oliver; Gutiérrez Fosado, Yair Augusto; Curk, Tine; Ouldridge, Thomas E

2018-05-10

During the last decade coarse-grained nucleotide models have emerged that allow us to study DNA and RNA on unprecedented time and length scales. Among them is oxDNA, a coarse-grained, sequence-specific model that captures the hybridisation transition of DNA and many structural properties of single- and double-stranded DNA. oxDNA was previously only available as standalone software, but has now been implemented into the popular LAMMPS molecular dynamics code. This article describes the new implementation and analyses its parallel performance. Practical applications are presented that focus on single-stranded DNA, an area of research which has been so far under-investigated. The LAMMPS implementation of oxDNA lowers the entry barrier for using the oxDNA model significantly, facilitates future code development and interfacing with existing LAMMPS functionality as well as other coarse-grained and atomistic DNA models.
PCR-free quantitative detection of genetically modified organism from raw materials. An electrochemiluminescence-based bio bar code method.

PubMed

Zhu, Debin; Tang, Yabing; Xing, Da; Chen, Wei R

2008-05-15

A bio bar code assay based on oligonucleotide-modified gold nanoparticles (Au-NPs) provides a PCR-free method for quantitative detection of nucleic acid targets. However, the current bio bar code assay requires lengthy experimental procedures including the preparation and release of bar code DNA probes from the target-nanoparticle complex and immobilization and hybridization of the probes for quantification. Herein, we report a novel PCR-free electrochemiluminescence (ECL)-based bio bar code assay for the quantitative detection of genetically modified organism (GMO) from raw materials. It consists of tris-(2,2'-bipyridyl) ruthenium (TBR)-labeled bar code DNA, nucleic acid hybridization using Au-NPs and biotin-labeled probes, and selective capture of the hybridization complex by streptavidin-coated paramagnetic beads. The detection of target DNA is realized by direct measurement of ECL emission of TBR. It can quantitatively detect target nucleic acids with high speed and sensitivity. This method can be used to quantitatively detect GMO fragments from real GMO products.
Low-energy electron dose-point kernel simulations using new physics models implemented in Geant4-DNA

NASA Astrophysics Data System (ADS)

Bordes, Julien; Incerti, Sébastien; Lampe, Nathanael; Bardiès, Manuel; Bordage, Marie-Claude

2017-05-01

When low-energy electrons, such as Auger electrons, interact with liquid water, they induce highly localized ionizing energy depositions over ranges comparable to cell diameters. Monte Carlo track structure (MCTS) codes are suitable tools for performing dosimetry at this level. One of the main MCTS codes, Geant4-DNA, is equipped with only two sets of cross section models for low-energy electron interactions in liquid water ("option 2" and its improved version, "option 4"). To provide Geant4-DNA users with new alternative physics models, a set of cross sections, extracted from the CPA100 MCTS code, has been added to Geant4-DNA. This new version is hereafter referred to as "Geant4-DNA-CPA100". In this study, "Geant4-DNA-CPA100" was used to calculate low-energy electron dose-point kernels (DPKs) between 1 keV and 200 keV. Such kernels represent the radial energy deposited by an isotropic point source, a parameter that is useful for dosimetry calculations in nuclear medicine. In order to assess the influence of different physics models on DPK calculations, DPKs were calculated using the existing Geant4-DNA models ("option 2" and "option 4"), newly integrated CPA100 models, and the PENELOPE Monte Carlo code used in step-by-step mode for monoenergetic electrons. Additionally, a comparison was performed of two sets of DPKs that were simulated with "Geant4-DNA-CPA100": the first set using Geant4's default settings, and the second using CPA100's original code default settings. A maximum difference of 9.4% was found between the Geant4-DNA-CPA100 and PENELOPE DPKs. Between the two existing Geant4-DNA models, slight differences between 1 keV and 10 keV were observed. It was highlighted that the DPKs simulated with the two existing Geant4-DNA models were always broader than those generated with "Geant4-DNA-CPA100".
The discrepancies observed between the DPKs generated using Geant4-DNA's existing models and "Geant4-DNA-CPA100" were caused solely by their different cross sections. The different scoring and interpolation methods used in CPA100 and Geant4 to calculate DPKs showed differences close to 3.0% near the source.
Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens.

PubMed

Just, Rebecca S; Scheible, Melissa K; Fast, Spence A; Sturk-Andreaggi, Kimberly; Higginbotham, Jennifer L; Lyons, Elizabeth A; Bush, Jocelyn M; Peck, Michelle A; Ring, Joseph D; Diegoli, Toni M; Röck, Alexander W; Huber, Gabriela E; Nagl, Simone; Strobl, Christina; Zimmermann, Bettina; Parson, Walther; Irwin, Jodi A

2014-05-01

Forensic mitochondrial DNA (mtDNA) testing requires appropriate, high quality reference population data for estimating the rarity of questioned haplotypes and, in turn, the strength of the mtDNA evidence. Available reference databases (SWGDAM, EMPOP) currently include information from the mtDNA control region; however, novel methods that quickly and easily recover mtDNA coding region data are becoming increasingly available. Though these assays promise to both facilitate the acquisition of mitochondrial genome (mtGenome) data and maximize the general utility of mtDNA testing in forensics, the appropriate reference data and database tools required for their routine application in forensic casework are lacking. To address this deficiency, we have undertaken an effort to: (1) increase the large-scale availability of high-quality entire mtGenome reference population data, and (2) improve the information technology infrastructure required to access/search mtGenome data and employ them in forensic casework. Here, we describe the application of a data generation and analysis workflow to the development of more than 400 complete, forensic-quality mtGenomes from low DNA quantity blood serum specimens as part of a U.S. National Institute of Justice funded reference population databasing initiative. We discuss the minor modifications made to a published mtGenome Sanger sequencing protocol to maintain a high rate of throughput while minimizing manual reprocessing with these low template samples. The successful use of this semi-automated strategy on forensic-like samples provides practical insight into the feasibility of producing complete mtGenome data in a routine casework environment, and demonstrates that large (>2kb) mtDNA fragments can regularly be recovered from high quality but very low DNA quantity specimens. 
Further, the detailed empirical data we provide on the amplification success rates across a range of DNA input quantities will be useful moving forward as PCR-based strategies for mtDNA enrichment are considered for targeted next-generation sequencing workflows. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
Relative stability of DNA as a generic criterion for promoter prediction: whole genome annotation of microbial genomes with varying nucleotide base composition.

PubMed

Rangannan, Vetriselvi; Bansal, Manju

2009-12-01

The rapid increase in genome sequence information has necessitated the annotation of their functional elements, particularly those occurring in the non-coding regions, in the genomic context. Promoter region is the key regulatory region, which enables the gene to be transcribed or repressed, but it is difficult to determine experimentally. Hence an in silico identification of promoters is crucial in order to guide experimental work and to pin point the key region that controls the transcription initiation of a gene. In this analysis, we demonstrate that while the promoter regions are in general less stable than the flanking regions, their average free energy varies depending on the GC composition of the flanking genomic sequence. We have therefore obtained a set of free energy threshold values, for genomic DNA with varying GC content and used them as generic criteria for predicting promoter regions in several microbial genomes, using an in-house developed tool PromPredict. On applying it to predict promoter regions corresponding to the 1144 and 612 experimentally validated TSSs in E. coli (50.8% GC) and B. subtilis (43.5% GC) sensitivity of 99% and 95% and precision values of 58% and 60%, respectively, were achieved. For the limited data set of 81 TSSs available for M. tuberculosis (65.6% GC) a sensitivity of 100% and precision of 49% was obtained.
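The quantities quoted in this abstract (GC content, sensitivity and precision) can be illustrated with a minimal Python sketch. The function names and the counts in the usage lines are hypothetical and are not taken from the study; the sketch does not reproduce PromPredict or its free-energy calculation, only the standard definitions of the reported metrics.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def sensitivity(tp: int, fn: int) -> float:
    """Share of known TSSs that received a promoter prediction: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of predictions that correspond to a known TSS: TP / (TP + FP)."""
    return tp / (tp + fp)


# Hypothetical counts, for illustration only (not data from the abstract).
example_sensitivity = sensitivity(tp=90, fn=10)
example_precision = precision(tp=90, fp=60)
example_gc = gc_content("ATGCGC")
```

High sensitivity combined with moderate precision, as reported above, means that nearly all known TSSs are covered while a substantial share of predictions fall where no TSS has been validated.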
BuD, a helix–loop–helix DNA-binding domain for genome modification

PubMed Central

Stella, Stefano; Molina, Rafael; López-Méndez, Blanca; Juillerat, Alexandre; Bertonati, Claudia; Daboussi, Fayza; Campos-Olivas, Ramon; Duchateau, Phillippe; Montoya, Guillermo

2014-01-01

DNA editing offers new possibilities in synthetic biology and biomedicine for modulation or modification of cellular functions to organisms. However, inaccuracy in this process may lead to genome damage. To address this important problem, a strategy allowing specific gene modification has been achieved through the addition, removal or exchange of DNA sequences using customized proteins and the endogenous DNA-repair machinery. Therefore, the engineering of specific protein–DNA interactions in protein scaffolds is key to providing ‘toolkits’ for precise genome modification or regulation of gene expression. In a search for putative DNA-binding domains, BurrH, a protein that recognizes a 19 bp DNA target, was identified. Here, its apo and DNA-bound crystal structures are reported, revealing a central region containing 19 repeats of a helix–loop–helix modular domain (BurrH domain; BuD), which identifies the DNA target by a single residue-to-nucleotide code, thus facilitating its redesign for gene targeting. New DNA-binding specificities have been engineered in this template, showing that BuD-derived nucleases (BuDNs) induce high levels of gene targeting in a locus of the human haemoglobin β (HBB) gene close to mutations responsible for sickle-cell anaemia. Hence, the unique combination of high efficiency and specificity of the BuD arrays can push forward diverse genome-modification approaches for cell or organism redesign, opening new avenues for gene editing. PMID:25004980
Regions of extreme synonymous codon selection in mammalian genes

PubMed Central

Schattner, Peter; Diekhans, Mark

2006-01-01

Recently there has been increasing evidence that purifying selection occurs among synonymous codons in mammalian genes. This selection appears to be a consequence of either cis-regulatory motifs, such as exonic splicing enhancers (ESEs), or mRNA secondary structures, being superimposed on the coding sequence of the gene. We have developed a program to identify regions likely to be enriched for such motifs by searching for extended regions of extreme codon conservation between homologous genes of related species. Here we present the results of applying this approach to five mammalian species (human, chimpanzee, mouse, rat and dog). Even with very conservative selection criteria, we find over 200 regions of extreme codon conservation, ranging in length from 60 to 178 codons. The regions are often found within genes involved in DNA-binding, RNA-binding or zinc-ion-binding. They are highly depleted for synonymous single nucleotide polymorphisms (SNPs) but not for non-synonymous SNPs, further indicating that the observed codon conservation is being driven by negative selection. Forty-three percent of the regions overlap conserved alternative transcript isoforms and are enriched for known ESEs. Other regions are enriched for TpA dinucleotides and may contain conserved motifs/structures relating to mRNA stability and/or degradation. We anticipate that this tool will be useful for detecting regions enriched in other classes of coding-sequence motifs and structures as well. PMID:16556911
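The idea of searching for extended runs of conserved codons can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' program: it compares only two sequences, assumes they are already aligned in frame without gaps, and treats "conservation" as exact codon identity. The function name `conserved_codon_runs` and the `min_codons` parameter are hypothetical.

```python
def conserved_codon_runs(seq_a: str, seq_b: str, min_codons: int = 3):
    """Return (start_codon_index, run_length) for each run of identical codons.

    Assumes seq_a and seq_b are in-frame, gap-free and of equal length.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    a, b = seq_a.upper(), seq_b.upper()
    n_codons = len(a) // 3
    runs = []
    run_start = None  # index of the first codon of the current identical run
    for i in range(n_codons):
        same = a[3 * i:3 * i + 3] == b[3 * i:3 * i + 3]
        if same and run_start is None:
            run_start = i
        elif not same and run_start is not None:
            if i - run_start >= min_codons:
                runs.append((run_start, i - run_start))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and n_codons - run_start >= min_codons:
        runs.append((run_start, n_codons - run_start))
    return runs


# The second codon differs synonymously (GCT vs GCC, both alanine),
# which breaks the run even though the protein is unchanged.
runs = conserved_codon_runs("ATGGCTGCAAAATTT", "ATGGCCGCAAAATTT", min_codons=3)
```

Requiring identity at the codon level rather than the amino-acid level is what makes such a scan sensitive to selection on synonymous sites; a realistic analysis would also need a statistical threshold and more than two species, as described in the abstract.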
The 'dark matter' in the plant genomes: non-coding and unannotated DNA sequences associated with open chromatin.

PubMed

Jiang, Jiming

2015-04-01

Sequencing of complete plant genomes has become increasingly more routine since the advent of the next-generation sequencing technology. Identification and annotation of large amounts of noncoding but functional DNA sequences, including cis-regulatory DNA elements (CREs), have become a new frontier in plant genome research. Genomic regions containing active CREs bound to regulatory proteins are hypersensitive to DNase I digestion and are called DNase I hypersensitive sites (DHSs). Several recent DHS studies in plants illustrate that DHS datasets produced by DNase I digestion followed by next-generation sequencing (DNase-seq) are highly valuable for the identification and characterization of CREs associated with plant development and responses to environmental cues. DHS-based genomic profiling has opened a door to identify and annotate the 'dark matter' in sequenced plant genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
Acute loss of TET function results in aggressive myeloid cancer in mice

PubMed Central

An, Jungeun; González-Avalos, Edahí; Chawla, Ashu; Jeong, Mira; López-Moyado, Isaac F.; Li, Wei; Goodell, Margaret A.; Chavez, Lukas; Ko, Myunggon; Rao, Anjana

2015-01-01

TET-family dioxygenases oxidize 5-methylcytosine (5mC) in DNA, and exert tumour suppressor activity in many types of cancers. Even in the absence of TET coding region mutations, TET loss-of-function is strongly associated with cancer. Here we show that acute elimination of TET function induces the rapid development of an aggressive, fully-penetrant and cell-autonomous myeloid leukaemia in mice, pointing to a causative role for TET loss-of-function in this myeloid malignancy. Phenotypic and transcriptional profiling shows aberrant differentiation of haematopoietic stem/progenitor cells, impaired erythroid and lymphoid differentiation and strong skewing to the myeloid lineage, with only a mild relation to changes in DNA modification. We also observe progressive accumulation of phospho-H2AX and strong impairment of DNA damage repair pathways, suggesting a key role for TET proteins in maintaining genome integrity. PMID:26607761
Complementary DNA characterization and chromosomal localization of a human gene related to the poliovirus receptor-encoding gene.

PubMed

Lopez, M; Eberlé, F; Mattei, M G; Gabert, J; Birg, F; Bardin, F; Maroc, C; Dubreuil, P

1995-04-03

The human poliovirus (PV) receptor (PVR) is a member of the immunoglobulin (Ig) superfamily with unknown cellular function. We have isolated a human PVR-related (PRR) cDNA. The deduced amino acid (aa) sequence of PRR showed, in the extracellular region, 51.7 and 54.3% similarity with human PVR and with the murine PVR homolog, respectively. The cDNA coding sequence is 1.6-kb long and encodes a deduced 57-kDa protein; this protein has a structural organization analogous to that of PVR, that is, one V- and two C-set Ig domains, with a conserved number of aa. Northern blot analysis indicated that a major 5.9-kb transcript is present in all normal human tissues tested. In situ hybridization showed that the PRR gene is located at bands q23-q24 of human chromosome 11.
Sequence of the non-phosphorylating glyceraldehyde-3-phosphate dehydrogenase from Nicotiana plumbaginifolia and phylogenetic origin of the gene family.

PubMed

Habenicht, A; Quesada, A; Cerff, R

1997-10-01

A cDNA-library has been constructed from Nicotiana plumbaginifolia seedlings, and the non-phosphorylating glyceraldehyde-3-phosphate dehydrogenase (GapN, EC 1.2.1.9) was isolated by plaque hybridization using the cDNA from pea as a heterologous probe. The cDNA comprises the entire GapN coding region. A putative polyadenylation signal is identified. Phylogenetic analysis based on the deduced amino acid sequences revealed that the GapN gene family represents a separate ancient branch within the aldehyde dehydrogenase superfamily. It can be shown that the GapN gene family and other distinct branches of the superfamily have their phylogenetic origin before the separation of primary life-forms. This further demonstrates that already very early in evolution, a broad diversification of the aldehyde dehydrogenases led to the formation of the superfamily.
Trichomonas vaginalis ribosomal RNA: identification and characterisation of the transcription promoter and terminator sequences.

PubMed

Franco, Bernardo; Hernández, Roberto; López-Villaseñor, Imelda

2012-09-01

Trichomonas vaginalis is a parasitic protozoan of both medical and biological relevance. Transcriptional studies in this organism have focused mainly on type II pol promoters, whereas the elements necessary for transcription by polI or polIII have not been investigated. Here, with the aid of a transient transcription system, we characterised the rDNA intergenic region, defining both the promoter and the terminator sequences required for transcription. We defined the promoter as a compact region of approximately 180 bp. We also identified a potential upstream control element (UCE) that was located 80 bp upstream of the transcription start point (TSP). A transcription termination element was identified within a 34 bp region that was located immediately downstream of the 28S coding sequence. The function of this element depends upon polarity and the presence of both a stretch of uridine residues (U's) and a hairpin structure in the transcript. Our observations provide a strong basis for the study of DNA recognition by the polI transcriptional machinery in this early divergent organism. Copyright © 2012 Elsevier B.V. All rights reserved.
Analysis of the regulatory region of the protease III (ptr) gene of Escherichia coli K-12.

PubMed

Claverie-Martin, F; Diaz-Torres, M R; Kushner, S R

1987-01-01

The ptr gene of Escherichia coli encodes protease III (Mr 110,000) and a 50-kDa polypeptide, both of which are found in the periplasmic space. The gene is physically located between the recC and recB loci on the E. coli chromosome. The nucleotide sequence of a 1167-bp EcoRV-ClaI fragment of chromosomal DNA containing the promoter region and 885 bp of the ptr coding sequence has been determined. S1 nuclease mapping analysis showed that the major 5' end of the ptr mRNA was localized 127 bp upstream from the ATG start codon. The open reading frame (ORF), preceded by a Shine-Dalgarno sequence, extends to the end of the sequenced DNA. Downstream from the -35 and -10 regions is a sequence that strongly fits the consensus sequence of known nitrogen-regulated promoters. A signal peptide of 23 amino acid residues is present at the N terminus of the derived amino acid sequence. The cleavage site as well as the ORF were confirmed by sequencing the N terminus of mature protease III.
Highly conserved intragenic HSV-2 sequences: Results from next-generation sequencing of HSV-2 UL and US regions from genital swabs collected from 3 continents.

PubMed

Johnston, Christine; Magaret, Amalia; Roychoudhury, Pavitra; Greninger, Alexander L; Cheng, Anqi; Diem, Kurt; Fitzgibbon, Matthew P; Huang, Meei-Li; Selke, Stacy; Lingappa, Jairam R; Celum, Connie; Jerome, Keith R; Wald, Anna; Koelle, David M

2017-10-01

Understanding the variability in circulating herpes simplex virus type 2 (HSV-2) genomic sequences is critical to the development of HSV-2 vaccines. Genital lesion swabs containing ≥ 10^7 log10 copies HSV DNA collected from Africa, the USA, and South America underwent next-generation sequencing, followed by K-mer based filtering and de novo genomic assembly. Sites of heterogeneity within coding regions in unique long and unique short (UL_US) regions were identified. Phylogenetic trees were created using maximum likelihood reconstruction. Among 46 samples from 38 persons, 1468 intragenic base-pair substitutions were identified. The maximum nucleotide distance between strains for concatenated UL_US segments was 0.4%. Phylogeny did not reveal geographic clustering. The most variable proteins had non-synonymous mutations in < 3% of amino acids. Unenriched HSV-2 DNA can undergo next-generation sequencing to identify intragenic variability. The use of clinical swabs for sequencing expands the information that can be gathered directly from these specimens. Copyright © 2017 Elsevier Inc. All rights reserved.
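A "maximum nucleotide distance between strains" of the kind reported here can be illustrated with a short Python sketch. The abstract does not say which distance measure was used; the sketch assumes the simple uncorrected p-distance (proportion of differing sites) on pre-aligned sequences, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') or 'N' are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def max_pairwise_distance(seqs) -> float:
    """Largest p-distance over all pairs of aligned sequences."""
    return max(p_distance(a, b) for a, b in combinations(seqs, 2))


# Toy alignment, for illustration only (not HSV-2 data).
toy_max = max_pairwise_distance(["AAAA", "AAAT", "AATT"])
```

On this scale, a value of 0.004 would correspond to the 0.4% figure quoted in the abstract, that is, roughly four differing sites per thousand compared.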

Molecular phylogeography of the Andean alpine plant, Gunnera magellanica

NASA Astrophysics Data System (ADS)

Shimizu, M.; Fujii, N.; Ito, M.; Asakawa, T.; Nishida, H.; Suyama, C.; Ueda, K.

2015-12-01

To clarify the evolutionary history of Gunnera magellanica (Gunneraceae), an alpine plant of the Andes mountains, we performed molecular phylogeographic analyses based on the sequences of an internal transcribed spacer (ITS) of nuclear ribosomal DNA and four non-coding regions (trnH-psbA, trnL-trnF, atpB-rbcL, rpl16 intron) of chloroplast DNA. We investigated 3, 4, 4 and 11 populations in Ecuador, Bolivia, Argentina, and Chile, respectively, and detected six ITS genotypes (Types A-F) in G. magellanica. Five genotypes (Types A-E) were observed in the northern Andes population (Ecuador and Bolivia); only one ITS genotype (Type F) was observed in the southern Andes population (Chile and Argentina). Phylogenetic analyses showed that the ITS genotypes of the northern and southern Andes populations form different clades with high bootstrap probability. Furthermore, network analysis, analysis of molecular variance, and spatial analysis of molecular variance showed that there were two major clusters (the northern and southern Andes populations) in this species. Furthermore, in chloroplast DNA analysis, three major clades (northern Andes, Chillan, and southern Andes) were inferred from phylogenetic analyses using four non-coding regions, a finding that was supported by the above three types of analysis. The Chillan clade is the northernmost population in the southern Andes populations. With the exception of the Chillan clade (Chillan population), results of nuclear DNA and chloroplast DNA analyses were consistent. Both markers showed that the northern and southern Andes populations of G. magellanica were genetically different from each other. This type of clear phylogeographical structure was supported by PERMUT analysis according to Pons & Petit (1995, 1996). Moreover, based on our preliminary estimation that is based on the ITS sequences, the northern and southern Andes clades diverged ~0.63-3 million years ago, during a period of upheaval in the Andes. 
This suggests that the populations of G. magellanica that were distributed along the Andes have been divided into the two local populations of the northern and southern Andes during the uplift of the Andes.
The complete mitochondrial genome and phylogenetic analysis of the giant panda (Ailuropoda melanoleuca).

PubMed

Peng, Rui; Zeng, Bo; Meng, Xiuxiang; Yue, Bisong; Zhang, Zhihe; Zou, Fangdong

2007-08-01

The complete mitochondrial genome sequence of the giant panda, Ailuropoda melanoleuca, was determined by the long and accurate polymerase chain reaction (LA-PCR) with conserved primers and primer walking sequence methods. The complete mitochondrial DNA is 16,805 nucleotides in length and contains two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one control region. The total length of the 13 protein-coding genes exceeds that of the American black bear, brown bear and polar bear by 3 amino acids, located at the end of the ND5 gene. The codon usage also followed the typical vertebrate pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 5 (ND5) gene. The molecular phylogenetic analysis was performed on the sequences of 12 concatenated heavy-strand encoded protein-coding genes, and suggested that the giant panda is most closely related to bears.
Mitochondrial genomes of the jungle crow Corvus macrorhynchos (Passeriformes: Corvidae) from shed feathers and a phylogenetic analysis of genus Corvus using mitochondrial protein-coding genes.

PubMed

Krzeminska, Urszula; Wilson, Robyn; Rahman, Sadequr; Song, Beng Kah; Seneviratne, Sampath; Gan, Han Ming; Austin, Christopher M

2016-07-01

The complete mitochondrial genomes of two jungle crows (Corvus macrorhynchos) were sequenced. DNA was extracted from tissue samples obtained from shed feathers collected in the field in Sri Lanka and sequenced using the Illumina MiSeq Personal Sequencer. Jungle crow mitogenomes have a structural organization typical of the genus Corvus and are 16,927 bp and 17,066 bp in length, both comprising 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal subunit genes, and a non-coding control region. In addition, we complement already available house crow (Corvus splendens) mitogenome resources by sequencing an individual from Singapore. A phylogenetic tree constructed from Corvidae family mitogenome sequences available on GenBank is presented. We confirm the monophyly of the genus Corvus and propose to use complete mitogenome resources for further intra- and interspecies genetic studies.
Complete mitochondrial DNA genome of bonnethead shark, Sphyrna tiburo, and phylogenetic relationships among main superorders of modern elasmobranchs

PubMed Central

Díaz-Jaimes, Píndaro; Bayona-Vásquez, Natalia J.; Adams, Douglas H.; Uribe-Alcocer, Manuel

2015-01-01

Elasmobranchs are one of the most diverse groups in the marine realm, represented by 18 orders, 55 families and about 1200 reported species, but also one of the most vulnerable to exploitation and to climate change. Phylogenetic relationships among main orders have been controversial since the emergence of the Hypnosqualean hypothesis by Shirai (1992) that considered batoids as a sister group of sharks. The use of the complete mitochondrial DNA (mtDNA) may shed further light on this hypothesis by increasing the number of informative characters. We report the mtDNA genome of the bonnethead shark Sphyrna tiburo, and compare it with the mitogenomes of 48 other species to assess phylogenetic relationships. The mtDNA genome of S. tiburo is quite similar in size to that of congeneric species but also similar to the reported mtDNA genomes of other Carcharhinidae species. Like most vertebrate mitochondrial genomes, it contained 13 protein-coding genes, two rRNA genes, 22 tRNA genes and a control region (D-loop) of 1086 bp. The Bayesian analysis of the 49 mitogenomes supported the view that sharks and batoids are separate groups. PMID:27014583
Analysis of mitochondrial DNA in Bolivian llama, alpaca and vicuna populations: a contribution to the phylogeny of the South American camelids.

PubMed

Barreta, J; Gutiérrez-Gil, B; Iñiguez, V; Saavedra, V; Chiri, R; Latorre, E; Arranz, J J

2013-04-01

The objectives of this work were to assess the mtDNA diversity of Bolivian South American camelid (SAC) populations and to shed light on the evolutionary relationships between the Bolivian camelids and other populations of SACs. We have analysed two different mtDNA regions: the complete coding region of the MT-CYB gene and 513 bp of the D-loop region. The populations sampled included Bolivian llamas, alpacas and vicunas, and Chilean guanacos. High levels of genetic diversity were observed in the studied populations. In general, MT-CYB was more variable than D-loop. On a species level, the vicunas showed the lowest genetic variability, followed by the guanacos, alpacas and llamas. Phylogenetic analyses performed by including additional available mtDNA sequences from the studied species confirmed the existence of the two monophyletic clades previously described by other authors for guanacos (G) and vicunas (V). Significant levels of mtDNA hybridization were found in the domestic species. Our sequence analyses revealed significant sequence divergence within clade G, and some of the Bolivian llamas grouped with the majority of the southern guanacos. This finding supports the existence of more than the one llama domestication centre in South America previously suggested on the basis of archaeozoological evidence. Additionally, analysis of D-loop sequences revealed two new matrilineal lineages that are distinct from the previously reported G and V clades. The results presented here represent the first report on the population structure and genetic variability of Bolivian camelids and may help to elucidate the complex and dynamic domestication process of SAC populations. © 2012 The Authors, Animal Genetics © 2012 Stichting International Foundation for Animal Genetics.
Uniparental Markers of Contemporary Italian Population Reveals Details on Its Pre-Roman Heritage

PubMed Central

Álvarez-Iglesias, Vanesa; Fondevila, Manuel; Blanco-Verea, Alejandro; Carracedo, Ángel; Pascali, Vincenzo L.; Capelli, Cristian

2012-01-01

Background: According to archaeological records and historical documentation, Italy has been a melting point for populations of different geographical and ethnic matrices. Although Italy has been a favorite subject for numerous population genetic studies, genetic patterns have never been analyzed comprehensively, including uniparental and autosomal markers throughout the country. Methods/Principal Findings: A total of 583 individuals were sampled from across the Italian Peninsula, from ten distant (if homogeneous by language) ethnic communities--and from two linguistic isolates (Ladins, Grecani Salentini). All samples were first typed for the mitochondrial DNA (mtDNA) control region and selected coding region SNPs (mtSNPs). This data was pooled for analysis with 3,778 mtDNA control-region profiles collected from the literature. Secondly, a set of Y-chromosome SNPs and STRs were also analyzed in 479 individuals together with a panel of autosomal ancestry informative markers (AIMs) from 441 samples. The resulting genetic record reveals clines of genetic frequencies laid according to the latitude slant along continental Italy--probably generated by demographical events dating back to the Neolithic. The Ladins showed distinctive, if more recent structure. The Neolithic contribution was estimated for the Y-chromosome as 14.5% and for mtDNA as 10.5%. Y-chromosome data showed larger differentiation between North, Center and South than mtDNA. AIMs detected a minor sub-Saharan component; this is however higher than for other European non-Mediterranean populations. The same signal of sub-Saharan heritage was also evident in uniparental markers. Conclusions/Significance: Italy shows patterns of molecular variation mirroring other European countries, although some heterogeneity exists based on different analysis and molecular markers. From North to South, Italy shows clinal patterns that were most likely modulated during Neolithic times. PMID:23251386
Uniparental markers of contemporary Italian population reveals details on its pre-Roman heritage.

PubMed

Brisighelli, Francesca; Álvarez-Iglesias, Vanesa; Fondevila, Manuel; Blanco-Verea, Alejandro; Carracedo, Angel; Pascali, Vincenzo L; Capelli, Cristian; Salas, Antonio

2012-01-01

According to archaeological records and historical documentation, Italy has been a melting point for populations of different geographical and ethnic matrices. Although Italy has been a favorite subject for numerous population genetic studies, genetic patterns have never been analyzed comprehensively, including uniparental and autosomal markers throughout the country. A total of 583 individuals were sampled from across the Italian Peninsula, from ten distant (if homogeneous by language) ethnic communities--and from two linguistic isolates (Ladins, Grecani Salentini). All samples were first typed for the mitochondrial DNA (mtDNA) control region and selected coding region SNPs (mtSNPs). This data was pooled for analysis with 3,778 mtDNA control-region profiles collected from the literature. Secondly, a set of Y-chromosome SNPs and STRs were also analyzed in 479 individuals together with a panel of autosomal ancestry informative markers (AIMs) from 441 samples. The resulting genetic record reveals clines of genetic frequencies laid according to the latitude slant along continental Italy--probably generated by demographical events dating back to the Neolithic. The Ladins showed distinctive, if more recent structure. The Neolithic contribution was estimated for the Y-chromosome as 14.5% and for mtDNA as 10.5%. Y-chromosome data showed larger differentiation between North, Center and South than mtDNA. AIMs detected a minor sub-Saharan component; this is however higher than for other European non-Mediterranean populations. The same signal of sub-Saharan heritage was also evident in uniparental markers. Italy shows patterns of molecular variation mirroring other European countries, although some heterogeneity exists based on different analysis and molecular markers. From North to South, Italy shows clinal patterns that were most likely modulated during Neolithic times.
Identification and Differentiation of Verticillium Species and V. longisporum Lineages by Simplex and Multiplex PCR Assays

PubMed Central

Inderbitzin, Patrik; Davis, R. Michael; Bostock, Richard M.; Subbarao, Krishna V.

2013-01-01

Accurate species identification is essential for effective plant disease management, but is challenging in fungi including Verticillium sensu stricto (Ascomycota, Sordariomycetes, Plectosphaerellaceae), a small genus of ten species that includes important plant pathogens. Here we present fifteen PCR assays for the identification of all recognized Verticillium species and the three lineages of the diploid hybrid V. longisporum. The assays were based on DNA sequence data from the ribosomal internal transcribed spacer region, and coding and non-coding regions of actin, elongation factor 1-alpha, glyceraldehyde-3-phosphate dehydrogenase and tryptophan synthase genes. The eleven single target (simplex) PCR assays resulted in amplicons of diagnostic size for V. alfalfae, V. albo-atrum, V. dahliae including V. longisporum lineage A1/D3, V. isaacii, V. klebahnii, V. nonalfalfae, V. nubilum, V. tricorpus, V. zaregamsianum, and Species A1 and Species D1, the two undescribed ancestors of V. longisporum. The four multiple target (multiplex) PCR assays simultaneously differentiated the species or lineages within the following four groups: Verticillium albo-atrum, V. alfalfae and V. nonalfalfae; Verticillium dahliae and V. longisporum lineages A1/D1, A1/D2 and A1/D3; Verticillium dahliae including V. longisporum lineage A1/D3, V. isaacii, V. klebahnii and V. tricorpus; Verticillium isaacii, V. klebahnii and V. tricorpus. Since V. dahliae is a parent of two of the three lineages of the diploid hybrid V. longisporum, no simplex PCR assay is able to differentiate V. dahliae from all V. longisporum lineages. PCR assays were tested with fungal DNA extracts from pure cultures, and were not evaluated for detection and quantification of Verticillium species from plant or soil samples. The DNA sequence alignments are provided and can be used for the design of additional primers. PMID:23823707
Conserved Non-Coding Sequences are Associated with Rates of mRNA Decay in Arabidopsis.

PubMed

Spangler, Jacob B; Feltus, Frank Alex

2013-01-01

Steady-state mRNA levels are tightly regulated through a combination of transcriptional and post-transcriptional control mechanisms. The discovery of cis-acting DNA elements that encode these control mechanisms is of high importance. We have investigated the influence of conserved non-coding sequences (CNSs), DNA patterns retained after an ancient whole genome duplication event, on the breadth of gene expression and the rates of mRNA decay in Arabidopsis thaliana. The absence of CNSs near α duplicate genes was associated with a decrease in breadth of gene expression and slower mRNA decay rates while the presence of CNSs near α duplicates was associated with an increase in breadth of gene expression and faster mRNA decay rates. The observed difference in mRNA decay rate was fastest in genes with CNSs in both non-transcribed and transcribed regions, albeit through an unknown mechanism. This study supports the notion that some Arabidopsis CNSs regulate the steady-state mRNA levels through post-transcriptional control mechanisms and that CNSs also play a role in controlling the breadth of gene expression.
Conserved Non-Coding Sequences are Associated with Rates of mRNA Decay in Arabidopsis

PubMed Central

Spangler, Jacob B.; Feltus, Frank Alex

2013-01-01

Steady-state mRNA levels are tightly regulated through a combination of transcriptional and post-transcriptional control mechanisms. The discovery of cis-acting DNA elements that encode these control mechanisms is of high importance. We have investigated the influence of conserved non-coding sequences (CNSs), DNA patterns retained after an ancient whole genome duplication event, on the breadth of gene expression and the rates of mRNA decay in Arabidopsis thaliana. The absence of CNSs near α duplicate genes was associated with a decrease in breadth of gene expression and slower mRNA decay rates while the presence of CNSs near α duplicates was associated with an increase in breadth of gene expression and faster mRNA decay rates. The observed difference in mRNA decay rate was fastest in genes with CNSs in both non-transcribed and transcribed regions, albeit through an unknown mechanism. This study supports the notion that some Arabidopsis CNSs regulate the steady-state mRNA levels through post-transcriptional control mechanisms and that CNSs also play a role in controlling the breadth of gene expression. PMID:23675377
The flaA locus of Bacillus subtilis is part of a large operon coding for flagellar structures, motility functions, and an ATPase-like polypeptide.

PubMed Central

Albertini, A M; Caramori, T; Crabb, W D; Scoffone, F; Galizzi, A

1991-01-01

We cloned and sequenced 8.3 kb of Bacillus subtilis DNA corresponding to the flaA locus involved in flagellar biosynthesis, motility, and chemotaxis. The DNA sequence revealed the presence of 10 complete and 2 incomplete open reading frames. Comparison of the deduced amino acid sequences to data banks showed similarities of nine of the deduced products to a number of proteins of Escherichia coli and Salmonella typhimurium for which a role in flagellar functioning has been directly demonstrated. In particular, the sequence data suggest that the flaA operon codes for the M-ring protein, components of the motor switch, and the distal part of the basal-body rod. The gene order is remarkably similar to that described for region III of the enterobacterial flagellar regulon. One of the open reading frames was translated into a protein with 48% amino acid identity to S. typhimurium FliI and 29% identity to the beta subunit of E. coli ATP synthase. PMID:1828465
Survival in extreme environment by "preserve-expand-specialize" strategy: lessons from comparative genomics of an anhydrobiotic midge.

NASA Astrophysics Data System (ADS)

Gusev, Oleg; Sugimoto, Manabu; Novikova, Nataliya; Sychev, Vladimir; Okuda, Takashi; Kikawada, Takahiro

2012-07-01

Anhydrobiotic chironomid larvae of Polypedilum vanderplanki (Diptera) can withstand prolonged complete desiccation as well as other external stresses including ionizing radiation. Recent experiments showed that this insect is able to survive long-term exposure to real outer space. At the same time, we found that dehydration causes alterations in chromatin structure and a severe fragmentation of nuclear DNA in the cells of the larvae despite successful anhydrobiosis. Analysis of several remote populations of the chironomid in Africa suggests that desiccation-related DNA damage might be a driving genetic force for rapid radiation within the species. First results of the ongoing genome project suggest that the origin and evolution of anhydrobiosis in this single insect species are related to rapid duplication of the genes coding for late embryogenesis abundant (LEA) proteins and other molecular agents directly involved in desiccation resistance in the cells. Analysis of genome-wide mRNA expression profiles in the larvae subjected to desiccation shows that the joint activity of large multi-gene coding regions in the genome is involved in the control of anhydrobiosis-related molecular adaptations in the chironomid.
The Diversity Present in 5140 Human Mitochondrial Genomes

PubMed Central

Pereira, Luísa; Freitas, Fernando; Fernandes, Verónica; Pereira, Joana B.; Costa, Marta D.; Costa, Stephanie; Máximo, Valdemar; Macaulay, Vincent; Rocha, Ricardo; Samuels, David C.

2009-01-01

We analyzed the current status (as of the end of August 2008) of human mitochondrial genomes deposited in GenBank, amounting to 5140 complete or coding-region sequences, in order to present an overall picture of the diversity present in the mitochondrial DNA of the global human population. To perform this task, we developed mtDNA-GeneSyn, a computer tool that identifies and exhaustively classifies the diversity present in large genetic data sets. The diversity observed in the 5140 human mitochondrial genomes was compared with all possible transitions and transversions from the standard human mitochondrial reference genome. This comparison showed that tRNA and rRNA secondary structures have a large effect in limiting the diversity of the human mitochondrial sequences, whereas for the protein-coding genes there is a bias toward less variation at the second codon positions. The analysis of the observed amino acid variations showed a tolerance of variations that convert between the amino acids V, I, A, M, and T. This defines a group of amino acids with similar chemical properties that can interconvert by a single transition. PMID:19426953
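The comparison against "all possible transitions and transversions" from a reference sequence can be illustrated with a short sketch. This is not the mtDNA-GeneSyn tool; the function names are hypothetical and the code only shows the standard definition (purine-to-purine or pyrimidine-to-pyrimidine changes are transitions, all others are transversions).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref == alt or ref not in valid or alt not in valid:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # A change within the same chemical class is a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def all_possible_substitutions(reference: str):
    """Yield (position, ref, alt, kind) for every possible single-base change.

    Positions are 1-based. The reference is assumed to contain only A, C, G, T.
    """
    for pos, ref in enumerate(reference.upper(), start=1):
        for alt in "ACGT":
            if alt != ref:
                yield pos, ref, alt, classify_substitution(ref, alt)
```

Each site has three possible substitutions, one transition and two transversions, so observed variants can be tallied against this full set to see which classes of change are under-represented.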
Secondary mutation in a coding mononucleotide tract in MSH6 causes loss of immunoexpression of MSH6 in colorectal carcinomas with MLH1/PMS2 deficiency.

PubMed

Shia, Jinru; Zhang, Liying; Shike, Moshe; Guo, Min; Stadler, Zsofia; Xiong, Xiaoling; Tang, Laura H; Vakiani, Efsevia; Katabi, Nora; Wang, Hangjun; Bacares, Ruben; Ruggeri, Jeanine; Boland, C Richard; Ladanyi, Marc; Klimstra, David S

2013-01-01

Immunohistochemical staining for DNA mismatch repair proteins may be affected by various biological and technical factors. Staining variations that could potentially lead to erroneous interpretations have been recognized. A recently recognized staining variation is the significant reduction of staining for MSH6 in some colorectal carcinomas. The frequency and specific characteristics of this aberrant MSH6 staining pattern, however, have not been well analyzed. In this study of 420 colorectal carcinoma samples obtained from patients fulfilling the Revised Bethesda Guidelines, we detected 9 tumors (2%) showing extremely limited staining for MSH6 with positive staining present in <5% of the tumor cells. Our analyses showed that these tumors belonged to two distinct categories: (1) MLH1 and/or PMS2 protein-deficient carcinomas (n=5, including 1 with a pathogenic mutation in PMS2); and (2) MLH1, PMS2 and MSH2 normal but with chemotherapy or chemoradiation therapy before surgery (n=4). To test our hypothesis that somatic mutation in the coding region microsatellite of the MSH6 gene might be a potential underlying mechanism for such limited MSH6 staining, we evaluated frameshift mutation in a (C)8 tract in exon 5 of the MSH6 gene in seven tumors that had sufficient DNA for analysis, and detected mutation in four; all four tumors belonged to the MLH1/PMS2-deficient group. In conclusion, our data outline the main scenarios where significant reduction of MSH6 staining is more likely to occur in colorectal carcinoma, and suggest that somatic mutations of the coding region microsatellites of the MSH6 gene are an underlying mechanism for this staining phenomenon in MLH1/PMS2-deficient carcinomas.
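Why a length change in a coding mononucleotide tract such as (C)8 disrupts the protein can be sketched in a few lines. The helper names and the example sequence are hypothetical and are not taken from the study; the code only locates homopolymer runs and checks whether an insertion or deletion shifts the reading frame.

```python
import re


def find_homopolymer_tracts(seq: str, base: str = "C", min_len: int = 8):
    """Return (start, length) for each run of `base` at least `min_len` long.

    `start` is a 0-based index into `seq`.
    """
    pattern = re.compile(f"{re.escape(base.upper())}{{{min_len},}}")
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


def is_frameshift(ref_len: int, observed_len: int) -> bool:
    """A length change that is not a multiple of 3 shifts the reading frame."""
    delta = observed_len - ref_len
    return delta != 0 and delta % 3 != 0
```

For example, a tract that contracts from eight to seven cytosines gives `is_frameshift(8, 7) == True`, whereas a three-base expansion would preserve the frame.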
Complete chloroplast genome sequences of Hordeum vulgare, Sorghum bicolor and Agrostis stolonifera, and comparative analyses with other grass genomes

PubMed Central

Saski, Christopher; Lee, Seung-Bum; Fjellheim, Siri; Guda, Chittibabu; Jansen, Robert K.; Luo, Hong; Tomkins, Jeffrey; Rognli, Odd Arne; Clarke, Jihong Liu

2009-01-01

Comparisons of complete chloroplast genome sequences of Hordeum vulgare, Sorghum bicolor and Agrostis stolonifera to six published grass chloroplast genomes reveal that gene content and order are similar but two microstructural changes have occurred. First, the expansion of the IR at the SSC/IRa boundary that duplicates a portion of the 5′ end of ndhH is restricted to the three genera of the subfamily Pooideae (Agrostis, Hordeum and Triticum). Second, a 6 bp deletion in ndhK is shared by Agrostis, Hordeum, Oryza and Triticum, and this event supports the sister relationship between the subfamilies Erhartoideae and Pooideae. Repeat analysis identified 19–37 direct and inverted repeats 30 bp or longer with a sequence identity of at least 90%. Seventeen of the 26 shared repeats are found in all the grass chloroplast genomes examined and are located in the same genes or intergenic spacer (IGS) regions. Examination of simple sequence repeats (SSRs) identified 16–21 potential polymorphic SSRs. Five IGS regions have 100% sequence identity among Zea mays, Saccharum officinarum and Sorghum bicolor, whereas no spacer regions were identical among Oryza sativa, Triticum aestivum, H. vulgare and A. stolonifera despite their close phylogenetic relationship. Alignment of EST sequences and DNA coding sequences identified six C–U conversions in both Sorghum bicolor and H. vulgare but only one in A. stolonifera. Phylogenetic trees based on DNA sequences of 61 protein-coding genes of 38 taxa using both maximum parsimony and likelihood methods provide moderate support for a sister relationship between the subfamilies Erhartoideae and Pooideae. PMID:17534593
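The kind of scan used to flag candidate simple sequence repeats (SSRs) can be sketched with a regular expression. The abstract does not state the thresholds that were used, so `unit_len` and `min_repeats` below are illustrative assumptions, and `find_ssrs` is a hypothetical helper rather than the authors' software.

```python
import re


def find_ssrs(seq: str, unit_len: int = 1, min_repeats: int = 10):
    """Return (start, unit, repeat_count) for tandem repeats of a fixed unit length.

    `start` is 0-based. Thresholds are assumptions chosen for illustration.
    A unit made of one repeated base (e.g. "AA") is also reported when
    unit_len > 1; filter those out if only true di-/tri-nucleotide SSRs matter.
    """
    # Capture one unit, then require at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)) // unit_len)
        for m in pattern.finditer(seq.upper())
    ]
```

Running the scan over each genome and comparing repeat counts at matching loci is one simple way to shortlist potentially polymorphic SSRs.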
Transfer RNA gene-targeted integration: an adaptation of retrotransposable elements to survive in the compact Dictyostelium discoideum genome.

PubMed

Winckler, T; Szafranski, K; Glöckner, G

2005-01-01

Almost every organism carries along a multitude of molecular parasites known as transposable elements (TEs). TEs influence their host genomes in many ways by expanding genome size and complexity, rearranging genomic DNA, mutagenizing host genes, and altering transcription levels of nearby genes. The eukaryotic microorganism Dictyostelium discoideum is attractive for the study of fundamental biological phenomena such as intercellular communication, formation of multicellularity, cell differentiation, and morphogenesis. D. discoideum has a highly compacted, haploid genome with less than 1 kb of genomic DNA separating coding regions. Nevertheless, the D. discoideum genome is loaded with 10% of TEs that managed to settle and survive in this inhospitable environment. In depth analysis of D. discoideum genome project data has provided intriguing insights into the evolutionary challenges that mobile elements face when they invade compact genomes. Two different mechanisms are used by D. discoideum TEs to avoid disruption of host genes upon retrotransposition. Several TEs have invented the specific targeting of tRNA gene-flanking regions as a means to avoid integration into coding regions. These elements have been dispersed on all chromosomes, closely following the distribution of tRNA genes. By contrast, TEs that lack bona fide integration specificities show a strong bias to nested integration, thus forming large TE clusters at certain chromosomal loci that are hardly resolved by bioinformatics approaches. We summarize our current view of D. discoideum TEs and present new data from the analysis of the complete sequences of D. discoideum chromosomes 1 and 2, which comprise more than one third of the total genome.
Germ-line and somatic EPHA2 coding variants in lens aging and cataract.

PubMed

Bennett, Thomas M; M'Hamdi, Oussama; Hejtmancik, J Fielding; Shiels, Alan

2017-01-01

Rare germ-line mutations in the coding regions of the human EPHA2 gene (EPHA2) have been associated with inherited forms of pediatric cataract, whereas, frequent, non-coding, single nucleotide variants (SNVs) have been associated with age-related cataract. Here we sought to determine if germ-line EPHA2 coding SNVs were associated with age-related cataract in a case-control DNA panel (> 50 years) and if somatic EPHA2 coding SNVs were associated with lens aging and/or cataract in a post-mortem lens DNA panel (> 48 years). Micro-fluidic PCR amplification followed by targeted amplicon (exon) next-generation (deep) sequencing of EPHA2 (17-exons) afforded high read-depth coverage (1000x) for > 82% of reads in the cataract case-control panel (161 cases, 64 controls) and > 70% of reads in the post-mortem lens panel (35 clear lens pairs, 22 cataract lens pairs). Novel and reference (known) missense SNVs in EPHA2 that were predicted in silico to be functionally damaging were found in both cases and controls from the age-related cataract panel at variant allele frequencies (VAFs) consistent with germ-line transmission (VAF > 20%). Similarly, both novel and reference missense SNVs in EPHA2 were found in the post-mortem lens panel at VAFs consistent with a somatic origin (VAF > 3%). The majority of SNVs found in the cataract case-control panel and post-mortem lens panel were transitions and many occurred at di-pyrimidine sites that are susceptible to ultraviolet (UV) radiation induced mutation. These data suggest that novel germ-line (blood) and somatic (lens) coding SNVs in EPHA2 that are predicted to be functionally deleterious occur in adults over 50 years of age. However, both types of EPHA2 coding variants were present at comparable levels in individuals with or without age-related cataract making simple genotype-phenotype correlations inconclusive.
Germ-line and somatic EPHA2 coding variants in lens aging and cataract

PubMed Central

Bennett, Thomas M.; M’Hamdi, Oussama; Hejtmancik, J. Fielding

2017-01-01

Rare germ-line mutations in the coding regions of the human EPHA2 gene (EPHA2) have been associated with inherited forms of pediatric cataract, whereas, frequent, non-coding, single nucleotide variants (SNVs) have been associated with age-related cataract. Here we sought to determine if germ-line EPHA2 coding SNVs were associated with age-related cataract in a case-control DNA panel (> 50 years) and if somatic EPHA2 coding SNVs were associated with lens aging and/or cataract in a post-mortem lens DNA panel (> 48 years). Micro-fluidic PCR amplification followed by targeted amplicon (exon) next-generation (deep) sequencing of EPHA2 (17-exons) afforded high read-depth coverage (1000x) for > 82% of reads in the cataract case-control panel (161 cases, 64 controls) and > 70% of reads in the post-mortem lens panel (35 clear lens pairs, 22 cataract lens pairs). Novel and reference (known) missense SNVs in EPHA2 that were predicted in silico to be functionally damaging were found in both cases and controls from the age-related cataract panel at variant allele frequencies (VAFs) consistent with germ-line transmission (VAF > 20%). Similarly, both novel and reference missense SNVs in EPHA2 were found in the post-mortem lens panel at VAFs consistent with a somatic origin (VAF > 3%). The majority of SNVs found in the cataract case-control panel and post-mortem lens panel were transitions and many occurred at di-pyrimidine sites that are susceptible to ultraviolet (UV) radiation induced mutation. These data suggest that novel germ-line (blood) and somatic (lens) coding SNVs in EPHA2 that are predicted to be functionally deleterious occur in adults over 50 years of age. However, both types of EPHA2 coding variants were present at comparable levels in individuals with or without age-related cataract making simple genotype-phenotype correlations inconclusive. PMID:29267365
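Two quantities in this abstract lend themselves to a brief worked sketch: the variant allele frequency (VAF) and the di-pyrimidine sequence context. The function names are hypothetical. Folding the two reported thresholds (VAF > 20% and VAF > 3%) into a single classifier is an assumption made here for illustration, since the study applied them to separate panels. Treating adjacent purines as a di-pyrimidine site on the opposite strand is likewise an assumption.

```python
PYRIMIDINES = set("CT")
PURINES = set("AG")


def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """VAF = reads supporting the variant / total reads at the site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def vaf_category(vaf: float, germline_min: float = 0.20, somatic_min: float = 0.03) -> str:
    """Label a VAF using the two thresholds quoted in the abstract (assumed combined)."""
    if vaf > germline_min:
        return "germ-line-consistent"
    if vaf > somatic_min:
        return "somatic-consistent"
    return "below-threshold"


def at_dipyrimidine_site(seq: str, index: int) -> bool:
    """True if the base at `index` and a neighbour are both pyrimidines,
    or both purines (a di-pyrimidine on the complementary strand)."""
    seq = seq.upper()
    base = seq[index]
    neighbours = [seq[j] for j in (index - 1, index + 1) if 0 <= j < len(seq)]
    for group in (PYRIMIDINES, PURINES):
        if base in group and any(n in group for n in neighbours):
            return True
    return False
```

At 1000x read depth, 50 variant reads give a VAF of 0.05, which falls in the range the abstract describes as consistent with a somatic origin.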
Multiplex genotyping system for efficient inference of matrilineal genetic ancestry with continental resolution

PubMed Central

2011-01-01

Background: In recent years, phylogeographic studies have produced detailed knowledge on the worldwide distribution of mitochondrial DNA (mtDNA) variants, linking specific clades of the mtDNA phylogeny with certain geographic areas. However, a multiplex genotyping system for the detection of the mtDNA haplogroups of major continental distribution that would be desirable for efficient DNA-based bio-geographic ancestry testing in various applications is still missing. Results: Three multiplex genotyping assays, based on single-base primer extension technology, were developed targeting a total of 36 coding-region mtDNA variants that together differentiate 43 matrilineal haplo-/paragroups. These include the major diagnostic haplogroups for Africa, Western Eurasia, Eastern Eurasia and Native America. The assays show high sensitivity with respect to the amount of template DNA: successful amplification could still be obtained when using as little as 4 pg of genomic DNA and the technology is suitable for medium-throughput analyses. Conclusions: We introduce an efficient and sensitive multiplex genotyping system for bio-geographic ancestry inference from mtDNA that provides resolution on the continental level. The method can be applied in forensics, to aid tracing unknown suspects, as well as in population studies, genealogy and personal ancestry testing. For more complete inferences of overall bio-geographic ancestry from DNA, the mtDNA system provided here can be combined with multiplex systems for suitable autosomal and, in the case of males, Y-chromosomal ancestry-sensitive DNA markers. PMID:21429198
Novel mutations in the CHST6 gene associated with macular corneal dystrophy in southern India.

PubMed

Warren, John F; Aldave, Anthony J; Srinivasan, M; Thonar, Eugene J; Kumar, Abha B; Cevallos, Vicky; Whitcher, John P; Margolis, Todd P

2003-11-01

To further characterize the role of the carbohydrate sulfotransferase (CHST6) gene in macular corneal dystrophy (MCD) through identification of causative mutations in a cohort of affected patients from southern India. Genomic DNA was extracted from buccal epithelium of 75 patients (51 families) with MCD, 33 unaffected relatives, and 48 healthy volunteers. The coding region of the CHST6 gene was evaluated by means of polymerase chain reaction amplification and direct sequencing. Subtyping of MCD into types I and II was performed by measuring serum levels of antigenic keratan sulfate. Seventy patients were classified as having type I MCD, and 5 patients as having type II MCD. Analysis of the CHST6 coding region in patients with type I MCD identified 11 homozygous missense mutations (Leu22Arg, His42Tyr, Arg50Cys, Arg50Leu, Ser53Leu, Arg97Pro, Cys102Tyr, Arg127Cys, Arg205Gln, His249Pro, and Glu274Lys), 2 compound heterozygous missense mutations (Arg93His and Ala206Thr), 5 homozygous deletion mutations (delCG707-708, delC890, delA1237, del1748-1770, and delORF), and 2 homozygous replacement mutations (ACCTAC 1273 GGT, and GCG 1304 AT). One patient with type II MCD was heterozygous for the C890 deletion mutation, whereas 4 possessed no CHST6 coding region mutations. A variety of previously unreported mutations in the coding region of the CHST6 gene are associated with type I MCD in a cohort of patients in southern India. An improved understanding of the genetic basis of MCD allows for earlier, more accurate diagnosis of affected individuals, and may provide the foundation for the development of novel disease treatments.

Track structure based modelling of light ion radiation effects on nuclear and mitochondrial DNA

NASA Astrophysics Data System (ADS)

Schmitt, Elke; Ottolenghi, Andrea; Dingfelder, Michael; Friedland, Werner; Kundrat, Pavel; Baiocco, Giorgio

2016-07-01

Space radiation risk assessment is of great importance for manned spaceflights in order to estimate risks and to develop counter-measures to reduce them. Biophysical simulations with PARTRAC can help greatly to improve the understanding of initial biological response to ionizing radiation. Results from modelling radiation quality dependent DNA damage and repair mechanisms up to chromosomal aberrations (e.g. dicentrics) can be used to predict radiation effects depending on the kind of mixed radiation field exposure. Especially dicentric yields can serve as a biomarker for an increased risk due to radiation and hence as an indicator for the effectiveness of the used shielding. PARTRAC [1] is a multi-scale biophysical research MC code for track structure based initial DNA damage and damage response modelling. It integrates physics, radiochemistry, detailed nuclear DNA structure and molecular biology of DNA repair by NHEJ-pathway to assess radiation effects on cellular level [2]. Ongoing experiments with quasi-homogeneously distributed compared to sub-micrometre focused bunches of protons, lithium and carbon ions allow a separation of effects due to DNA damage complexity on nanometre scale from damage clustering on (sub-) micrometre scale [3, 4]. These data provide an unprecedented benchmark for the DNA damage response model in PARTRAC and help understand the mechanisms leading to cell killing and chromosomal aberrations (e.g. dicentrics) induction. A large part of space radiation is due to a mixed ion field of high energy protons and few heavier ions that can be only partly absorbed by the shielding. Radiation damage induced by low-energy ions significantly contributes to the high relative biological efficiency (RBE) of ion beams around Bragg peak regions. For slow light ions the physical cross section data basis in PARTRAC has been extended to investigate radiation quality effects in the Bragg peak region [5]. 
The resulting range and LET values agree with ICRU data and SRIM calculations. Preliminary studies regarding the biological endpoints DSB (cluster) and chromosomal aberrations have been performed for selected light ions up to neon. Validation with experimental data as well as further calculations are underway and final results will be presented at the meeting. Mitochondrial alterations have been implicated in radiation-induced cardiovascular effects. To extend the applicability of PARTRAC biophysical tool towards effects on mitochondria, the nuclear DNA and chromatin as the primary target of radiation has been complemented by a model of mitochondrial DNA (mtDNA) to mimic a coronary cell with thousand mitochondria contained in the cytoplasm. Induced mtDNA damage (SSB, DSB) has been scored for 60Co photons and 5 MeV alpha-particle irradiation, assuming alternative radical scavenging capacities within the mitochondria. While direct radiation effects in mtDNA are identical to nuclear DNA, indirect effects in mtDNA are in general larger due to lower scavenging and the lack of DNA-protecting histones. These simulations complement the scarce experimental data on radiation-induced mtDNA damage and help elucidate the relative roles of initial mtDNA versus nuclear DNA damage and of pathways that amplify their respective effects. Ongoing and planned developments of PARTRAC include coupling with a radiation transport code and track-structure based calculations of cell killing for RBE studies on macroscopic scales within a mixed ion field. [1] Friedland, Dingfelder et al. (2011): "Track structures, DNA targets and radiation effects in the biophysical Monte Carlo simulation code PARTRAC", Mutat. Res. 711, 28-40 [2] Friedland et al. (2013): "Track structure based modelling of chromosome aberrations after photon and alpha-particle irradiation", Mutat. Res. 756, 213-223 [3] Schmid, Friedland et al. 
(2015): "Sub-micrometer 20 MeV protons or 45 MeV lithium spot irradiation enhances yields of dicentric chromosomes due to clustering of DNA double-strand breaks", Mutat. Res. 793, 30-40 [4] Friedland, Schmitt, Kundrat (2015): "Modelling Proton bunches focussed to submicrometre scales: Low-LET Radiation damage in high-LET-like spatial structure", Radiat. Prot. Dosim. 166, 34-37 [5] Schmitt, Friedland, Kundrat, Dingfelder, Ottolenghi (2015): "Cross section scaling for track structure simulations of low-energy ions in liquid water", Radiat. Prot. Dosim. 166, 15-18. Supported by the European Atomic Energy Community's Seventh Framework Programme (FP7/2007-2011) under grant agreement no 249689 "DoReMi" and the German Federal Ministry on Education and Research (KVSF-Projekt "LET-Verbund").
Ubiquitous and gene-specific regulatory 5' sequences in a sea urchin histone DNA clone coding for histone protein variants.

PubMed Central

Busslinger, M; Portmann, R; Irminger, J C; Birnstiel, M L

1980-01-01

The DNA sequences of the entire structural H4, H3, H2A and H2B genes and of their 5' flanking regions have been determined in the histone DNA clone h19 of the sea urchin Psammechinus miliaris. In clone h19 the polarity of transcription and the relative arrangement of the histone genes is identical to that in clone h22 of the same species. The histone proteins encoded by h19 DNA differ in their primary structure from those encoded by clone h22 and have been compared to histone protein sequences of other sea urchin species as well as other eukaryotes. A comparative analysis of the 5' flanking DNA sequences of the structural histone genes in both clones revealed four ubiquitous sequence motifs: a pentameric element GATCC, followed at short distance by the Hogness box GTATAAATAG, a conserved sequence PyCATTCPu, in or near which the 5' ends of the mRNAs map in h22 DNA, and lastly a sequence A, containing the initiation codon. These sequences are also found, sometimes in modified version, in front of other eukaryotic genes transcribed by polymerase II. When prelude sequences of isocoding histone genes in clone h19 and h22 are compared, areas of homology are seen to extend beyond the ubiquitous sequence motifs towards the divergent AT-rich spacer and terminate between approximately 140 and 240 nucleotides away from the structural gene. These prelude regions contain quite large conservative sequence blocks which are specific for each type of histone genes. PMID:7443547
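The motifs named in this abstract can be located in a sequence with a simple pattern scan. The following is a minimal sketch, assuming the shorthand Py means a pyrimidine (C or T) and Pu a purine (A or G); the function name `find_motifs`, the motif labels and the toy sequence are hypothetical and are not taken from clone h19 or h22.

```python
import re

# Motifs quoted in the abstract. Py = pyrimidine (C or T), Pu = purine (A or G).
# The dictionary keys are illustrative labels, not names used by the authors.
MOTIFS = {
    "pentamer": "GATCC",
    "hogness_box": "GTATAAATAG",
    "cap_site": "[CT]CATTC[AG]",  # PyCATTCPu
}

def find_motifs(seq):
    """Return {label: [0-based start positions]} for every motif in seq."""
    seq = seq.upper()
    hits = {}
    for label, pattern in MOTIFS.items():
        # A lookahead reports overlapping occurrences as well.
        hits[label] = [m.start() for m in re.finditer(f"(?={pattern})", seq)]
    return hits

# Hypothetical toy sequence for illustration only.
toy = "AAGATCCTTTTGTATAAATAGCCCCTCATTCGAA"
print(find_motifs(toy))
```

Positions are 0-based offsets into the scanned string; converting them to distances from the initiation codon would require the real clone sequences, which the abstract does not give.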
MtDNA profile of West Africa Guineans: towards a better understanding of the Senegambia region.

PubMed

Rosa, Alexandra; Brehm, António; Kivisild, Toomas; Metspalu, Ene; Villems, Richard

2004-07-01

The matrilineal genetic composition of 372 samples from the Republic of Guiné-Bissau (West African coast) was studied using RFLPs and partial sequencing of the mtDNA control and coding region. The majority of the mtDNA lineages of Guineans (94%) belong to West African specific sub-clusters of L0-L3 haplogroups. A new L3 sub-cluster (L3h) that is found in both eastern and western Africa is present at moderately low frequencies in Guinean populations. A non-random distribution of haplogroups U5 in the Fula group, the U6 among the "Brame" linguistic family and M1 in the Balanta-Djola group, suggests a correlation between the genetic and linguistic affiliation of Guinean populations. The presence of M1 in Balanta populations supports the earlier suggestion of their Sudanese origin. Haplogroups U5 and U6, on the other hand, were found to be restricted to populations that are thought to represent the descendants of a southern expansion of Berbers. Particular haplotypes, found almost exclusively in East-African populations, were found in some ethnic groups with an oral tradition claiming Sudanese origin.
Hiding message into DNA sequence through DNA coding and chaotic maps.

PubMed

Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

2014-09-01

The paper proposes an improved reversible substitution method to hide data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance the robustness and enlarge the hiding capacity: encoding the secret message by DNA coding, encrypting it with a pseudo-random sequence, generating the relative hiding locations with a piecewise linear chaotic map, and embedding the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than the competing methods with respect to robustness and capacity.
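Two of the building blocks named in this abstract, DNA coding and the piecewise linear chaotic map, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' method: the 2-bit-per-base table below is one of several possible coding rules (the abstract does not say which one the paper uses), and the function names `encode`, `decode` and `pwlcm` are hypothetical. The map follows the commonly used textbook definition with control parameter 0 < p < 0.5.

```python
# Assumed coding rule: two bits per nucleotide. The paper's actual rule is not
# stated in the abstract; any bijection between bit pairs and bases would work.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(message: bytes) -> str:
    """Turn bytes into a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in message)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(dna: str) -> bytes:
    """Inverse of encode."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def pwlcm(x: float, p: float) -> float:
    """One iteration of the piecewise linear chaotic map on [0, 1)."""
    if x >= 0.5:
        x = 1.0 - x  # the map is symmetric about x = 0.5
    return x / p if x < p else (x - p) / (0.5 - p)

print(encode(b"Hi"))
```

Iterating `pwlcm` from a secret initial value and scaling each output to an index is one plausible way to derive hiding locations; how the paper maps the chaotic sequence to positions is not described in the abstract.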
The complete mitochondrial genome of the bagarius yarrelli from honghe river

NASA Astrophysics Data System (ADS)

Du, M.; Zhou, C. J.; Niu, B. Z.; Liu, Y. H.; Li, N.; Ai, J. L.; Xu, G. L.

2016-08-01

The complete mitochondrial DNA sequence of Bagarius yarrelli from the Honghe River of China is determined in this paper. The circular molecule is 16524 base pairs long and has a gene order similar to that of other bony fishes, comprising a non-coding control region, an origin of replication, two ribosomal RNA (rRNA) genes, 22 transfer RNA (tRNA) genes and 13 protein-coding genes. Its overall base composition is 31.4% A, 26.9% C, 15.7% G and 26.0% T, with an A+T bias of 57.4%. These mitochondrial data should contribute to further study of the molecular evolution and population genetics of this species.
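The base composition and A+T figures reported here are simple percentages, and the reported A+T value is consistent with the per-base values (31.4% + 26.0% = 57.4%). A minimal sketch of the calculation follows; the function names `base_composition` and `at_content` are hypothetical, and ambiguous symbols such as N are assumed to be excluded from the total.

```python
from collections import Counter

def base_composition(seq):
    """Percentage of A, C, G and T among the unambiguous bases in seq."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    return {base: 100.0 * counts[base] / total for base in "ACGT"}

def at_content(seq):
    """A+T percentage, the quantity reported as the A+T bias."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]

# Consistency check on the figures quoted in the abstract.
print(round(31.4 + 26.0, 1))
```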
Highly conserved elements discovered in vertebrates are present in non-syntenic loci of tunicates, act as enhancers and can be transcribed during development

PubMed Central

Sanges, Remo; Hadzhiev, Yavor; Gueroult-Bellone, Marion; Roure, Agnes; Ferg, Marco; Meola, Nicola; Amore, Gabriele; Basu, Swaraj; Brown, Euan R.; De Simone, Marco; Petrera, Francesca; Licastro, Danilo; Strähle, Uwe; Banfi, Sandro; Lemaire, Patrick; Birney, Ewan; Müller, Ferenc; Stupka, Elia

2013-01-01

Co-option of cis-regulatory modules has been suggested as a mechanism for the evolution of expression sites during development. However, the extent and mechanisms involved in mobilization of cis-regulatory modules remains elusive. To trace the history of non-coding elements, which may represent candidate ancestral cis-regulatory modules affirmed during chordate evolution, we have searched for conserved elements in tunicate and vertebrate (Olfactores) genomes. We identified, for the first time, 183 non-coding sequences that are highly conserved between the two groups. Our results show that all but one element are conserved in non-syntenic regions between vertebrate and tunicate genomes, while being syntenic among vertebrates. Nevertheless, in all the groups, they are significantly associated with transcription factors showing specific functions fundamental to animal development, such as multicellular organism development and sequence-specific DNA binding. The majority of these regions map onto ultraconserved elements and we demonstrate that they can act as functional enhancers within the organism of origin, as well as in cross-transgenesis experiments, and that they are transcribed in extant species of Olfactores. We refer to the elements as ‘Olfactores conserved non-coding elements’. PMID:23393190
De novo amplification within a silent human cholinesterase gene in a family subjected to prolonged exposure to organophosphorus insecticides

DOE Office of Scientific and Technical Information (OSTI.GOV)

Prody, C.A.; Dreyfus, P.; Soreq, H.

1989-01-01

A 100-fold DNA amplification in the CHE gene, coding for serum butyrylcholinesterase (BtChoEase), was found in a farmer expressing silent CHE phenotype. Individuals homozygous for this gene display a defective serum BtChoEase and are particularly vulnerable to poisoning by agricultural organophosphorus insecticides, to which all members of this family had long been exposed. DNA blot hybridization with regional BtChoEase cDNA probes suggested that the amplification was most intense in regions encoding central sequences within BtChoEase cDNA, whereas distal sequences were amplified to a much lower extent. This is in agreement with the onion skin model, based on amplification of genes in cultured cells and primary tumors. The amplification was absent in the grandparents but present at the same extent in one of their sons and in a grandson, with similar DNA blot hybridization patterns. In situ hybridization experiments localized the amplified sequences to the long arm of chromosome 3, close to the site where the authors previously mapped the CHE gene. Altogether, these observations suggest that the initial amplification event occurred early in embryogenesis, spermatogenesis, or oogenesis, where the CHE gene is intensely active and where cholinergic functioning was indicated to be physiologically necessary. These findings demonstrate a de novo amplification in apparently healthy individuals within an autosomal gene producing a target protein to an inhibitor.
Mitochondrial DNA triplication and punctual mutations in patients with mitochondrial neuromuscular disorders

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mkaouar-Rebai, Emna, E-mail: emna.mkaouar@gmail.com; Felhi, Rahma; Tabebi, Mouna

Mitochondrial diseases are a heterogeneous group of disorders caused by the impairment of the mitochondrial oxidative phosphorylation system which have been associated with various mutations of the mitochondrial DNA (mtDNA) and nuclear gene mutations. The clinical phenotypes are very diverse and the spectrum is still expanding. As brain and muscle are highly dependent on OXPHOS, neurological disorders and myopathy are common features of mtDNA mutations. Mutations in mtDNA can be classified into three categories: large-scale rearrangements, point mutations in tRNA or rRNA genes and point mutations in protein coding genes. In the present report, we screened mitochondrial genes of complex I, III, IV and V in 2 patients with mitochondrial neuromuscular disorders. The results showed the presence of the pathogenic heteroplasmic m.9157G>A variation (A211T) in the MT-ATP6 gene in the first patient. We also reported the first case of triplication of 9 bp in the mitochondrial NC7 region in Africa and Tunisia, in association with the novel m.14924T>C in the MT-CYB gene in the second patient with mitochondrial neuromuscular disorder. - Highlights: • We reported 2 patients with mitochondrial neuromuscular disorders. • The heteroplasmic MT-ATP6 9157G>A variation was reported. • A triplication of 9 bp in the mitochondrial NC7 region was detected. • The m.14924T>C transition (S60P) in the MT-CYB gene was found.
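The link between the genomic positions and the amino acid numbers quoted in this record (m.9157G>A as A211T, m.14924T>C as S60P) is a short piece of arithmetic. The sketch below assumes the gene start coordinates of the revised Cambridge Reference Sequence, 8527 for MT-ATP6 and 14747 for MT-CYB; these coordinates are not stated in the abstract, and the names `GENE_START` and `codon_of` are hypothetical.

```python
# Assumed first nucleotide of each gene in the revised Cambridge Reference
# Sequence (rCRS). These values are not given in the abstract.
GENE_START = {"MT-ATP6": 8527, "MT-CYB": 14747}

def codon_of(gene, pos):
    """Return (codon number, position within codon), both 1-based."""
    offset = pos - GENE_START[gene]
    return offset // 3 + 1, offset % 3 + 1

print(codon_of("MT-ATP6", 9157))   # codon and codon position of m.9157
print(codon_of("MT-CYB", 14924))   # codon and codon position of m.14924
```

Under these assumptions both variants fall on the first base of their codon (codons 211 and 60), which fits the reported substitutions: G to A at the first position turns an alanine codon (GCN) into a threonine codon (ACN), and T to C turns a serine codon (TCN) into a proline codon (CCN).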
Physical interactions between bacteriophage and Escherichia coli proteins required for initiation of lambda DNA replication.

PubMed

Liberek, K; Osipiuk, J; Zylicz, M; Ang, D; Skorko, J; Georgopoulos, C

1990-02-25

The process of initiation of lambda DNA replication requires the assembly of the proper nucleoprotein complex at the origin of replication, ori lambda. The complex is composed of both phage and host-coded proteins. The lambda O initiator protein binds specifically to ori lambda. The lambda P initiator protein binds to both lambda O and the host-coded dnaB helicase, giving rise to an ori lambda DNA.lambda O.lambda P.dnaB structure. The dnaK and dnaJ heat shock proteins have been shown capable of dissociating this complex. The thus freed dnaB helicase unwinds the duplex DNA template at the replication fork. In this report, through cross-linking, size chromatography, and protein affinity chromatography, we document some of the protein-protein interactions occurring at ori lambda. Our results show that the dnaK protein specifically interacts with both lambda O and lambda P, and that the dnaJ protein specifically interacts with the dnaB helicase.
Complete mitochondrial genome of the Asian pencil halfbeak Hyporhamphus intermedius (Beloniformes, Hemirhamphidae).

PubMed

Song, Chao; Hu, Gengdong; Qiu, Liping; Fan, Limin; Meng, Shunlong; Chen, Jiazhang

2016-11-01

The complete mitochondrial genome of Hyporhamphus intermedius was determined to be 16,720 bp in length with (A + T) content of 56.3%, and it consists of 13 protein-coding genes, 22 tRNAs, two ribosomal RNAs, and a control region. The gene composition and the structural arrangement of the H. intermedius complete mtDNA were identical to most of the other vertebrates. Interestingly, two tandem repeat units were identified across tRNA-Pro and control region (2*41 bp), while in most of the fishes the tandem repeat units are located in the control region. The molecular data we presented here could play a useful role to study the evolutionary relationships and population genetics of Hemirhamphidae fish.
Phylogenic study of Lemnoideae (duckweeds) through complete chloroplast genomes for eight accessions.

PubMed

Ding, Yanqiang; Fang, Yang; Guo, Ling; Li, Zhidan; He, Kaize; Zhao, Yun; Zhao, Hai

2017-01-01

Phylogenetic relationship within different genera of Lemnoideae, a kind of small aquatic monocotyledonous plants, was not well resolved, using either morphological characters or traditional markers. Given that rich genetic information in chloroplast genome makes them particularly useful for phylogenetic studies, we used chloroplast genomes to clarify the phylogeny within Lemnoideae. DNAs were sequenced with next-generation sequencing. The duckweeds chloroplast genomes were indirectly filtered from the total DNA data, or directly obtained from chloroplast DNA data. To test the reliability of assembling the chloroplast genome based on the filtration of the total DNA, two methods were used to assemble the chloroplast genome of Landoltia punctata strain ZH0202. A phylogenetic tree was built on the basis of the whole chloroplast genome sequences using MrBayes v.3.2.6 and PhyML 3.0. Eight complete duckweeds chloroplast genomes were assembled, with lengths ranging from 165,775 bp to 171,152 bp, and each contains 80 protein-coding sequences, four rRNAs, 30 tRNAs and two pseudogenes. The identity of L. punctata strain ZH0202 chloroplast genomes assembled through two methods was 100%, and their sequences and lengths were completely identical. The chloroplast genome comparison demonstrated that the differences in chloroplast genome sizes among the Lemnoideae primarily resulted from variation in non-coding regions, especially from repeat sequence variation. The phylogenetic analysis demonstrated that the different genera of Lemnoideae are derived from each other in the following order: Spirodela , Landoltia , Lemna , Wolffiella , and Wolffia . This study demonstrates potential of whole chloroplast genome DNA as an effective option for phylogenetic studies of Lemnoideae. It also showed the possibility of using chloroplast DNA data to elucidate those phylogenies which were not yet solved well by traditional methods even in plants other than duckweeds.
Phylogenic study of Lemnoideae (duckweeds) through complete chloroplast genomes for eight accessions

PubMed Central

Ding, Yanqiang; Fang, Yang; Guo, Ling; Li, Zhidan; He, Kaize

2017-01-01

Background Phylogenetic relationship within different genera of Lemnoideae, a kind of small aquatic monocotyledonous plants, was not well resolved, using either morphological characters or traditional markers. Given that rich genetic information in chloroplast genome makes them particularly useful for phylogenetic studies, we used chloroplast genomes to clarify the phylogeny within Lemnoideae. Methods DNAs were sequenced with next-generation sequencing. The duckweeds chloroplast genomes were indirectly filtered from the total DNA data, or directly obtained from chloroplast DNA data. To test the reliability of assembling the chloroplast genome based on the filtration of the total DNA, two methods were used to assemble the chloroplast genome of Landoltia punctata strain ZH0202. A phylogenetic tree was built on the basis of the whole chloroplast genome sequences using MrBayes v.3.2.6 and PhyML 3.0. Results Eight complete duckweeds chloroplast genomes were assembled, with lengths ranging from 165,775 bp to 171,152 bp, and each contains 80 protein-coding sequences, four rRNAs, 30 tRNAs and two pseudogenes. The identity of L. punctata strain ZH0202 chloroplast genomes assembled through two methods was 100%, and their sequences and lengths were completely identical. The chloroplast genome comparison demonstrated that the differences in chloroplast genome sizes among the Lemnoideae primarily resulted from variation in non-coding regions, especially from repeat sequence variation. The phylogenetic analysis demonstrated that the different genera of Lemnoideae are derived from each other in the following order: Spirodela, Landoltia, Lemna, Wolffiella, and Wolffia. Discussion This study demonstrates potential of whole chloroplast genome DNA as an effective option for phylogenetic studies of Lemnoideae. 
It also showed the possibility of using chloroplast DNA data to elucidate those phylogenies which were not yet solved well by traditional methods even in plants other than duckweeds. PMID:29302399
Position specific variation in the rate of evolution in transcription factor binding sites

PubMed Central

Moses, Alan M; Chiang, Derek Y; Kellis, Manolis; Lander, Eric S; Eisen, Michael B

2003-01-01

Background The binding sites of sequence specific transcription factors are an important and relatively well-understood class of functional non-coding DNAs. Although a wide variety of experimental and computational methods have been developed to characterize transcription factor binding sites, they remain difficult to identify. Comparison of non-coding DNA from related species has shown considerable promise in identifying these functional non-coding sequences, even though relatively little is known about their evolution. Results Here we analyse the genome sequences of the budding yeasts Saccharomyces cerevisiae, S. bayanus, S. paradoxus and S. mikatae to study the evolution of transcription factor binding sites. As expected, we find that both experimentally characterized and computationally predicted binding sites evolve slower than surrounding sequence, consistent with the hypothesis that they are under purifying selection. We also observe position-specific variation in the rate of evolution within binding sites. We find that the position-specific rate of evolution is positively correlated with degeneracy among binding sites within S. cerevisiae. We test theoretical predictions for the rate of evolution at positions where the base frequencies deviate from background due to purifying selection and find reasonable agreement with the observed rates of evolution. Finally, we show how the evolutionary characteristics of real binding motifs can be used to distinguish them from artefacts of computational motif finding algorithms. Conclusion As has been observed for protein sequences, the rate of evolution in transcription factor binding sites varies with position, suggesting that some regions are under stronger functional constraint than others. This variation likely reflects the varying importance of different positions in the formation of the protein-DNA complex. 
The characterization of the pattern of evolution in known binding sites will likely contribute to the effective use of comparative sequence data in the identification of transcription factor binding sites and is an important step toward understanding the evolution of functional non-coding DNA. PMID:12946282
Privacy rules for DNA databanks. Protecting coded 'future diaries'.

PubMed

Annas, G J

1993-11-17

In privacy terms, genetic information is like medical information. But the information contained in the DNA molecule itself is more sensitive because it contains an individual's probabilistic "future diary," is written in a code that has only partially been broken, and contains information about an individual's parents, siblings, and children. Current rules for protecting the privacy of medical information cannot protect either genetic information or identifiable DNA samples stored in DNA databanks. A review of the legal and public policy rationales for protecting genetic privacy suggests that specific enforceable privacy rules for DNA databanks are needed. Four preliminary rules are proposed to govern the creation of DNA databanks, the collection of DNA samples for storage, limits on the use of information derived from the samples, and continuing obligations to those whose DNA samples are in the databanks.
DNA transposons have colonized the genome of the giant virus Pandoravirus salinus.

PubMed

Sun, Cheng; Feschotte, Cédric; Wu, Zhiqiang; Mueller, Rachel Lockridge

2015-06-12

Transposable elements are mobile DNA sequences that are widely distributed in prokaryotic and eukaryotic genomes, where they represent a major force in genome evolution. However, transposable elements have rarely been documented in viruses, and their contribution to viral genome evolution remains largely unexplored. Pandoraviruses are recently described DNA viruses with genome sizes that exceed those of some prokaryotes, rivaling parasitic eukaryotes. These large genomes appear to include substantial noncoding intergenic spaces, which provide potential locations for transposable element insertions. However, no mobile genetic elements have yet been reported in pandoravirus genomes. Here, we report a family of miniature inverted-repeat transposable elements (MITEs) in the Pandoravirus salinus genome, representing the first description of a virus populated with a canonical transposable element family that proliferated by transposition within the viral genome. The MITE family, which we name Submariner, includes 30 copies with all the hallmarks of MITEs: short length, terminal inverted repeats, TA target site duplication, and no coding capacity. Submariner elements show signs of transposition and are undetectable in the genome of Pandoravirus dulcis, the closest known relative of Pandoravirus salinus. We identified a DNA transposon related to Submariner in the genome of Acanthamoeba castellanii, a species thought to host pandoraviruses, which contains remnants of coding sequence for a Tc1/mariner transposase. These observations suggest that the Submariner MITEs of P. salinus belong to the widespread Tc1/mariner superfamily and may have been mobilized by an amoebozoan host. Ten of the 30 MITEs in the P. salinus genome are located within coding regions of predicted genes, while others are close to genes, suggesting that these transposons may have contributed to viral genetic novelty. 
Our discovery highlights the remarkable ability of DNA transposons to colonize and shape genomes from all domains of life, as well as giant viruses. Our findings continue to blur the division between viral and cellular genomes, adhering to the emerging view that the content, dynamics, and evolution of the genomes of giant viruses do not substantially differ from those of cellular organisms.
Rate heterogeneity in six protein-coding genes from the holoparasite Balanophora (Balanophoraceae) and other taxa of Santalales

PubMed Central

Su, Huei-Jiun; Hu, Jer-Ming

2012-01-01

Background and Aims The holoparasitic flowering plant Balanophora displays extreme floral reduction and was previously found to have enormous rate acceleration in the nuclear 18S rDNA region. So far, it remains unclear whether non-ribosomal, protein-coding genes of Balanophora also evolve in an accelerated fashion and whether the genes with high substitution rates retain their functionality. To tackle these issues, six different genes were sequenced from two Balanophora species and their rate variation and expression patterns were examined. Methods Sequences including nuclear PI, euAP3, TM6, LFY and RPB2 and mitochondrial matR were determined from two Balanophora spp. and compared with selected hemiparasitic species of Santalales and autotrophic core eudicots. Gene expression was detected for the six protein-coding genes and the expression patterns of the three B-class genes (PI, AP3 and TM6) were further examined across different organs of B. laxiflora using RT-PCR. Key Results Balanophora mitochondrial matR is highly accelerated in both nonsynonymous (dN) and synonymous (dS) substitution rates, whereas the rate variation of nuclear genes LFY, PI, euAP3, TM6 and RPB2 are less dramatic. Significant dS increases were detected in Balanophora PI, TM6, RPB2 and dN accelerations in euAP3. All of the protein-coding genes are expressed in inflorescences, indicative of their functionality. PI is restrictively expressed in tepals, synandria and floral bracts, whereas AP3 and TM6 are widely expressed in both male and female inflorescences. Conclusions Despite the observation that rates of sequence evolution are generally higher in Balanophora than in hemiparasitic species of Santalales and autotrophic core eudicots, the five nuclear protein-coding genes are functional and are evolving at a much slower rate than 18S rDNA. 
The mechanism or mechanisms responsible for rapid sequence evolution and concomitant rate acceleration for 18S rDNA and matR are currently not well understood and require further study in Balanophora and other holoparasites. PMID:23041381
Novel DNA variations to characterize low molecular weight glutenin Glu-D3 genes and develop STS markers in common wheat.

PubMed

Zhao, X L; Xia, X C; He, Z H; Lei, Z S; Appels, R; Yang, Y; Sun, Q X; Ma, W

2007-02-01

Low-molecular-weight glutenin subunits (LMW-GS) play an important role in bread and noodle processing quality by influencing the viscoelasticity and extensibility of dough. The objectives of this study were to characterize Glu-D3 subunit coding genes and to develop molecular markers for identifying Glu-D3 gene haplotypes. Gene specific primer sets were designed to amplify eight wheat cultivars containing Glu-D3a, b, c, d and e alleles, defined traditionally by protein electrophoretic mobility. Three novel Glu-D3 DNA sequences, designated as GluD3-4, GluD3-5 and GluD3-6, were amplified from the eight wheat cultivars. GluD3-4 showed three allelic variants or haplotypes at the DNA level in the eight cultivars, which were designated as GluD3-41, GluD3-42 and GluD3-43. Compared with GluD3-42, a single nucleotide polymorphism (SNP) was detected for GluD3-43 in the coding region, resulting in a pseudo-gene with a nonsense mutation at the 119th position of deduced peptide, and a 3-bp insertion was found in the coding region of GluD3-41, leading to a glutamine insertion at the 249th position of its deduced protein. The coding regions for GluD3-5 and GluD3-6 showed no allelic variation in the eight cultivars tested, indicating that they were relatively conservative in common wheat. Based on the 12 allelic variants of three Glu-D3 genes identified in this study and three detected previously, seven STS markers were established to amplify the corresponding gene sequences in wheat cultivars containing five Glu-D3 alleles (a, b, c, d and e). The seven primer sets M2F12/M2R12, M2F2/M2R2, M2F3/M2R3, M3F1/M3R1, M3F2/M3R2, M4F1/M4R1 and M4F3/M4R3 were specific to the allelic variants GluD3-21/22, GluD3-22, GluD3-23, GluD3-31, GluD3-32, GluD3-41 and GluD3-43, respectively, which were validated by amplifying 20 Chinese wheat cultivars containing alleles a, b, c and f based on protein electrophoretic mobility. 
These markers will be useful to identify the Glu-D3 gene haplotypes in wheat breeding programs.
DNA sequences of three beta-1,4-endoglucanase genes from Thermomonospora fusca.

PubMed Central

Lao, G; Ghangas, G S; Jung, E D; Wilson, D B

1991-01-01

The DNA sequences of the Thermomonospora fusca genes encoding cellulases E2 and E5 and the N-terminal end of E4 were determined. Each sequence contains an identical 14-bp inverted repeat upstream of the initiation codon. There were no significant homologies between the coding regions of the three genes. The E2 gene is 73% identical to the celA gene from Microbispora bispora, but this was the only homology found with other cellulase genes. E2 belongs to a family of cellulases that includes celA from M. bispora, cenA from Cellulomonas fimi, casA from an alkalophilic Streptomyces strain, and cellobiohydrolase II from Trichoderma reesei. E4 shows 44% identity to an avocado cellulase, while E5 belongs to the Bacillus cellulase family. There were strong similarities between the amino acid sequences of the E2 and E5 cellulose binding domains, and these regions also showed homology with C. fimi and Pseudomonas fluorescens cellulose binding domains. PMID:1904434
Interatomic Coulombic Decay Effects in Theoretical DNA Recombination Systems Involving Protein Interaction Sites

NASA Astrophysics Data System (ADS)

Vargas, E. L.; Rivas, D. A.; Duot, A. C.; Hovey, R. T.; Andrianarijaona, V. M.

2015-03-01

DNA replication is the basis for all biological reproduction. A strand of DNA will "unzip" and bind with a complementary strand, creating two identical strands. In this study, we are considering how this process is affected by Interatomic Coulombic Decay (ICD), specifically how ICD affects the individual coding proteins' ability to hold together. ICD mainly deals with how the electron returns to its original state after excitation and how this affects its immediate atomic environment, sometimes affecting the connectivity between interaction sites on proteins involved in the DNA coding process. Biological heredity is fundamentally controlled by DNA and its replication; it therefore affects every living thing. The small size of the proteins (within the range of nanometers) makes them good candidates for research at this scale. Understanding how ICD affects DNA molecules can give us invaluable insight into the human genetic code and the processes behind cell mutations that can lead to cancer. Authors wish to give special thanks to Pacific Union College Student Senate in Angwin, California, for their financial support.
Transcription and DNA Damage: Holding Hands or Crossing Swords?

PubMed

D'Alessandro, Giuseppina; d'Adda di Fagagna, Fabrizio

2017-10-27

Transcription has classically been considered a potential threat to genome integrity. Collision between transcription and DNA replication machinery, and retention of DNA:RNA hybrids, may result in genome instability. On the other hand, it has been proposed that active genes repair faster and preferentially via homologous recombination. Moreover, while canonical transcription is inhibited in the proximity of DNA double-strand breaks, a growing body of evidence supports active non-canonical transcription at DNA damage sites. Small non-coding RNAs accumulate at DNA double-strand break sites in mammals and other organisms, and are involved in DNA damage signaling and repair. Furthermore, RNA binding proteins are recruited to DNA damage sites and participate in the DNA damage response. Here, we discuss the impact of transcription on genome stability, the role of RNA binding proteins at DNA damage sites, and the function of small non-coding RNAs generated upon damage in the signaling and repair of DNA lesions. Copyright © 2016 Elsevier Ltd. All rights reserved.

Differentiation of BHV-1 isolates from vaccine virus by high-resolution melting analysis.

PubMed

Ostertag-Hill, Claire; Fang, Liang; Izume, Satoko; Lee, Megan; Reed, Aimee; Jin, Ling

2015-02-16

An efficacious bovine herpesvirus type-1 (BHV-1) vaccine has been used for many years. However, in the past few years, abortion and respiratory diseases have occurred after administration of the modified live vaccine. To investigate whether BHV-1 isolates from disease outbreaks are identical to those of the vaccines used, selected regions of the BHV-1 genome were investigated by high-resolution melting (HRM) analysis and PCR-DNA sequencing. When a target region within the thymidine kinase (TK) gene was examined by HRM analysis, 6 out of the 11 isolates from abortion cases and 22 out of the 25 isolates from bovine respiratory disease (BRD) cases had different melting curves compared to the vaccine virus. Surprisingly, when a conserved region within the US6 gene that encodes glycoprotein D (gD) was examined by HRM analysis, 5 out of the 11 abortion isolates and 18 out of the 23 BRD isolates had different melting curves from the vaccine virus. To determine whether SNPs within the coding regions of glycoprotein E (gE) and TK genes can be used to differentiate the isolates from the vaccine virus, PCR-DNA sequencing was used to examine these SNPs in all the isolates. This revealed that only 1 out of 11 of the abortion isolates and 4 out of 24 of the BRD isolates are different in the target region of gE from the vaccine virus, while 5 out of 11 abortion isolates and 4 out of 22 BRD isolates are different in the target region of TK from the vaccine virus. No DNA sequence differences were observed in glycoprotein G (gG) region between disease and vaccine isolates. Our study demonstrated that many disease isolates had genetic differences from the vaccine virus in regions examined by HRM and PCR-DNA sequencing analysis. In addition, many isolates contained more than one type of mutation and were composed of mixed variants. Our study suggests that a mixture of variants were present in isolates collected post-vaccination. 
HRM is a rapid diagnostic method that can differentiate clinical isolates from vaccine strains. Copyright © 2014 Elsevier B.V. All rights reserved.
Landscape of somatic mutations in 560 breast cancer whole-genome sequences

DOE PAGES

Nik-Zainal, Serena; Davies, Helen; Staaf, Johan; ...

2016-05-02

Here, we analysed whole-genome sequences of 560 breast cancers to advance understanding of the driver mutations conferring clonal advantage and the mutational processes generating somatic mutations. We found that 93 protein-coding cancer genes carried probable driver mutations. Some non-coding regions exhibited high mutation frequencies, but most have distinctive structural features probably causing elevated mutation rates and do not contain driver mutations. Mutational signature analysis was extended to genome rearrangements and revealed twelve base substitution and six rearrangement signatures. Three rearrangement signatures, characterized by tandem duplications or deletions, appear associated with defective homologous-recombination-based DNA repair: one with deficient BRCA1 function, another with deficient BRCA1 or BRCA2 function, the cause of the third is unknown. This analysis of all classes of somatic mutation across exons, introns and intergenic regions highlights the repertoire of cancer genes and mutational processes operating, and progresses towards a comprehensive account of the somatic genetic basis of breast cancer.
Landscape of somatic mutations in 560 breast cancer whole-genome sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nik-Zainal, Serena; Davies, Helen; Staaf, Johan

Here, we analysed whole-genome sequences of 560 breast cancers to advance understanding of the driver mutations conferring clonal advantage and the mutational processes generating somatic mutations. We found that 93 protein-coding cancer genes carried probable driver mutations. Some non-coding regions exhibited high mutation frequencies, but most have distinctive structural features probably causing elevated mutation rates and do not contain driver mutations. Mutational signature analysis was extended to genome rearrangements and revealed twelve base substitution and six rearrangement signatures. Three rearrangement signatures, characterized by tandem duplications or deletions, appear associated with defective homologous-recombination-based DNA repair: one with deficient BRCA1 function, another with deficient BRCA1 or BRCA2 function, the cause of the third is unknown. This analysis of all classes of somatic mutation across exons, introns and intergenic regions highlights the repertoire of cancer genes and mutational processes operating, and progresses towards a comprehensive account of the somatic genetic basis of breast cancer.
Regulatory variation: an emerging vantage point for cancer biology.

PubMed

Li, Luolan; Lorzadeh, Alireza; Hirst, Martin

2014-01-01

Transcriptional regulation involves complex and interdependent interactions of noncoding and coding regions of the genome with proteins that interact with and modify them. Genetic variation/mutation in coding and noncoding regions of the genome can drive aberrant transcription and disease. Although noncoding DNA accounts for nearly 98% of the genome, comparatively little is known about the contribution of noncoding DNA elements to disease. Genome-wide association studies of complex human diseases, including cancer, have revealed enrichment for variants in the noncoding genome. A striking finding of recent cancer genome re-sequencing efforts has been the previously underappreciated frequency of mutations in epigenetic modifiers across a wide range of cancer types. Taken together, these results point to the importance of dysregulated transcriptional regulatory control in the genesis of cancer. Powered by recent technological advancements in functional genomic profiling, exploration of normal and transformed regulatory networks will provide novel insight into the initiation and progression of cancer and open new windows to future prognostic and diagnostic tools. © 2013 Wiley Periodicals, Inc.
Recombined sequences between the non-coding control regions of JC and BK viruses found in the urine of a renal transplantation patient.

PubMed

Liaw, Yu-Ching; Chen, Cheng-Hsu; Shu, Kuo-Hsiung; Fang, Chiung-Yao; Ou, Wei-Chih; Chen, Pei-Lain; Shen, Cheng-Huang; Lin, Mien-Chun; Chang, Deching; Wang, Meilin

2012-12-01

Kidney cells are the common host for JC virus (JCV) and BK virus (BKV). Reactivation of JCV and/or BKV in patients after organ transplantation, such as renal transplantation, may cause hemorrhagic cystitis and polyomavirus-associated nephropathy. Furthermore, JCV and BKV may be shed in the urine after reactivation in the kidney. Rearranged as well as archetypal non-coding control regions (NCCRs) of JCV and BKV have been frequently identified in human samples. In this study, three JC/BK recombined NCCR sequences were identified in the urine of a patient who had undergone renal transplantation. They were designated as JC-BK hybrids 1, 2, and 3. The three JC/BK recombinant NCCRs contain up-stream JCV as well as down-stream BKV sequences. Deletions of both JCV and BKV sequences were found in these recombined NCCRs. Recombination of DNA sequences between JCV and BKV may occur during co-infection due to the relatively high homology of the two viral genomes.
Novel human CRYGD rare variant in a Brazilian family with congenital cataract

PubMed Central

Giordano, Gabriel Gorgone; Tavares, Anderson; da Silva, Márcio José; de Vasconcellos, José Paulo Cabral; Arieta, Carlos Eduardo Leite; de Melo, Mônica Barbosa

2011-01-01

Purpose: To describe a novel polymorphism in the γD-crystallin (CRYGD) gene in a Brazilian family with congenital cataract. Methods: A Brazilian four-generation family was analyzed. The proband had bilateral lamellar cataract and the phenotypes were classified by slit lamp examination. Genomic DNA was extracted from peripheral blood, and the coding regions and intron/exon boundaries of the αA-crystallin (CRYAA), γC-crystallin (CRYGC), and CRYGD genes were amplified by polymerase chain reaction and directly sequenced. Results: Sequencing of the coding regions of CRYGD showed the presence of a heterozygous A→G transition at position c.401, which results in the substitution of a tyrosine by a cysteine (Y134C). The polymorphism was identified in three individuals, two affected and one unaffected. Conclusions: A novel rare variant in CRYGD (Y134C) was detected in a Brazilian family with congenital cataract. Because the substitution does not segregate with the phenotypes in this family, other genetic alterations are likely to be present. PMID:21866214
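The coordinate arithmetic behind the reported c.401 change can be checked with a short sketch. It assumes 1-based coding-sequence numbering and, purely for illustration, a TAT tyrosine codon (TAC would lead to the same amino acid change); the function names are hypothetical and are not taken from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Partial codon table: only the codons needed for this illustration.
CODON_TABLE = {"TAT": "Y", "TAC": "Y", "TGT": "C", "TGC": "C"}


def locate_in_codon(cds_pos):
    """Map a 1-based coding position to (codon number, 0-based offset in codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3


def substitution_class(ref, alt):
    """Purine<->purine or pyrimidine<->pyrimidine is a transition; otherwise a transversion."""
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def mutate(codon, offset, alt):
    """Return the codon with the base at `offset` replaced by `alt`."""
    return codon[:offset] + alt + codon[offset + 1:]


codon_number, offset = locate_in_codon(401)   # codon 134, middle base
ref_codon = "TAT"                             # assumed tyrosine codon
alt_codon = mutate(ref_codon, offset, "G")    # A -> G at the middle base
print(codon_number, CODON_TABLE[ref_codon], "->", CODON_TABLE[alt_codon],
      substitution_class("A", "G"))
```

Position 401 falls on the middle base of codon 134, which is consistent with the Y134C designation in the abstract.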
The structure of the human interferon alpha/beta receptor gene.

PubMed

Lutfalla, G; Gardiner, K; Proudhon, D; Vielh, E; Uzé, G

1992-02-05

Using the cDNA coding for the human interferon alpha/beta receptor (IFNAR), the IFNAR gene has been physically mapped relative to the other loci of the chromosome 21q22.1 region. 32,906 base pairs covering the IFNAR gene have been cloned and sequenced. Primer extension and solution hybridization-ribonuclease protection have been used to determine that the transcription of the gene is initiated in a broad region of 20 base pairs. Some aspects of the polymorphism of the gene, including noncoding sequences, have been analyzed; some are allelic differences in the coding sequence that induce amino acid variations in the resulting protein. The exon structure of the IFNAR gene and of that of the available genes for the receptors of the cytokine/growth hormone/prolactin/interferon receptor family have been compared with the predictions for the secondary structure of those receptors. From this analysis, we postulate a common origin and propose an hypothesis for the divergence from the immunoglobulin superfamily.
Landscape of somatic mutations in 560 breast cancer whole genome sequences

PubMed Central

Nik-Zainal, Serena; Davies, Helen; Staaf, Johan; Ramakrishna, Manasa; Glodzik, Dominik; Zou, Xueqing; Martincorena, Inigo; Alexandrov, Ludmil B.; Martin, Sancha; Wedge, David C.; Van Loo, Peter; Ju, Young Seok; Smid, Marcel; Brinkman, Arie B; Morganella, Sandro; Aure, Miriam R.; Lingjærde, Ole Christian; Langerød, Anita; Ringnér, Markus; Ahn, Sung-Min; Boyault, Sandrine; Brock, Jane E.; Broeks, Annegien; Butler, Adam; Desmedt, Christine; Dirix, Luc; Dronov, Serge; Fatima, Aquila; Foekens, John A.; Gerstung, Moritz; Hooijer, Gerrit KJ; Jang, Se Jin; Jones, David R.; Kim, Hyung-Yong; King, Tari A.; Krishnamurthy, Savitri; Lee, Hee Jin; Lee, Jeong-Yeon; Li, Yilong; McLaren, Stuart; Menzies, Andrew; Mustonen, Ville; O’Meara, Sarah; Pauporté, Iris; Pivot, Xavier; Purdie, Colin A.; Raine, Keiran; Ramakrishnan, Kamna; Rodríguez-González, F. Germán; Romieu, Gilles; Sieuwerts, Anieta M.; Simpson, Peter T; Shepherd, Rebecca; Stebbings, Lucy; Stefansson, Olafur A; Teague, Jon; Tommasi, Stefania; Treilleux, Isabelle; Van den Eynden, Gert G.; Vermeulen, Peter; Vincent-Salomon, Anne; Yates, Lucy; Caldas, Carlos; van’t Veer, Laura; Tutt, Andrew; Knappskog, Stian; Tan, Benita Kiat Tee; Jonkers, Jos; Borg, Åke; Ueno, Naoto T; Sotiriou, Christos; Viari, Alain; Futreal, P. Andrew; Campbell, Peter J; Span, Paul N.; Van Laere, Steven; Lakhani, Sunil R; Eyfjord, Jorunn E.; Thompson, Alastair M.; Birney, Ewan; Stunnenberg, Hendrik G; van de Vijver, Marc J; Martens, John W.M.; Børresen-Dale, Anne-Lise; Richardson, Andrea L.; Kong, Gu; Thomas, Gilles; Stratton, Michael R.

2016-01-01

We analysed whole genome sequences of 560 breast cancers to advance understanding of the driver mutations conferring clonal advantage and the mutational processes generating somatic mutations. 93 protein-coding cancer genes carried likely driver mutations. Some non-coding regions exhibited high mutation frequencies but most have distinctive structural features probably causing elevated mutation rates and do not harbour driver mutations. Mutational signature analysis was extended to genome rearrangements and revealed 12 base substitution and six rearrangement signatures. Three rearrangement signatures, characterised by tandem duplications or deletions, appear associated with defective homologous recombination based DNA repair: one with deficient BRCA1 function; another with deficient BRCA1 or BRCA2 function; the cause of the third is unknown. This analysis of all classes of somatic mutation across exons, introns and intergenic regions highlights the repertoire of cancer genes and mutational processes operative, and progresses towards a comprehensive account of the somatic genetic basis of breast cancer. PMID:27135926
Partial Shotgun Sequencing of the Boechera stricta Genome Reveals Extensive Microsynteny and Promoter Conservation with Arabidopsis

PubMed Central

Windsor, Aaron J.; Schranz, M. Eric; Formanová, Nataša; Gebauer-Jung, Steffi; Bishop, John G.; Schnabelrauch, Domenica; Kroymann, Juergen; Mitchell-Olds, Thomas

2006-01-01

Comparative genomics provides insight into the evolutionary dynamics that shape discrete sequences as well as whole genomes. To advance comparative genomics within the Brassicaceae, we have end sequenced 23,136 medium-sized insert clones from Boechera stricta, a wild relative of Arabidopsis (Arabidopsis thaliana). A significant proportion of these sequences, 18,797, are nonredundant and display highly significant similarity (BLASTn e-value ≤ 10^-30) to low copy number Arabidopsis genomic regions, including more than 9,000 annotated coding sequences. We have used this dataset to identify orthologous gene pairs in the two species and to perform a global comparison of DNA regions 5′ to annotated coding regions. On average, the 500 nucleotides upstream to coding sequences display 71.4% identity between the two species. In a similar analysis, 61.4% identity was observed between 5′ noncoding sequences of Brassica oleracea and Arabidopsis, indicating that regulatory regions are not as diverged among these lineages as previously anticipated. By mapping the B. stricta end sequences onto the Arabidopsis genome, we have identified nearly 2,000 conserved blocks of microsynteny (bracketing 26% of the Arabidopsis genome). A comparison of fully sequenced B. stricta inserts to their homologous Arabidopsis genomic regions indicates that indel polymorphisms >5 kb contribute substantially to the genome size difference observed between the two species. Further, we demonstrate that microsynteny inferred from end-sequence data can be applied to the rapid identification and cloning of genomic regions of interest from nonmodel species. These results suggest that among diploid relatives of Arabidopsis, small- to medium-scale shotgun sequencing approaches can provide rapid and cost-effective benefits to evolutionary and/or functional comparative genomic frameworks. PMID:16607030
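Percent-identity figures such as those quoted for the upstream regions are computed over aligned sequences. The following is a minimal sketch, assuming two pre-aligned strings of equal length with "-" marking gaps and counting only ungapped columns; the abstract does not state how the study aligned sequences or treated gaps, and the function name and toy sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of ungapped alignment columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 9 ungapped columns, 8 of them identical.
print(round(percent_identity("ACGT-ACGTA", "ACGTTACCTA"), 1))
```

Excluding gapped columns is only one possible convention; counting gaps as mismatches would lower the reported identity.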
Sequences of 95 human MHC haplotypes reveal extreme coding variation in genes other than highly polymorphic HLA class I and II

PubMed Central

Norman, Paul J.; Norberg, Steven J.; Guethlein, Lisbeth A.; Nemat-Gorgani, Neda; Royce, Thomas; Wroblewski, Emily E.; Dunn, Tamsen; Mann, Tobias; Alicata, Claudia; Hollenbach, Jill A.; Chang, Weihua; Shults Won, Melissa; Gunderson, Kevin L.; Abi-Rached, Laurent; Ronaghi, Mostafa; Parham, Peter

2017-01-01

The most polymorphic part of the human genome, the MHC, encodes over 160 proteins of diverse function. Half of them, including the HLA class I and II genes, are directly involved in immune responses. Consequently, the MHC region strongly associates with numerous diseases and clinical therapies. Notoriously, the MHC region has been intractable to high-throughput analysis at complete sequence resolution, and current reference haplotypes are inadequate for large-scale studies. To address these challenges, we developed a method that specifically captures and sequences the 4.8-Mbp MHC region from genomic DNA. For 95 MHC homozygous cell lines we assembled, de novo, a set of high-fidelity contigs and a sequence scaffold, representing a mean 98% of the target region. Included are six alternative MHC reference sequences of the human genome that we completed and refined. Characterization of the sequence and structural diversity of the MHC region shows the approach accurately determines the sequences of the highly polymorphic HLA class I and HLA class II genes and the complex structural diversity of complement factor C4A/C4B. It has also uncovered extensive and unexpected diversity in other MHC genes; an example is MUC22, which encodes a lung mucin and exhibits more coding sequence alleles than any HLA class I or II gene studied here. More than 60% of the coding sequence alleles analyzed were previously uncharacterized. We have created a substantial database of robust reference MHC haplotype sequences that will enable future population scale studies of this complicated and clinically important region of the human genome. PMID:28360230
Discovery of and Use of Fragments of DOC1 as Antiangiogenic and Antitumor Therapy | NCI Technology Transfer Center | TTC

Cancer.gov

This invention describes small cDNA fragments of the coding region for wild-type filamin A interacting protein 1-like (FILIP1L) and variant 2 of FILIP1L genes that encode proteins that inhibit cell migration and motility, induce cell apoptosis and inhibit cell proliferation. The significance of this invention is that it could provide a series of new anti-cancer therapeutics and the diagnostic means to follow their expression levels.
Histone Code Modulation by Oncogenic PWWP-Domain Protein in Breast Cancers

DTIC Science & Technology

2014-08-01

discs, the Drosophila melanogaster homologue of human retinoblastoma binding protein 2. Genetics 2000; 156: 645-663. [10] Zeng J, Ge Z, Wang L...in breast cancer patients (7-11). Earlier, we used genomic analysis of copy number and gene expression to perform a detailed analysis of the 8p11-12...from the 8p11-12 region (14). Very recently, we searched the Cancer Genome Atlas database that contains 744 breast invasive carcinomas. We found DNA or
Evaluation of 10 genes encoding cardiac proteins in Doberman Pinschers with dilated cardiomyopathy.

PubMed

O'Sullivan, M Lynne; O'Grady, Michael R; Pyle, W Glen; Dawson, John F

2011-07-01

To identify a causative mutation for dilated cardiomyopathy (DCM) in Doberman Pinschers by sequencing the coding regions of 10 cardiac genes known to be associated with familial DCM in humans. 5 Doberman Pinschers with DCM and congestive heart failure and 5 control mixed-breed dogs that were euthanized or died. RNA was extracted from frozen ventricular myocardial samples from each dog, and first-strand cDNA was synthesized via reverse transcription, followed by PCR amplification with gene-specific primers. Ten cardiac genes were analyzed: cardiac actin, α-actinin, α-tropomyosin, β-myosin heavy chain, metavinculin, muscle LIM protein, myosin-binding protein C, tafazzin, titin-cap (telethonin), and troponin T. Sequences for DCM-affected and control dogs and the published canine genome were compared. None of the coding sequences yielded a common causative mutation among all Doberman Pinscher samples. However, 3 variants were identified in the α-actinin gene in the DCM-affected Doberman Pinschers. One of these variants, identified in 2 of the 5 Doberman Pinschers, resulted in an amino acid change in the rod-forming triple coiled-coil domain. Mutations in the coding regions of several genes associated with DCM in humans did not appear to consistently account for DCM in Doberman Pinschers. However, an α-actinin variant was detected in some Doberman Pinschers that may contribute to the development of DCM given its potential effect on the structure of this protein. Investigation of additional candidate gene coding and noncoding regions and further evaluation of the role of α-actinin in development of DCM in Doberman Pinschers are warranted.
Chloroplast Genome Differences between Asian and American Equisetum arvense (Equisetaceae) and the Origin of the Hypervariable trnY-trnE Intergenic Spacer

PubMed Central

Kim, Hyoung Tae; Kim, Ki-Joong

2014-01-01

Comparative analyses of complete chloroplast (cp) DNA sequences within a species may provide clues to understand the population dynamics and colonization histories of plant species. Equisetum arvense (Equisetaceae) is a widely distributed fern species in northeastern Asia, Europe, and North America. The complete cp DNA sequences from Asian and American E. arvense individuals were compared in this study. The Asian E. arvense cp genome was 583 bp shorter than that of the American E. arvense. In total, 159 indels were observed between the two individuals, most of which were concentrated in the hypervariable trnY-trnE intergenic spacer (IGS) in the large single-copy (LSC) region of the cp genome. This IGS region held a series of 19 bp repeating units. Differences in the number of these 19 bp repeat units were responsible for 78% of the total length difference between the two cp genomes. Furthermore, only other closely related species of Equisetum show the hypervariable nature of the trnY-trnE IGS. By contrast, only a single indel was observed in the gene coding regions: the ycf1 gene showed a 24 bp difference between the two continental individuals due to a single tandem-repeat indel. A total of 165 single-nucleotide polymorphisms (SNPs) were recorded between the two cp genomes. Of these, 52 SNPs (31.5%) were distributed in coding regions, 13 SNPs (7.9%) were in introns, and 100 SNPs (60.6%) were in intergenic spacers (IGS). The overall difference between the Asian and American E. arvense cp genomes was 0.12%. Despite the relatively high genetic diversity between Asian and American E. arvense, the two populations are recognized as a single species based on their high morphological similarity. This indicates that the two regional populations have been in morphological stasis. PMID:25157804
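The SNP percentages quoted in this abstract follow directly from the reported counts. A minimal sketch of the arithmetic, using only the numbers given in the abstract (the variable names are illustrative):

```python
# SNP counts per region class, as reported in the abstract.
snp_counts = {
    "coding regions": 52,
    "introns": 13,
    "intergenic spacers": 100,
}

total_snps = sum(snp_counts.values())

# Share of each class as a percentage of all SNPs, rounded to one decimal place.
snp_shares = {
    region: round(100.0 * count / total_snps, 1)
    for region, count in snp_counts.items()
}

print(total_snps)
for region, share in snp_shares.items():
    print(f"{region}: {share}%")
```

The three classes sum to the 165 SNPs reported, and the rounded shares match the 31.5%, 7.9%, and 60.6% stated above.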
Decoding the non-coding genome: elucidating genetic risk outside the coding genome.

PubMed

Barr, C L; Misener, V L

2016-01-01

Current evidence emerging from genome-wide association studies indicates that the genetic underpinnings of complex traits are likely attributable to genetic variation that changes gene expression, rather than (or in combination with) variation that changes protein-coding sequences. This is particularly compelling with respect to psychiatric disorders, as genetic changes in regulatory regions may result in differential transcriptional responses to developmental cues and environmental/psychosocial stressors. Until recently, however, the link between transcriptional regulation and psychiatric genetic risk has been understudied. Multiple obstacles have contributed to the paucity of research in this area, including challenges in identifying the positions of remote (distal from the promoter) regulatory elements (e.g. enhancers) and their target genes and the underrepresentation of neural cell types and brain tissues in epigenome projects - the availability of high-quality brain tissues for epigenetic and transcriptome profiling, particularly for the adolescent and developing brain, has been limited. Further challenges have arisen in the prediction and testing of the functional impact of DNA variation with respect to multiple aspects of transcriptional control, including regulatory-element interaction (e.g. between enhancers and promoters), transcription factor binding and DNA methylation. Further, the brain has uncommon DNA-methylation marks with unique genomic distributions not found in other tissues - current evidence suggests the involvement of non-CG methylation and 5-hydroxymethylation in neurodevelopmental processes but much remains unknown. We review here knowledge gaps as well as both technological and resource obstacles that will need to be overcome in order to elucidate the involvement of brain-relevant gene-regulatory variants in genetic risk for psychiatric disorders. © 2015 John Wiley & Sons Ltd and International Behavioural and Neural Genetics Society.
Functional analysis of the ComK protein of Bacillus coagulans.

PubMed

Kovács, Ákos T; Eckhardt, Tom H; van Hartskamp, Mariska; van Kranenburg, Richard; Kuipers, Oscar P

2013-01-01

The genes for DNA uptake and recombination in Bacilli are commonly regulated by the transcriptional factor ComK. We have identified a ComK homologue in Bacillus coagulans, an industrially relevant organism that is recalcitrant to transformation. Introduction of the B. coagulans comK gene under its own promoter region into a Bacillus subtilis comK strain results in low transcriptional induction of the late competence gene comGA, but without bistable expression. The promoter regions of the B. coagulans comK and comGA genes are recognized in B. subtilis, and expression from these promoters is activated by B. subtilis ComK. Purified ComK protein of B. coagulans showed DNA-binding ability in gel retardation assays with B. subtilis- and B. coagulans-derived probes. These experiments suggest that the function of B. coagulans ComK is similar to that of ComK of B. subtilis. When its own comK is overexpressed in B. coagulans, comGA gene expression increases 40-fold, while the expression of another late competence gene, comC, is not elevated and no reproducible DNA uptake could be observed under these conditions. Our results demonstrate that B. coagulans ComK can recognize several B. subtilis comK-responsive elements, and vice versa, but indicate that the activation of the transcription of complete sets of genes coding for a putative DNA uptake apparatus in B. coagulans might differ from that of B. subtilis.
Crystal structure of the DNA-binding domain of the LysR-type transcriptional regulator CbnR in complex with a DNA fragment of the recognition-binding site in the promoter region.

PubMed

Koentjoro, Maharani Pertiwi; Adachi, Naruhiko; Senda, Miki; Ogawa, Naoto; Senda, Toshiya

2018-03-01

LysR-type transcriptional regulators (LTTRs) are among the most abundant transcriptional regulators in bacteria. CbnR is an LTTR derived from Cupriavidus necator (formerly Alcaligenes eutrophus or Ralstonia eutropha) NH9 and is involved in transcriptional activation of the cbnABCD genes encoding chlorocatechol degradative enzymes. CbnR interacts with a cbnA promoter region of approximately 60 bp in length that contains the recognition-binding site (RBS) and activation-binding site (ABS). Upon inducer binding, CbnR seems to undergo conformational changes, leading to the activation of the transcription. Since the interaction of an LTTR with RBS is considered to be the first step of the transcriptional activation, the CbnR-RBS interaction is responsible for the selectivity of the promoter to be activated. To understand the sequence selectivity of CbnR, we determined the crystal structure of the DNA-binding domain of CbnR in complex with RBS of the cbnA promoter at 2.55 Å resolution. The crystal structure revealed details of the interactions between the DNA-binding domain and the promoter DNA. A comparison with the previously reported crystal structure of the DNA-binding domain of BenM in complex with its cognate RBS showed several differences in the DNA interactions, despite the structural similarity between CbnR and BenM. These differences explain the observed promoter sequence selectivity between CbnR and BenM. Particularly, the difference between Thr33 in CbnR and Ser33 in BenM appears to affect the conformations of neighboring residues, leading to the selective interactions with DNA. Atomic coordinates and structure factors for the DNA-binding domain of Cupriavidus necator NH9 CbnR in complex with RBS are available in the Protein Data Bank under the accession code 5XXP. © 2018 Federation of European Biochemical Societies.
Evidence of authentic DNA from Danish Viking Age skeletons untouched by humans for 1,000 years.

PubMed

Melchior, Linea; Kivisild, Toomas; Lynnerup, Niels; Dissing, Jørgen

2008-05-28

Given the relative abundance of modern human DNA and the inherent impossibility for incontestable proof of authenticity, results obtained on ancient human DNA have often been questioned. The widely accepted rules regarding ancient DNA work mainly affect laboratory procedures; however, pre-laboratory contamination occurring during excavation and archaeological/anthropological handling of human remains, as well as rapid degradation of authentic DNA after excavation, are major obstacles. We avoided some of these obstacles by analyzing DNA from ten Viking Age subjects that at the time of sampling were untouched by humans for 1,000 years. We removed teeth from the subjects prior to handling by archaeologists and anthropologists using protective equipment. An additional tooth was removed after standard archaeological and anthropological handling. All pre-PCR work was carried out in a "clean laboratory" dedicated solely to ancient DNA work. Mitochondrial DNA was extracted and overlapping fragments spanning the HVR-1 region as well as diagnostic sites in the coding region were PCR amplified, cloned and sequenced. Consistent results were obtained with the "unhandled" teeth and there was no indication of contamination, while the latter was the case with half of the "handled" teeth. The results allowed the unequivocal assignment of a specific haplotype to each of the subjects, all haplotypes being compatible in their character states with a phylogenetic tree drawn from present day European populations. Several of the haplotypes are either infrequent or have not been observed in modern Scandinavians. The observation of haplogroup I in the present study (<2% in modern Scandinavians) supports our previous findings of a pronounced frequency of this haplogroup in Viking and Iron Age Danes. The present work provides further evidence that retrieval of ancient human DNA is a possible task provided adequate precautions are taken and well-considered sampling is applied.
Evidence of Authentic DNA from Danish Viking Age Skeletons Untouched by Humans for 1,000 Years

PubMed Central

Melchior, Linea; Kivisild, Toomas; Lynnerup, Niels; Dissing, Jørgen

2008-01-01

Background: Given the relative abundance of modern human DNA and the inherent impossibility for incontestable proof of authenticity, results obtained on ancient human DNA have often been questioned. The widely accepted rules regarding ancient DNA work mainly affect laboratory procedures; however, pre-laboratory contamination occurring during excavation and archaeological/anthropological handling of human remains, as well as rapid degradation of authentic DNA after excavation, are major obstacles. Methodology/Principal Findings: We avoided some of these obstacles by analyzing DNA from ten Viking Age subjects that at the time of sampling were untouched by humans for 1,000 years. We removed teeth from the subjects prior to handling by archaeologists and anthropologists using protective equipment. An additional tooth was removed after standard archaeological and anthropological handling. All pre-PCR work was carried out in a “clean laboratory” dedicated solely to ancient DNA work. Mitochondrial DNA was extracted and overlapping fragments spanning the HVR-1 region as well as diagnostic sites in the coding region were PCR amplified, cloned and sequenced. Consistent results were obtained with the “unhandled” teeth and there was no indication of contamination, while the latter was the case with half of the “handled” teeth. The results allowed the unequivocal assignment of a specific haplotype to each of the subjects, all haplotypes being compatible in their character states with a phylogenetic tree drawn from present day European populations. Several of the haplotypes are either infrequent or have not been observed in modern Scandinavians. The observation of haplogroup I in the present study (<2% in modern Scandinavians) supports our previous findings of a pronounced frequency of this haplogroup in Viking and Iron Age Danes. Conclusion: The present work provides further evidence that retrieval of ancient human DNA is a possible task provided adequate precautions are taken and well-considered sampling is applied. PMID:18509537
Probability of coding of a DNA sequence: an algorithm to predict translated reading frames from their thermodynamic characteristics.

PubMed Central

Tramontano, A; Macchiato, M F

1986-01-01

An algorithm to determine the probability that a reading frame codifies for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of a translated reading frame. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or by the presence of translated reading frames on the complementary DNA strand. PMID:3753761

Genome-wide uniformity of human ‘open’ pre-initiation complexes

PubMed Central

Lai, William K.M.; Pugh, B. Franklin

2017-01-01

Transcription of protein-coding and noncoding DNA occurs pervasively throughout the mammalian genome. Their sites of initiation are generally inferred from transcript 5′ ends and are thought to be either locally dispersed or focused. How these two modes of initiation relate is unclear. Here, we apply permanganate treatment and chromatin immunoprecipitation (PIP-seq) of initiation factors to identify the precise location of melted DNA separately associated with the preinitiation complex (PIC) and the adjacent paused complex (PC). This approach revealed the two known modes of transcription initiation. However, in contrast to prevailing views, they co-occurred within the same promoter region: initiation originating from a focused PIC, and broad nucleosome-linked initiation. PIP-seq allowed transcriptional orientation of Pol II to be determined, which may be useful near promoters where sufficient sense/anti-sense transcript mapping information is lacking. PIP-seq detected divergently oriented Pol II at both coding and noncoding promoters, as well as at enhancers. Their occupancy levels were not necessarily coupled in the two orientations. DNA sequence and shape analysis of initiation complex sites suggest that both sequence and shape contribute to specificity, but in a context-restricted manner. That is, initiation sites have the locally “best” initiator (INR) sequence and/or shape. These findings reveal a common core to pervasive Pol II initiation throughout the human genome. PMID:27927716
A cosmid and cDNA fine physical map of a human chromosome 13q14 region frequently lost in B-cell chronic lymphocytic leukemia and identification of a new putative tumor suppressor gene, Leu5.

PubMed

Kapanadze, B; Kashuba, V; Baranova, A; Rasool, O; van Everdink, W; Liu, Y; Syomov, A; Corcoran, M; Poltaraus, A; Brodyansky, V; Syomova, N; Kazakov, A; Ibbotson, R; van den Berg, A; Gizatullin, R; Fedorova, L; Sulimova, G; Zelenin, A; Deaven, L; Lehrach, H; Grander, D; Buys, C; Oscier, D; Zabarovsky, E R; Einhorn, S; Yankovsky, N

1998-04-17

B-cell chronic lymphocytic leukemia (B-CLL) is a human hematological neoplastic disease often associated with the loss of a chromosome 13 region between RB1 gene and locus D13S25. A new tumor suppressor gene (TSG) may be located in the region. A cosmid contig has been constructed between the loci D13S1168 (WI9598) and D13S25 (H2-42), which corresponds to the minimal region shared by B-CLL associated deletions. The contig includes more than 200 LANL and ICRF cosmid clones covering 620 kb. Three cDNAs likely corresponding to three different genes have been found in the minimally deleted region, sequenced and mapped against the contigged cosmids. cDNA clone 10k4 as well as a chimeric clone 13g3, codes for a zinc-finger domain of the RING type and shares homology to some known genes involved in tumorigenesis (RET finger protein, BRCA1) and embryogenesis (MID1). We have termed the gene corresponding to 10k4/13g3 clones LEU5. This is the first gene with homology to known TSGs which has been found in the region of B-CLL rearrangements.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Stella, Stefano; University of Copenhagen, Blegdamsvej 3B, 2200 Copenhagen; Molina, Rafael

Crystal structures of BurrH and the BurrH–DNA complex are reported. DNA editing offers new possibilities in synthetic biology and biomedicine for modulation or modification of cellular functions to organisms. However, inaccuracy in this process may lead to genome damage. To address this important problem, a strategy allowing specific gene modification has been achieved through the addition, removal or exchange of DNA sequences using customized proteins and the endogenous DNA-repair machinery. Therefore, the engineering of specific protein–DNA interactions in protein scaffolds is key to providing ‘toolkits’ for precise genome modification or regulation of gene expression. In a search for putative DNA-binding domains, BurrH, a protein that recognizes a 19 bp DNA target, was identified. Here, its apo and DNA-bound crystal structures are reported, revealing a central region containing 19 repeats of a helix–loop–helix modular domain (BurrH domain; BuD), which identifies the DNA target by a single residue-to-nucleotide code, thus facilitating its redesign for gene targeting. New DNA-binding specificities have been engineered in this template, showing that BuD-derived nucleases (BuDNs) induce high levels of gene targeting in a locus of the human haemoglobin β (HBB) gene close to mutations responsible for sickle-cell anaemia. Hence, the unique combination of high efficiency and specificity of the BuD arrays can push forward diverse genome-modification approaches for cell or organism redesign, opening new avenues for gene editing.
Variation in the Nucleotide Sequence of Cottontail Rabbit Papillomavirus a and b Subtypes Affects Wart Regression and Malignant Transformation and Level of Viral Replication in Domestic Rabbits

PubMed Central

Salmon, Jérôme; Nonnenmacher, Mathieu; Cazé, Sandrine; Flamant, Patricia; Croissant, Odile; Orth, Gérard; Breitburd, Françoise

2000-01-01

We previously reported the partial characterization of two cottontail rabbit papillomavirus (CRPV) subtypes with strikingly divergent E6 and E7 oncoproteins. We report now the complete nucleotide sequences of these subtypes, referred to as CRPVa4 (7,868 nucleotides) and CRPVb (7,867 nucleotides). The CRPVa4 and CRPVb genomes differed at 238 (3%) nucleotide positions, whereas CRPVa4 and the prototype CRPV differed by only 5 nucleotides. The most variable region (7% nucleotide divergence) included the long regulatory region (LRR) and the E6 and E7 genes. A mutation in the stop codon resulted in an 8-amino-acid-longer CRPVb E4 protein, and a nucleotide deletion reduced the coding capacity of the E5 gene from 101 to 25 amino acids. In domestic rabbits homozygous for a specific haplotype of the DRA and DQA genes of the major histocompatibility complex, warts induced by CRPVb DNA or a chimeric genome containing the CRPVb LRR/E6/E7 region showed an early regression, whereas warts induced by CRPVa4 or a chimeric genome containing the CRPVa4 LRR/E6/E7 region persisted and evolved into carcinomas. In contrast, most CRPVa, CRPVb, and chimeric CRPV DNA-induced warts showed no early regression in rabbits homozygous for another DRA-DQA haplotype. Little, if any, viral replication is usually observed in domestic rabbit warts. When warts induced by CRPVa and CRPVb virions and DNA were compared, the number of cells positive for viral DNA or capsid antigens was found to be greater by 1 order of magnitude for specimens induced by CRPVb. Thus, both sequence variation in the LRR/E6/E7 region and the genetic constitution of the host influence the expression of the oncogenic potential of CRPV. Furthermore, intratype variation may overcome to some extent the host restriction of CRPV replication in domestic rabbits. PMID:11044121
Full mitochondrial genome sequences of two endemic Philippine hornbill species (Aves: Bucerotidae) provide evidence for pervasive mitochondrial DNA recombination.

PubMed

Sammler, Svenja; Bleidorn, Christoph; Tiedemann, Ralph

2011-01-14

Although nowadays it is broadly accepted that mitochondrial DNA (mtDNA) may undergo recombination, the frequency of such recombination remains controversial. Its estimation is not straightforward, as recombination under homoplasmy (i.e., among identical mt genomes) is likely to be overlooked. In species with tandem duplications of large mtDNA fragments the detection of recombination can be facilitated, as it can lead to gene conversion among duplicates. Although the mechanisms for concerted evolution in mtDNA are not fully understood yet, recombination rates have been estimated from "one per speciation event" down to 850 years or even "during every replication cycle". Here we present the first complete mt genome of the avian family Bucerotidae, i.e., that of two Philippine hornbills, Aceros waldeni and Penelopides panini. The mt genomes are characterized by a tandemly duplicated region encompassing part of cytochrome b, 3 tRNAs, NADH6, and the control region. The duplicated fragments are identical to each other except for a short section in domain I and for the length of repeat motifs in domain III of the control region. Due to the heteroplasmy with regard to the number of these repeat motifs, there is some size variation in both genomes; with around 21,657 bp (A. waldeni) and 22,737 bp (P. panini), they significantly exceed the hitherto longest known avian mt genomes, those of the albatrosses. We discovered concerted evolution between the duplicated fragments within individuals. The existence of differences between individuals in coding genes as well as in the control region, which are maintained between duplicates, indicates that recombination apparently occurs frequently, i.e., in every generation. The homogenised duplicates are interspersed by a short fragment which shows no sign of recombination. We hypothesize that this region corresponds to the so-called Replication Fork Barrier (RFB), which has been described from the chicken mitochondrial genome.
As this RFB is supposed to halt replication, it offers a potential mechanistic explanation for frequent recombination in mitochondrial genomes.
Full mitochondrial genome sequences of two endemic Philippine hornbill species (Aves: Bucerotidae) provide evidence for pervasive mitochondrial DNA recombination

PubMed Central

2011-01-01

Background Although nowadays it is broadly accepted that mitochondrial DNA (mtDNA) may undergo recombination, the frequency of such recombination remains controversial. Its estimation is not straightforward, as recombination under homoplasmy (i.e., among identical mt genomes) is likely to be overlooked. In species with tandem duplications of large mtDNA fragments the detection of recombination can be facilitated, as it can lead to gene conversion among duplicates. Although the mechanisms for concerted evolution in mtDNA are not fully understood yet, recombination rates have been estimated from "one per speciation event" down to 850 years or even "during every replication cycle". Results Here we present the first complete mt genome of the avian family Bucerotidae, i.e., that of two Philippine hornbills, Aceros waldeni and Penelopides panini. The mt genomes are characterized by a tandemly duplicated region encompassing part of cytochrome b, 3 tRNAs, NADH6, and the control region. The duplicated fragments are identical to each other except for a short section in domain I and for the length of repeat motifs in domain III of the control region. Due to the heteroplasmy with regard to the number of these repeat motifs, there is some size variation in both genomes; with around 21,657 bp (A. waldeni) and 22,737 bp (P. panini), they significantly exceed the hitherto longest known avian mt genomes, those of the albatrosses. We discovered concerted evolution between the duplicated fragments within individuals. The existence of differences between individuals in coding genes as well as in the control region, which are maintained between duplicates, indicates that recombination apparently occurs frequently, i.e., in every generation. Conclusions The homogenised duplicates are interspersed by a short fragment which shows no sign of recombination.
We hypothesize that this region corresponds to the so-called Replication Fork Barrier (RFB), which has been described from the chicken mitochondrial genome. As this RFB is supposed to halt replication, it offers a potential mechanistic explanation for frequent recombination in mitochondrial genomes. PMID:21235758
The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames

DOE Office of Scientific and Technical Information (OSTI.GOV)

Solovyev, V.V.; Salamov, A.A.; Lawrence, C.B.

1994-12-31

Discriminant analysis is applied to the problem of recognition of 5'-, internal and 3'-exons in human DNA sequences. Specific recognition functions were developed for revealing exons of particular types. The method is based on a splice site prediction algorithm that uses the linear Fisher discriminant to combine the information about significant triplet frequencies of various functional parts of splice site regions and preferences of oligonucleotides in protein coding and intron regions. The accuracy of our splice site recognition function is about 97%. A discriminant function for 5'-exon prediction includes hexanucleotide composition of upstream region, triplet composition around the ATG codon, ORF coding potential, donor splice site potential and composition of downstream intron region. For internal exon prediction, we combine in a discriminant function the characteristics describing the 5'-intron region, donor splice site, coding region, acceptor splice site and 3'-intron region for each open reading frame flanked by GT and AG base pairs. The accuracy of precise internal exon recognition on a test set of 451 exon and 246693 pseudoexon sequences is 77% with a specificity of 79% and a level of pseudoexon ORF prediction of 99.96%. The recognition quality computed at the level of individual nucleotides is 89% for exon sequences and 98% for intron sequences. A discriminant function for 3'-exon prediction includes octanucleotide composition of upstream intron region, triplet composition around the stop codon, ORF coding potential, acceptor splice site potential and hexanucleotide composition of downstream region. We unite these three discriminant functions in the exon-predicting program FEX (find exons). FEX exactly predicts 70% of 1016 exons from the test set of 181 complete genes with specificity 73%, and 89% of exons are exactly or partially predicted. On average, 85% of nucleotides were predicted accurately with specificity 91%.
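
The abstract above scores "each open reading frame flanked by GT and AG base pairs" as a candidate internal exon. The following is only a minimal sketch of that enumeration step, not the FEX implementation: the function names, the `min_len` threshold and the rule "open in at least one frame" are assumptions made for illustration, and the discriminant scoring itself is not reproduced.

```python
# Hypothetical sketch: enumerate candidate internal exons, i.e. spans that
# are preceded by an AG dinucleotide (acceptor side), followed by a GT
# dinucleotide (donor side), and open in at least one reading frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_open_frame(segment: str) -> bool:
    """Return True if at least one of the three frames has no stop codon."""
    for frame in range(3):
        codons = (segment[i:i + 3] for i in range(frame, len(segment) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            return True
    return False


def candidate_internal_exons(seq: str, min_len: int = 6) -> list[tuple[int, int]]:
    """Return half-open (start, end) spans flanked by AG upstream and GT downstream."""
    seq = seq.upper()
    # A span may start right after an AG and end right before a GT.
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    spans = []
    for start in acceptors:
        for end in donors:
            if end - start >= min_len and has_open_frame(seq[start:end]):
                spans.append((start, end))
    return spans
```

In a real predictor each returned span would then be scored, for example by a linear discriminant over splice-site and coding-potential features; the very large number of such spans in genomic DNA is consistent with the 246693 pseudoexons reported for the test set.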
RADH, a gene of Saccharomyces cerevisiae encoding a putative DNA helicase involved in DNA repair. Characteristics of radH mutants and sequence of the gene.

PubMed

Aboussekhra, A; Chanet, R; Zgaga, Z; Cassier-Chauvat, C; Heude, M; Fabre, F

1989-09-25

A new type of radiation-sensitive mutant of S. cerevisiae is described. The recessive radH mutation sensitizes haploids to the lethal effect of UV radiation in the G1 but not in the G2 mitotic phase. Homozygous diploids are as sensitive as G1 haploids. The UV-induced mutagenesis is depressed, while the induction of gene conversion is increased. The mutation is believed to channel the repair of lesions engaged in the mutagenic pathway into a recombination process, successful if the events involve sister-chromatids but lethal if they involve homologous chromosomes. The sequence of the RADH gene reveals that it may code for a DNA helicase, with a Mr of 134 kDa. All the consensus domains of known DNA helicases are present. Besides these consensus regions, strong homologies with the Rep and UvrD helicases of E. coli were found. The RadH putative helicase appears to belong to the set of proteins involved in the error-prone repair mechanism, at least for UV-induced lesions, and could act in coordination with the Rev3 error-prone DNA polymerase.
Exploring the read-write genome: mobile DNA and mammalian adaptation.

PubMed

Shapiro, James A

2017-02-01

The read-write genome idea predicts that mobile DNA elements will act in evolution to generate adaptive changes in organismal DNA. This prediction was examined in the context of mammalian adaptations involving regulatory non-coding RNAs, viviparous reproduction, early embryonic and stem cell development, the nervous system, and innate immunity. The evidence shows that mobile elements have played specific and sometimes major roles in mammalian adaptive evolution by generating regulatory sites in the DNA and providing interaction motifs in non-coding RNA. Endogenous retroviruses and retrotransposons have been the predominant mobile elements in mammalian adaptive evolution, with the notable exception of bats, where DNA transposons are the major agents of RW genome inscriptions. A few examples of independent but convergent exaptation of mobile DNA elements for similar regulatory rewiring functions are noted.
Methylation Pattern of Radish (Raphanus sativus) Nuclear Ribosomal RNA Genes

PubMed Central

Delseny, Michel; Laroche, Monique; Penon, Paul

1984-01-01

The methylation pattern of radish Raphanus sativus nuclear rDNA has been investigated using the Hpa II, Msp I, and Hha I restriction enzymes. The presence of numerous target sites for these enzymes has been shown using cloned rDNA fragments. A large fraction of the numerous rDNA units are heavily methylated, being completely resistant to Hpa II and Hha I. However, specific sites are constantly available in another fraction of the units and are therefore unmethylated. The use of different probes allowed us to demonstrate that hypomethylated sites are present in different regions. Major hypomethylated Hha I sites have been mapped in the 5′ portion of 25S rRNA coding sequence. Among the hypomethylated fraction, different methylation patterns coexist. It has been possible to demonstrate that methylation patterns are specific for particular units. The Hha I pattern of rDNA in tissues of different developmental stages was analyzed. Evidence for possible tissue specific differences in the methylation pattern is reported. PMID:16663896
Drosophila Melanogaster Mitochondrial DNA: Gene Organization and Evolutionary Considerations

PubMed Central

Garesse, R.

1988-01-01

The sequence of an 8351-nucleotide mitochondrial DNA (mtDNA) fragment has been obtained, extending the knowledge of the Drosophila melanogaster mitochondrial genome to 90% of its coding region. The sequence encodes seven polypeptides, 12 tRNAs and the 3' end of the 16S rRNA and CO III genes. The gene organization is strictly conserved with respect to the Drosophila yakuba mitochondrial genome, and different from that found in mammals and Xenopus. The high A + T content of D. melanogaster mitochondrial DNA is reflected in a reiterative codon usage, with more than 90% of the codons ending in T or A, G + C rich codons being practically absent. The average level of homology between the D. melanogaster and D. yakuba sequences is very high (roughly 94%), although insertions and deletions have been detected in protein, tRNA and large ribosomal genes. The analysis of nucleotide changes reveals a similar frequency for transitions and transversions, and reflects a strong bias against G+C on both strands. The predominant type of transition is strand specific. PMID:3130291
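
The two statistics quoted in the abstract above (the share of codons ending in T or A, and the balance of transitions and transversions between two aligned sequences) can be computed along the following lines. This is an illustrative sketch only: the function names are hypothetical, the inputs are assumed to be an in-frame coding sequence and a gap-free pairwise alignment, and nothing here reproduces the original analysis.

```python
# Hypothetical helpers for two summary statistics of coding sequences.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def third_position_at_fraction(cds: str) -> float:
    """Fraction of complete codons whose third base is A or T."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence shorter than one codon")
    return sum(codon[2] in "AT" for codon in codons) / len(codons)


def count_transitions_transversions(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count (transitions, transversions) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical positions and anything that is not a plain base.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions
```

For example, the toy sequence `ATTGCAGGC` has codons ATT, GCA and GGC, so two of its three codons end in A or T.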
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

PubMed

Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

2010-05-07

Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.
Xenopus laevis ribosomal protein genes: isolation of recombinant cDNA clones and study of the genomic organization.

PubMed Central

Bozzoni, I; Beccari, E; Luo, Z X; Amaldi, F

1981-01-01

Poly-A+ mRNA from Xenopus laevis oocytes, partially enriched for r-protein coding capacity, has been used as starting material for preparing a cDNA bank in plasmid pBR322. The clones containing sequences specific for r-proteins have been selected by translation of the complementary mRNAs. Clones for six different r-proteins have been identified and utilized as probes for studying their genomic organization. Two gene copies per haploid genome were found for r-proteins L1, L14 and S19, and four to five for proteins S1, S8 and L32. Moreover, a population polymorphism has been observed for the genomic regions containing sequences for r-proteins S1, S8 and L14. PMID:6112733
New tool to assemble repetitive regions using next-generation sequencing data

NASA Astrophysics Data System (ADS)

Kuśmirek, Wiktor; Nowak, Robert M.; Neumann, Łukasz

2017-08-01

Next-generation sequencing techniques produce a large amount of sequencing data. Some parts of the genome are composed of repetitive DNA sequences, which are very problematic for the existing genome assemblers. We propose a modification of the algorithm for DNA assembly, which uses the relative frequency of reads to properly reconstruct repetitive sequences. The new approach was implemented and tested; as a demonstration of the capability of our software, we present some results for model organisms. The implementation uses a three-layer software architecture, in which the presentation layer, data processing layer, and data storage layer are kept separate. Source code as well as a demo application with a web interface and the additional data are available at the project web page: http://dnaasm.sourceforge.net.
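
The abstract above does not spell out how read frequency is used, so the following is only a rough sketch of the general idea, not the dnaasm algorithm: it assumes that a k-mer seen about twice as often as a typical k-mer probably occurs twice in the genome. The function names and the median-based baseline are assumptions for illustration.

```python
# Hypothetical sketch: estimate k-mer copy number from relative frequency.
from collections import Counter
from statistics import median


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurring in the reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimated_copy_number(counts: Counter) -> dict[str, int]:
    """Estimate copy number as count relative to the median count (at least 1)."""
    if not counts:
        return {}
    # Assumption: the median k-mer count approximates single-copy coverage.
    baseline = median(counts.values())
    return {kmer: max(1, round(c / baseline)) for kmer, c in counts.items()}
```

An assembler could use such estimates to decide how many times a repeated segment should be traversed in the assembly graph; real data would also need handling of sequencing errors and uneven coverage, which this sketch ignores.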
Mitochondrion-to-Chloroplast DNA Transfers and Intragenomic Proliferation of Chloroplast Group II Introns in Gloeotilopsis Green Algae (Ulotrichales, Ulvophyceae).

PubMed

Turmel, Monique; Otis, Christian; Lemieux, Claude

2016-09-19

To probe organelle genome evolution in the Ulvales/Ulotrichales clade, the newly sequenced chloroplast and mitochondrial genomes of Gloeotilopsis planctonica and Gloeotilopsis sarcinoidea (Ulotrichales) were compared with those of Pseudendoclonium akinetum (Ulotrichales) and of the few other green algae previously sampled in the Ulvophyceae. At 105,236 bp, the G. planctonica mitochondrial DNA (mtDNA) is the largest mitochondrial genome reported so far among chlorophytes, whereas the 221,431-bp G. planctonica and 262,888-bp G. sarcinoidea chloroplast DNAs (cpDNAs) are the largest chloroplast genomes analyzed among the Ulvophyceae. Gains of non-coding sequences largely account for the expansion of these genomes. Both Gloeotilopsis cpDNAs lack the inverted repeat (IR) typically found in green plants, indicating that two independent IR losses occurred in the Ulvales/Ulotrichales. Our comparison of the Pseudendoclonium and Gloeotilopsis cpDNAs offered clues regarding the mechanism of IR loss in the Ulotrichales, suggesting that internal sequences from the rDNA operon were differentially lost from the two original IR copies during this process. Our analyses also unveiled a number of genetic novelties. Short mtDNA fragments were discovered in two distinct regions of the G. sarcinoidea cpDNA, providing the first evidence for intracellular inter-organelle gene migration in green algae. We identified for the first time in green algal organelles, group II introns with LAGLIDADG ORFs as well as group II introns inserted into untranslated gene regions. We discovered many group II introns occupying sites not previously documented for the chloroplast genome and demonstrated that a number of them arose by intragenomic proliferation, most likely through retrohoming. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Mitochondrion-to-Chloroplast DNA Transfers and Intragenomic Proliferation of Chloroplast Group II Introns in Gloeotilopsis Green Algae (Ulotrichales, Ulvophyceae)

PubMed Central

Turmel, Monique; Otis, Christian; Lemieux, Claude

2016-01-01

Abstract To probe organelle genome evolution in the Ulvales/Ulotrichales clade, the newly sequenced chloroplast and mitochondrial genomes of Gloeotilopsis planctonica and Gloeotilopsis sarcinoidea (Ulotrichales) were compared with those of Pseudendoclonium akinetum (Ulotrichales) and of the few other green algae previously sampled in the Ulvophyceae. At 105,236 bp, the G. planctonica mitochondrial DNA (mtDNA) is the largest mitochondrial genome reported so far among chlorophytes, whereas the 221,431-bp G. planctonica and 262,888-bp G. sarcinoidea chloroplast DNAs (cpDNAs) are the largest chloroplast genomes analyzed among the Ulvophyceae. Gains of non-coding sequences largely account for the expansion of these genomes. Both Gloeotilopsis cpDNAs lack the inverted repeat (IR) typically found in green plants, indicating that two independent IR losses occurred in the Ulvales/Ulotrichales. Our comparison of the Pseudendoclonium and Gloeotilopsis cpDNAs offered clues regarding the mechanism of IR loss in the Ulotrichales, suggesting that internal sequences from the rDNA operon were differentially lost from the two original IR copies during this process. Our analyses also unveiled a number of genetic novelties. Short mtDNA fragments were discovered in two distinct regions of the G. sarcinoidea cpDNA, providing the first evidence for intracellular inter-organelle gene migration in green algae. We identified for the first time in green algal organelles, group II introns with LAGLIDADG ORFs as well as group II introns inserted into untranslated gene regions. We discovered many group II introns occupying sites not previously documented for the chloroplast genome and demonstrated that a number of them arose by intragenomic proliferation, most likely through retrohoming. PMID:27503298
A Genome-Wide Map of Mitochondrial DNA Recombination in Yeast

PubMed Central

Fritsch, Emilie S.; Chabbert, Christophe D.; Klaus, Bernd; Steinmetz, Lars M.

2014-01-01

In eukaryotic cells, the production of cellular energy requires close interplay between nuclear and mitochondrial genomes. The mitochondrial genome is essential in that it encodes several genes involved in oxidative phosphorylation. Each cell contains several mitochondrial genome copies and mitochondrial DNA recombination is a widespread process occurring in plants, fungi, protists, and invertebrates. Saccharomyces cerevisiae has proved to be an excellent model to dissect mitochondrial biology. Several studies have focused on DNA recombination in this organelle, yet mostly relied on reporter genes or artificial systems. However, no complete mitochondrial recombination map has been released for any eukaryote so far. In the present work, we sequenced pools of diploids originating from a cross between two different S. cerevisiae strains to detect recombination events. This strategy allowed us to generate the first genome-wide map of recombination for yeast mitochondrial DNA. We demonstrated that recombination events are enriched in specific hotspots preferentially localized in non-protein-coding regions. Additionally, comparison of the recombination profiles of two different crosses showed that the genetic background affects hotspot localization and recombination rates. Finally, to gain insights into the mechanisms involved in mitochondrial recombination, we assessed the impact of individual depletion of four genes previously associated with this process. Deletion of NTG1 and MGT1 did not substantially influence the recombination landscape, alluding to the potential presence of additional regulatory factors. Our findings also revealed the loss of large mitochondrial DNA regions in the absence of MHR1, suggesting a pivotal role for Mhr1 in mitochondrial genome maintenance during mating. 
This study provides a comprehensive overview of mitochondrial DNA recombination in yeast and thus paves the way for future mechanistic studies of mitochondrial recombination and genome maintenance. PMID:25081569
A genome-wide map of mitochondrial DNA recombination in yeast.

PubMed

Fritsch, Emilie S; Chabbert, Christophe D; Klaus, Bernd; Steinmetz, Lars M

2014-10-01

In eukaryotic cells, the production of cellular energy requires close interplay between nuclear and mitochondrial genomes. The mitochondrial genome is essential in that it encodes several genes involved in oxidative phosphorylation. Each cell contains several mitochondrial genome copies and mitochondrial DNA recombination is a widespread process occurring in plants, fungi, protists, and invertebrates. Saccharomyces cerevisiae has proved to be an excellent model to dissect mitochondrial biology. Several studies have focused on DNA recombination in this organelle, yet mostly relied on reporter genes or artificial systems. However, no complete mitochondrial recombination map has been released for any eukaryote so far. In the present work, we sequenced pools of diploids originating from a cross between two different S. cerevisiae strains to detect recombination events. This strategy allowed us to generate the first genome-wide map of recombination for yeast mitochondrial DNA. We demonstrated that recombination events are enriched in specific hotspots preferentially localized in non-protein-coding regions. Additionally, comparison of the recombination profiles of two different crosses showed that the genetic background affects hotspot localization and recombination rates. Finally, to gain insights into the mechanisms involved in mitochondrial recombination, we assessed the impact of individual depletion of four genes previously associated with this process. Deletion of NTG1 and MGT1 did not substantially influence the recombination landscape, alluding to the potential presence of additional regulatory factors. Our findings also revealed the loss of large mitochondrial DNA regions in the absence of MHR1, suggesting a pivotal role for Mhr1 in mitochondrial genome maintenance during mating. 
This study provides a comprehensive overview of mitochondrial DNA recombination in yeast and thus paves the way for future mechanistic studies of mitochondrial recombination and genome maintenance. Copyright © 2014 by the Genetics Society of America.
Mutations in the NDP gene: contribution to Norrie disease, familial exudative vitreoretinopathy and retinopathy of prematurity.

PubMed

Dickinson, Joanne L; Sale, Michèle M; Passmore, Abraham; FitzGerald, Liesel M; Wheatley, Catherine M; Burdon, Kathryn P; Craig, Jamie E; Tengtrisorn, Supaporn; Carden, Susan M; Maclean, Hector; Mackey, David A

2006-01-01

To examine the contribution of mutations within the Norrie disease (NDP) gene to the clinically similar retinal diseases Norrie disease, X-linked familial exudative vitreoretinopathy (FEVR), Coats' disease and retinopathy of prematurity (ROP). A dataset comprising 13 Norrie-FEVR, one Coats' disease, 31 ROP patients and 90 ex-premature babies of <32 weeks' gestation underwent an ophthalmologic examination and were screened for mutations within the NDP gene by direct DNA sequencing, denaturing high-performance liquid chromatography or gel electrophoresis. Controls were only screened using denaturing high-performance liquid chromatography and gel electrophoresis. Confirmation of mutations identified was obtained by DNA sequencing. Evidence for two novel mutations in the NDP gene was presented: Leu103Val in one FEVR patient and His43Arg in monozygotic twin Norrie disease patients. Furthermore, a previously described 14-bp deletion located in the 5' untranslated region of the NDP gene was detected in three cases of regressed ROP. A second heterozygotic 14-bp deletion was detected in an unaffected ex-premature girl. Only two of the 13 Norrie-FEVR index cases had the full features of Norrie disease with deafness and mental retardation. Two novel mutations within the coding region of the NDP gene were found, one associated with a severe disease phenotype of Norrie disease and the other with FEVR. A deletion within the non-coding region was associated with only mild-regressed ROP, despite the presence of low birthweight, prematurity and exposure to oxygen. In full-term children with retinal detachment only 15% appear to have the full features of Norrie disease and this is important for counselling parents on the possible long-term outcome.
Sequence-based analysis of pQBR103; a representative of a unique, transfer-proficient mega plasmid resident in the microbial community of sugar beet

PubMed Central

Tett, Adrian; Spiers, Andrew J; Crossman, Lisa C; Ager, Duane; Ciric, Lena; Dow, J Maxwell; Fry, John C; Harris, David; Lilley, Andrew; Oliver, Anna; Parkhill, Julian; Quail, Michael A; Rainey, Paul B; Saunders, Nigel J; Seeger, Kathy; Snyder, Lori AS; Squares, Rob; Thomas, Christopher M; Turner, Sarah L; Zhang, Xue-Xian; Field, Dawn; Bailey, Mark J

2009-01-01

The plasmid pQBR103 was found within Pseudomonas populations colonizing the leaf and root surfaces of sugar beet plants growing at Wytham, Oxfordshire, UK. At 425 kb it is the largest self-transmissible plasmid yet sequenced from the phytosphere. It is known to enhance the competitive fitness of its host, and parts of the plasmid are known to be actively transcribed in the plant environment. Analysis of the complete sequence of this plasmid predicts a coding sequence (CDS)-rich genome containing 478 CDSs and an exceptional degree of genetic novelty; 80% of predicted coding sequences cannot be ascribed a function and 60% are orphans. Of those to which function could be assigned, 40% bore greatest similarity to sequences from Pseudomonas spp., and the majority of the remainder showed similarity to other γ-proteobacterial genera and plasmids. pQBR103 has identifiable regions presumed responsible for replication and partitioning, but despite being tra+ lacks the full complement of any previously described conjugal transfer functions. The DNA sequence provided few insights into the functional significance of plant-induced transcriptional regions, but suggests that 14% of CDSs may be expressed (11 CDSs with functional annotation and 54 without), further highlighting the ecological importance of these novel CDSs. Comparative analysis indicates that pQBR103 shares significant regions of sequence with other plasmids isolated from sugar beet plants grown at the same geographic location. These plasmid sequences indicate there is more novelty in the mobile DNA pool accessible to phytosphere pseudomonas than is currently appreciated or understood. PMID:18043644
